Simple file header
- To: imgcif-l@bnl.gov
- Subject: Simple file header
- From: Andy Hammersley <hammersl@esrf.fr>
- Date: Fri, 14 Nov 97 16:10:37 +0100
Hello,

Here is my attempt to define an example header describing the basic
storage of a single image. PLEASE DON'T COUNT ON THIS PRESENTLY, IT'S
PROBABLY WRONG.

I've run into a number of issues which I'm not sure about, but the
following shows how I think it would be nice to be able to define the
arrays.

Here are my questions (mainly to John Westbrook):

1. Does the '_audit.creation_datetime' data item exist? Or should I be
   using the '_audit.creation_date' item, with the time defined as well
   as the date?

2. For a single array (image) there are a number of 'scalar' data items.
   I'd like to define all these together, with a single 'array.id' item.
   (Eventually in a loop_ for multiple arrays.) Is this allowed as shown
   below? And in a loop_?

3. For a single array (image) there are a number of 'vector' data items,
   i.e. one value per dimension. I'd like to define all these together,
   in a single loop_ structure, as shown below. Is this allowed? Or would
   the items need to be redefined?

4. John had '_array_structure.id' as opposed to
   '_array_structure.array_id', which I've used here. Using 'array_id'
   seemed more consistent to me, but perhaps using 'id' had a different
   significance. Which should it be?

5. Since the data item 'array_intensities.undefined_value' is defined, I
   suggest that 'array_intensities.overload_value' should be defined as
   opposed to 'array_intensities.overload'. (This inconsistency was
   probably in my original CBF definition.) Is this O.K.?

If we sort out these definitions, then this together with the CBF file
structuring definitions provides the basic format (at least for the
simplest cases, i.e. Version 0.1).

     Andy

-------------------------------------------------------------------------------

2.0 A SIMPLE EXAMPLE HEADER
---------------------------

Before fully describing the format we start by showing a simple, but
important and complete usage of the format: that of storing a single
detector image in a file together with a small amount of useful
auxiliary information. It is intended to be a useful example for people
who like working from examples, as opposed to full definitions. It
should also serve as an introduction or overview of the format
definition.

This example uses CIF DDL2 based dictionary items.

The example is an image of 768 by 512 pixels stored as 16 bit unsigned
integers, in little endian byte order. (This is the native byte ordering
on a PC.) The pixel sizes are 100.5 by 99.5 microns. Comment lines
starting with a hash sign (#) are used to explain the contents of the
header. Only the ASCII part of the file is shown, but comments are used
to describe the start of the binary section.

First the file is shown with the minimum of comments that a typical
outputting program might add. Then it is repeated, but with
"over-commenting" to explain the format.

Here is how a file might appear if listed on a PC or on a Unix system
using 'more':

###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0

###_START_OF_HEADER

# Data block for image 1
data_image_1

# Creation date and time
_audit.creation_datetime  '1997-03-27T09:55.05'  # ???? Is this correct ????

# Sample details
_chemical.name_common  'Protein X'

# Experimental details
_diffrn_measurement.method  Oscillation
_diffrn_measurement.sample_detector_distance  0.15  # ???? New data name ????
                                                    # Needs to be defined
_diffrn_radiation_wavelength.wavelength  0.7653     # (Angstroms)
_diffrn_source.source  'ESRF BM-14'
_diffrn_detector.detector  'ESRF Be XRII/CCD'

# Define image storage mechanism

# ????? These can be looped items for multiple images, but I get the
# impression from mmCIF examples that such data items can also be
# individually assigned. Is this correct ??????

_array_intensities.array_id  image_1
_array_structure.binary_id  1      # Proposed numerical identifier
                                   # to relate array definition to
                                   # binary section
_array_structure.encoding_type  unsigned_16_bit_integer
_array_structure.compression_type  none
_array_structure.byte_order  little_endian
_array_intensities.linearity  linear
_array_intensities.undefined_value  0
_array_intensities.overload_value  65535

# Define dimensionality and element rastering
loop_
_array_structure.array_id
_array_structure.index
_array_structure.dimension
_array_structure.precedence
_array_structure.direction
_array_element_size.size      # ???? Is this allowable. Here I'm
                              # mixing items from different categories
                              # inside the same loop, to avoid having
                              # to define the indexes again. Putting
                              # this all in one loop seems best to me.
image_1 1 768 1 increasing 100.5e-6
image_1 2 512 2 decreasing  99.5e-6

###_END_OF_HEADER

###_START_OF_BIN
###_END_OF_BINARY

###_END_OF_CBF


Here the file header is shown again, but this time with many comment
lines added to explain the format:

###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0

# This line starting with a '#' is a CIF and CBF comment line,
# but the first line with the three '#'s is a CBF identifier.
# The text '###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION' identifies
# the file as a CBF and must be present as the very first line of
# every CBF file. Following 'VERSION' is the version number of the
# file. A version 1.0 CBF should be readable by any program which
# fully supports the version 1.0 CBF definitions.

# Comment lines and white space (blanks and new lines) may appear
# anywhere outside the binary sections.

###_START_OF_HEADER

# The '###_START_OF_HEADER' identifier defines the start of an ASCII
# header section. This is where the details of the image and auxiliary
# information are defined.

# Data block for image 1
data_image_1

# 'data_' defines the start of a CIF (and CBF) data block. We've
# chosen to call this data block 'image_1', but this was an arbitrary
# choice. Within a data block a data item may only be used once.

# Creation date and time
_audit.creation_datetime  '1997-03-27T09:55.05'  # ???? Is this correct ????

# Sample details
_chemical.name_common  'Protein X'   # The apostrophes enclose the string
                                     # which contains a space

# Experimental details
_diffrn_measurement.method  Oscillation
_diffrn_measurement.sample_detector_distance  0.15  # ???? New data name ????
                                                    # Needs to be defined
_diffrn_radiation_wavelength.wavelength  0.7653     # (Angstroms)
_diffrn_source.source  'ESRF BM-14'
_diffrn_detector.detector  'ESRF Be XRII/CCD'

# Many more data items can be defined, but the above gives the idea
# of a useful minimum set (but not minimum in the sense of compulsory;
# the above data items are optional in a CIF or CBF).

# Define image storage mechanism

# ????? These can be looped items for multiple images, but I get the
# ????? impression from mmCIF examples that such data items can also be
# ????? individually assigned. Is this correct ??????

_array_intensities.array_id  image_1
_array_structure.binary_id  1      # Proposed numerical identifier
                                   # to relate array definition to
                                   # binary section
_array_structure.encoding_type  unsigned_16_bit_integer
_array_structure.compression_type  none
_array_structure.byte_order  little_endian
_array_intensities.linearity  linear
_array_intensities.undefined_value  0
_array_intensities.overload_value  65535

# Here the size of the image and the ordering (rastering) of the
# data elements is defined. The CIF 'loop_' structure is used to
# define different dimensions. (It can be used for defining multiple
# images.)

loop_
_array_structure.array_id
_array_structure.index
_array_structure.dimension
_array_structure.precedence
_array_structure.direction
_array_element_size.size      # ???? Is this allowable. Here I'm
                              # mixing items from different categories
                              # inside the same loop, to avoid having
                              # to define the indexes again. Putting
                              # this all in one loop seems best to me.
image_1 1 768 1 increasing 100.5e-6
image_1 2 512 2 decreasing  99.5e-6

# The 'array_id' identifies data items belonging to the same array. Here
# we have chosen the name 'image_1', but another name could have been
# used, so long as it's used consistently. The 'index' component refers
# to the dimension being defined, and the 'dimension' component defines
# the number of elements in that dimension. The 'precedence' component
# defines the precedence of rastering of the data. In this case the
# first dimension is the faster changing dimension. The 'direction'
# component tells us the direction in which the data rasters within a
# dimension. Here the data rasters faster from the minimum element towards
# the maximum element ('increasing') in the first dimension, and more
# slowly from the maximum element towards the minimum element in the
# second dimension. (This is the default rastering order.)

# The storage of the binary data is now fully defined.
# Further data items could be defined, but this header ends with the
# '###_END_OF_HEADER' identifier.

###_END_OF_HEADER

# Here comments or white space may be added, e.g. to pad out the header
# so that the start of the binary data is on a word boundary.

# The '###_START_OF_BIN' identifier is in fact 32 bytes long and contains
# bytes to separate the "ASCII" lines from the binary data, bytes to
# try to stop the listing of the header, bytes which define the binary
# identifier which should be set to 1 to match the 'binary_id' defined
# in the header, and bytes which define the length of the binary
# section. In this case the length of the binary section is simply
# 768*512*2 = 786432 bytes (or more, if for some reason the binary
# section is made deliberately bigger than the binary data stored).

###_START_OF_BIN
###_END_OF_BINARY

# The '###_END_OF_BINARY' identifier must occur starting at the first
# byte after the number of bytes defined in the start of binary identifier.
# This may be used to check data integrity. (Following the '###_END_OF_BINARY'
# identifier the file is in "ASCII" mode again, so these comment lines
# are allowed.)

# The '###_END_OF_CBF' identifier signals the end of the CBF file.

###_END_OF_CBF
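
As an illustration only (this sketch is not part of the proposal), the
storage described by the example header could be decoded roughly as
follows in Python. The function names 'binary_section_size',
'decode_image' and 'count_flagged' are hypothetical, and the payload is
assumed to have already been extracted from between the binary
identifiers. The handling of 'decreasing' (the first stored row holds
the highest index of dimension 2) is one reading of the rastering
description above and may need revising once the definitions are
settled.

```python
import struct

BYTES_PER_ELEMENT = 2  # unsigned_16_bit_integer


def binary_section_size(nx, ny):
    """Minimum number of bytes needed for an nx-by-ny 16-bit image."""
    return nx * ny * BYTES_PER_ELEMENT


def decode_image(payload, nx, ny):
    """Return the image as a list of rows, image[y][x], with y = 0 at the
    minimum element of dimension 2 (an assumed convention)."""
    expected = binary_section_size(nx, ny)
    if len(payload) < expected:
        raise ValueError("payload too short: %d < %d" % (len(payload), expected))
    # '<' selects little_endian, 'H' an unsigned 16-bit integer.
    values = struct.unpack("<%dH" % (nx * ny), payload[:expected])
    # Dimension 1 has precedence 1 (fastest) and is 'increasing', so each
    # consecutive run of nx values is one row in natural order.
    stored_rows = [list(values[r * nx:(r + 1) * nx]) for r in range(ny)]
    # Dimension 2 is 'decreasing': assume the first stored row is the one
    # with the highest index, so reverse the row order.
    return stored_rows[::-1]


def count_flagged(image, undefined_value=0, overload_value=65535):
    """Count pixels equal to the undefined and overload marker values."""
    flat = [v for row in image for v in row]
    return flat.count(undefined_value), flat.count(overload_value)
```

For the example header, 'binary_section_size(768, 512)' gives the
786432 bytes quoted above, and 'decode_image(payload, 768, 512)' would
return 512 rows of 768 values each. Any padding beyond the expected
length is ignored by this sketch.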