Re: Simple file header

A few comments:

There is an audit.creation_date which has a yyyy-mm-dd type code.  I would
suggest adding a new audit.creation_time tag for the time.  I think it
would be easier for people to read than the composite format in an
audit.creation_datetime.
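
As a rough sketch of that split (Python, purely for illustration): the helper below takes one timestamp and produces a yyyy-mm-dd value for audit.creation_date plus a separate value for the suggested audit.creation_time. The function name and the hh:mm:ss layout are assumptions of this sketch; the creation_time tag is only a proposal and is not defined in any dictionary.

```python
from datetime import datetime


def split_creation_stamp(stamp: datetime) -> dict:
    """Return separate date and time items instead of one composite value."""
    return {
        "_audit.creation_date": stamp.strftime("%Y-%m-%d"),
        # Proposed tag, not in any dictionary; hh:mm:ss is an assumed layout.
        "_audit.creation_time": stamp.strftime("%H:%M:%S"),
    }


# Timestamp from the example header, read as 09:55:05 on 27 March 1997.
items = split_creation_stamp(datetime(1997, 3, 27, 9, 55, 5))
for tag, value in items.items():
    print(f"{tag:<22} {value}")
```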

I would suggest that categories not be mixed in loops in the header.  While
we could straighten this out in translating to a "real" DDL2 CIF, it really
is worth the trouble to define all the necessary parent/child relationships
and pointers to keep each loop cleanly representing a single category.
This makes mapping into databases much simpler and makes some common errors
in loop structuring easier for people to find.
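
To make the point concrete, here is a small Python sketch that takes the mixed loop from the example header and splits it into one loop per category. It assumes that the child category would carry its own pointer items named array_id and index; those child tag names are hypothetical and would still need to be defined in the dictionary. Treating the first category in the loop as the parent is also just a convention of this sketch.

```python
from collections import OrderedDict

# Hypothetical pointer items: each child category repeats these key columns
# so that its rows can be tied back to the parent rows.
KEY_COLUMNS = ("array_id", "index")


def split_loop_by_category(tags, rows):
    """Split one mixed loop into one loop per category.

    tags: list of '_category.item' names; rows: list of value lists.
    Returns an OrderedDict mapping category -> (tags, rows).
    """
    columns = OrderedDict()
    for position, tag in enumerate(tags):
        category, item = tag.lstrip("_").split(".", 1)
        columns.setdefault(category, []).append((item, position))

    # Assumption: the first category seen is the parent that owns the keys.
    parent = next(iter(columns))
    parent_items = dict(columns[parent])

    loops = OrderedDict()
    for category, members in columns.items():
        present = {item for item, _ in members}
        # Copy into the child any key column it lacks.
        borrowed = [(key, parent_items[key]) for key in KEY_COLUMNS
                    if key not in present and key in parent_items]
        chosen = borrowed + members
        loops[category] = (
            [f"_{category}.{item}" for item, _ in chosen],
            [[row[pos] for _, pos in chosen] for row in rows],
        )
    return loops


mixed_tags = [
    "_array_structure.array_id", "_array_structure.index",
    "_array_structure.dimension", "_array_structure.precedence",
    "_array_structure.direction", "_array_element_size.size",
]
mixed_rows = [
    ["image_1", "1", "768", "1", "increasing", "100.5e-6"],
    ["image_1", "2", "512", "2", "decreasing", "99.5e-6"],
]

loops = split_loop_by_category(mixed_tags, mixed_rows)
for category, (loop_tags, loop_rows) in loops.items():
    print("loop_")
    print("\n".join(loop_tags))
    for row in loop_rows:
        print("  ".join(row))
    print()
```

Each resulting loop holds items from a single category, which is the shape that maps directly onto a database table.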

In DDL2 it is legal to have one data block in which tags with a single
value appear without a loop_ and another data block in which the same tags
are used in a loop_, but you might find it easier to create your software
if you always use those tags in a loop_.  This will also avoid some
problems if a DDL1.4 version to use with powder data should be needed.
More a question of style than anything else.
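
One way to follow that style in a writer is to have a single routine that always emits a loop_, whether there is one row or many. The Python below is only a sketch under that assumption; the function names are invented for this example, and the quoting rule is deliberately simplistic (it does not handle values that themselves contain apostrophes).

```python
def quote(value):
    """Wrap a value in apostrophes if it contains white space."""
    text = str(value)
    return f"'{text}'" if any(ch.isspace() for ch in text) else text


def format_loop(tags, rows):
    """Format tags and rows as a CIF loop_, even when there is only one row."""
    for row in rows:
        if len(row) != len(tags):
            raise ValueError("each row needs exactly one value per tag")
    lines = ["loop_"]
    lines.extend(tags)
    lines.extend("  ".join(quote(value) for value in row) for row in rows)
    return "\n".join(lines)


# The single-valued array_intensities items from the example header,
# written as a one-row loop instead of as individual assignments.
text = format_loop(
    [
        "_array_intensities.array_id",
        "_array_intensities.linearity",
        "_array_intensities.undefined_value",
        "_array_intensities.overload_value",
    ],
    [["image_1", "linear", 0, 65535]],
)
print(text)
```

Adding a second image later then means appending a row, with no change to the reading or writing code.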

   -- Herbert


>Hello,
>
>Here is my attempt to define an example header describing the basic
>storage of a single image. PLEASE DON'T COUNT ON THIS PRESENTLY, IT'S
>PROBABLY WRONG. I've run into a number of issues, which I'm not sure
>about, but the following shows how I think it would be nice to be able to
>define the arrays. Here are my questions (mainly to John Westbrook):
>
>1. Does the '_audit.creation_datetime' data item exist ? Or should I
>   be using the '_audit.creation_date' item, with the time defined
>   as well as the date ?
>
>2. For a single array (image) there are a number of 'scalar' data items.
>   I'd like to define all these together, with a single 'array.id'
>   item. (Eventually in a loop_ for multiple arrays.) Is this allowed
>   as shown below ? And in a loop_ ?
>
>3. For a single array (image) there are a number of 'vector' data items,
>   i.e. one value per dimension. I'd like to define all these together,
>   in a single loop_ structure, as shown below. Is this allowed ? Or
>   would the items need to be redefined ?
>
>4. John had '_array_structure.id' as opposed to '_array_structure.array_id',
>   which I've used here. Using 'array_id' seemed more consistent to me,
>   but perhaps using 'id' had a different significance. Which should it be ?
>
>5. Since the data item 'array_intensities.undefined_value' is defined I
>   suggest that 'array_intensities.overload_value' should be defined as
>   opposed to 'array_intensities.overload'. (This inconsistency was probably
>   in my original CBF definition). Is this O.K. ?
>
>
>If we sort out these definitions, then this together with the CBF
>file structuring definitions provides the basic format (at least for the
>simplest cases i.e. Version 0.1).
>
>
>             Andy
>
>-------------------------------------------------------------------------------
>
>
>
>2.0 A SIMPLE EXAMPLE HEADER
>---------------------------
>
>Before fully describing the format we start by showing a simple, but
>important and complete usage of the format; that of storing a single
>detector image in a file together with a small amount of useful
>auxiliary information. It is intended to be a useful example for people
>who like working from examples, as opposed to full definitions. It
>should also serve as an introduction or overview of the format definition.
>This example uses CIF DDL2 based dictionary items.
>
>The example is an image of 768 by 512 pixels stored as 16 bit unsigned
>integers, in little endian byte order. (This is the native byte ordering
>on a PC.) The pixel sizes are 100.5 by 99.5 microns. Comment lines starting
>with a hash sign (#) are used to explain the contents of the header.
>Only the ASCII part of the file is shown, but comments are used to
>describe the start of the binary section.
>
>First the file is shown with the minimum of comments that a typical
>outputting program might add. Then it is repeated, but with "over-
>commenting" to explain the format.
>
>Here is how a file might appear if listed on a PC or on a Unix system
>using 'more':
>
>
>###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0
>
>###_START_OF_HEADER
>
># Data block for image 1
>data_image_1
>
># Creation date and time
>_audit.creation_datetime '1997-03-27T09:55.05' # ???? Is this correct ????
>
># Sample details
>_chemical.name_common 'Protein X'
>
># Experimental details
>_diffrn_measurement.method                   Oscillation
>_diffrn_measurement.sample_detector_distance 0.15 # ???? New data name ????
>                                                  # Needs to be defined
>
>_diffrn_radiation_wavelength.wavelength      0.7653 #   (Angstroms)
>_diffrn_source.source                        'ESRF BM-14'
>_diffrn_detector.detector                    'ESRF Be XRII/CCD'
>
># Define image storage mechanism
>
># ????? These can be looped items for multiple images, but I get the
># impression from mmCIF examples that such data items can also be
># individually assigned. Is this correct ??????
>
>_array_intensities.array_id           image_1
>_array_structure.binary_id            1     # Proposed numerical identifier
>                                            # to relate array definition to
>                                            # binary section
>_array_structure.encoding_type        unsigned_16_bit_integer
>_array_structure.compression_type     none
>_array_structure.byte_order           little_endian
>_array_intensities.linearity          linear
>_array_intensities.undefined_value    0
>_array_intensities.overload_value     65535
>
># Define dimensionality and element rastering
>loop_
>_array_structure.array_id
>_array_structure.index
>_array_structure.dimension
>_array_structure.precedence
>_array_structure.direction
>_array_element_size.size        # ???? Is this allowable. Here I'm
>                                # mixing items from different categories
>                                # inside the same loop, to avoid having
>                                # to define the indexes again. Putting
>                                # this all in one loop seems best to me.
>image_1    1      768    1    increasing    100.5e-6
>image_1    2      512    2    decreasing     99.5e-6
>
>###_END_OF_HEADER
>
>###_START_OF_BIN
>
>
>
>
>
>
>
>###_END_OF_BINARY
>
>###_END_OF_CBF
>
>
>
>Here the file header is shown again, but this time with many comment
>lines added to explain the format:
>
>
>###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0
>
># This line starting with a '#' is a CIF and CBF comment line,
># but the first line with the three '#'s is a CBF identifier.
># The text '###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION' identifies
># the file as a CBF and must be present as the very first line of
># every CBF file. Following 'VERSION' is the version number of the
># file. A version 1.0 CBF should be readable by any program which
># fully supports the version 1.0 CBF definitions.
>
># Comment lines and white space (blanks and new lines) may appear
># anywhere outside the binary sections.
>
>###_START_OF_HEADER
>
># The '###_START_OF_HEADER' identifier defines the start of an ASCII
># header section. This is where the details of the image and auxiliary
># information are defined.
>
># Data block for image 1
>data_image_1
>
># 'data_' defines the start of a CIF (and CBF) data block. We've
># chosen to call this data block 'image_1', but this was an arbitrary
># choice. Within a data block a data item may only be used once.
>
># Creation date and time
>_audit.creation_datetime '1997-03-27T09:55.05' # ???? Is this correct ????
>
># Sample details
>_chemical.name_common 'Protein X' # The apostrophes enclose the string
>                                  # which contains a space
>
># Experimental details
>_diffrn_measurement.method                   Oscillation
>_diffrn_measurement.sample_detector_distance 0.15 # ???? New data name ????
>                                                  # Needs to be defined
>
>_diffrn_radiation_wavelength.wavelength      0.7653 #   (Angstroms)
>_diffrn_source.source                        'ESRF BM-14'
>_diffrn_detector.detector                    'ESRF Be XRII/CCD'
>
># Many more data items can be defined, but the above gives the idea
># of a useful minimum set (but not minimum in the sense of compulsory,
># the above data items are optional in a CIF or CBF).
>
># Define image storage mechanism
>
># ????? These can be looped items for multiple images, but I get the
># ????? impression from mmCIF examples that such data items can also be
># ????? individually assigned. Is this correct ??????
>
>_array_intensities.array_id           image_1
>_array_structure.binary_id            1     # Proposed numerical identifier
>                                            # to relate array definition to
>                                            # binary section
>_array_structure.encoding_type        unsigned_16_bit_integer
>_array_structure.compression_type     none
>_array_structure.byte_order           little_endian
>_array_intensities.linearity          linear
>_array_intensities.undefined_value    0
>_array_intensities.overload_value     65535
>
># Here the size of the image and the ordering (rastering) of the
># data elements is defined. The CIF 'loop_' structure is used to
># define different dimensions. (It can be used for defining multiple
># images.)
>loop_
>_array_structure.array_id
>_array_structure.index
>_array_structure.dimension
>_array_structure.precedence
>_array_structure.direction
>_array_element_size.size        # ???? Is this allowable. Here I'm
>                                # mixing items from different categories
>                                # inside the same loop, to avoid having
>                                # to define the indexes again. Putting
>                                # this all in one loop seems best to me.
>image_1    1      768    1    increasing    100.5e-6
>image_1    2      512    2    decreasing     99.5e-6
>
># The 'array_id' identifies data items belonging to the same array. Here
># we have chosen the name 'image_1', but another name could have been
># used, so long as it's used consistently. The 'index' component refers
># to the dimension being defined, and the 'dimension' component defines
># the number of elements in that dimension. The 'precedence' component
># defines the order of precedence in rastering the data. In this case the
># first dimension is the faster changing dimension. The 'direction'
># component tells us the direction in which the data rasters within a
># dimension. Here the data rasters faster from the minimum element towards
># the maximum element ('increasing') in the first dimension, and more
># slowly from the maximum element towards the minimum element
># ('decreasing') in the second dimension. (This is the default rastering
># order.)
>
>
># The storage of the binary data is now fully defined.
>
># Further data items could be defined, but this header ends with the
># '###_END_OF_HEADER' identifier.
>
>###_END_OF_HEADER
>
># Here comments or white space may be added e.g. to pad out the header
># so that the start of the binary data is on a word boundary
>
># The '###_START_OF_BIN' identifier is in fact 32 bytes long and contains
># bytes to separate the "ASCII" lines from the binary data, bytes to
># try to stop the listing of the header, bytes which define the binary
># identifier which should be set to 1 to match the 'binary_id' defined
># in the header, and bytes which define the length of the binary
># section. In this case the length of the binary section is simply
># 768*512*2 = 786432 bytes (or more, if for some reason the binary
># section is made deliberately bigger than the binary data stored).
>
>###_START_OF_BIN
>
>
>
>
>
>
>
>###_END_OF_BINARY
>
># The '###_END_OF_BINARY' identifier must occur starting at the first
># byte after the number of bytes defined in the start of binary identifier.
># This may be used to check data integrity. (Following the '###_END_OF_BINARY'
># identifier the file is in "ASCII" mode again, so these comment lines
># are allowed.)
>
>
># The '###_END_OF_CBF' identifier signals the end of the CBF file.
>
>###_END_OF_CBF
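
The arithmetic in the quoted header comments (768*512*2 bytes) and the rastering description can be checked with a short Python sketch. It assumes that 'decreasing' means the element with the highest coordinate is stored first, that coordinates are counted from 0, and that index and precedence coincide as they do in this example; none of that is settled by the draft, and the function names are invented here.

```python
import struct

# Values taken from the example header.
DIMENSIONS = {1: 768, 2: 512}                      # index -> element count
DIRECTIONS = {1: "increasing", 2: "decreasing"}    # index -> raster direction
FAST, SLOW = 1, 2            # index 1 has precedence 1 (fastest changing)
BYTES_PER_ELEMENT = 2        # unsigned_16_bit_integer


def binary_length():
    """Minimum length in bytes of the binary section for this image."""
    return DIMENSIONS[FAST] * DIMENSIONS[SLOW] * BYTES_PER_ELEMENT


def stored_position(index, coordinate):
    """0-based position along one dimension in storage order."""
    if DIRECTIONS[index] == "increasing":
        return coordinate
    # Assumed meaning of 'decreasing': the highest coordinate comes first.
    return DIMENSIONS[index] - 1 - coordinate


def byte_offset(x, y):
    """Byte offset of element (x, y) within the binary data."""
    element = (stored_position(SLOW, y) * DIMENSIONS[FAST]
               + stored_position(FAST, x))
    return element * BYTES_PER_ELEMENT


def read_element(data, x, y):
    """Read one little-endian unsigned 16-bit value from the binary data."""
    return struct.unpack_from("<H", data, byte_offset(x, y))[0]


# Build an all-zero synthetic image and mark one pixel as overloaded.
data = bytearray(binary_length())
struct.pack_into("<H", data, byte_offset(3, 5), 65535)

print(binary_length())
print(read_element(data, 3, 5), read_element(data, 0, 0))
```

Under these assumptions the first stored element is (0, 511) and the last is (767, 0), so a reader that ignores the 'direction' items would see the image flipped in the second dimension.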

=====================================================
****                BERNSTEIN + SONS
*   *       INFORMATION SYSTEMS CONSULTANTS
****     P.O. BOX 177, BELLPORT, NY 11713-0177
*   * ***
**** *            Herbert J. Bernstein
  *   ***     yaya@bernstein-plus-sons.com
 ***     *
  *   *** 1-516-286-1339    FAX: 1-516-286-1999
=====================================================


