Simple file header


Hello,

Here is my attempt to define an example header describing the basic
storage of a single image. PLEASE DON'T COUNT ON THIS PRESENTLY, IT'S
PROBABLY WRONG. I've run into a number of issues, which I'm not sure 
about, but the following shows how I think it would be nice to be able to
define the arrays. Here are my questions (mainly to John Westbrook):

1. Does the '_audit.creation_datetime' data item exist ? Or should I
   be using the '_audit.creation_date' item, with the time defined
   as well as the date ? 

2. For a single array (image) there are a number of 'scalar' data items.
   I'd like to define all these together, with a single 'array.id'
   item. (Eventually in a loop_ for multiple arrays.) Is this allowed
   as shown below ? And in a loop_ ?

3. For a single array (image) there are a number of 'vector' data items,
   i.e. one value per dimension. I'd like to define all these together,
   in a single loop_ structure, as shown below. Is this allowed ? Or
   would the items need to be redefined ?

4. John had '_array_structure.id' as opposed to '_array_structure.array_id', 
   which I've used here. Using 'array_id' seemed more consistent to me, 
   but perhaps using 'id' had a different significance. Which should it be ?

5. Since the data item 'array_intensities.undefined_value' is defined I
   suggest that 'array_intensities.overload_value' should be defined as
   opposed to 'array_intensities.overload'. (This inconsistency was probably
   in my original CBF definition). Is this O.K. ?


If we sort out these definitions, then this together with the CBF
file structuring definitions provides the basic format (at least for the
simplest cases i.e. Version 0.1).


             Andy

-------------------------------------------------------------------------------



2.0 A SIMPLE EXAMPLE HEADER
---------------------------

Before fully describing the format we start by showing a simple, but
important and complete usage of the format; that of storing a single
detector image in a file together with a small amount of useful
auxiliary information. It is intended to be a useful example for people
who like working from examples, as opposed to full definitions. It
should also serve as an introduction or overview of the format definition.
This example uses CIF DDL2 based dictionary items.

The example is an image of 768 by 512 pixels stored as 16 bit unsigned
integers, in little endian byte order. (This is the native byte ordering
on a PC.) The pixel sizes are 100.5 by 99.5 microns. Comment lines starting 
with a hash sign (#) are used to explain the contents of the header. 
Only the ASCII part of the file is shown, but comments are used to 
describe the start of the binary section. 
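
As a rough illustration of what this storage description means for a reading program, here is a small Python sketch. It is not part of the CBF proposal: the function name `decode_pixels` is hypothetical, and it assumes the raw image bytes have already been extracted from the binary section.

```python
import struct

WIDTH, HEIGHT = 768, 512   # pixels, from the example image
BYTES_PER_PIXEL = 2        # 16 bit unsigned integers

def decode_pixels(raw):
    """Unpack raw bytes as little endian unsigned 16 bit integers.

    In the struct format, '<' selects little endian and 'H' is uint16.
    Any trailing odd byte is ignored.
    """
    count = len(raw) // BYTES_PER_PIXEL
    return list(struct.unpack("<%dH" % count, raw[:count * BYTES_PER_PIXEL]))

# Number of bytes needed to hold the whole image uncompressed
expected_size = WIDTH * HEIGHT * BYTES_PER_PIXEL
```

For example, `decode_pixels(b"\x34\x12")` gives `[0x1234]`, because the low-order byte is stored first.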

First the file is shown with the minimum of comments that a typical
outputting program might add. Then it is repeated, but with "over-
commenting" to explain the format.

Here is how a file might appear if listed on a PC or on a Unix system 
using 'more':


###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0

###_START_OF_HEADER

# Data block for image 1
data_image_1

# Creation date and time 
_audit.creation_datetime '1997-03-27T09:55.05' # ???? Is this correct ????
                                  
# Sample details
_chemical.name_common 'Protein X'

# Experimental details
_diffrn_measurement.method                   Oscillation
_diffrn_measurement.sample_detector_distance 0.15 # ???? New data name ????
                                                  # Needs to be defined

_diffrn_radiation_wavelength.wavelength      0.7653 #   (Angstroms)
_diffrn_source.source                        'ESRF BM-14'
_diffrn_detector.detector                    'ESRF Be XRII/CCD'

# Define image storage mechanism

# ????? These can be looped items for multiple images, but I get the
# impression from mmCIF examples that such data items can also be 
# individually assigned. Is this correct ??????

_array_intensities.array_id           image_1
_array_structure.binary_id            1     # Proposed numerical identifier
                                            # to relate array definition to
                                            # binary section
_array_structure.encoding_type        unsigned_16_bit_integer
_array_structure.compression_type     none
_array_structure.byte_order           little_endian
_array_intensities.linearity          linear
_array_intensities.undefined_value    0
_array_intensities.overload_value     65535

# Define dimensionality and element rastering
loop_
_array_structure.array_id
_array_structure.index
_array_structure.dimension
_array_structure.precedence
_array_structure.direction
_array_element_size.size        # ???? Is this allowable. Here I'm
                                # mixing items from different categories
                                # inside the same loop, to avoid having 
                                # to define the indexes again. Putting
                                # this all in one loop seems best to me.
image_1    1      768    1    increasing    100.5e-6
image_1    2      512    2    decreasing     99.5e-6

###_END_OF_HEADER

###_START_OF_BIN







###_END_OF_BINARY

###_END_OF_CBF
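
For people who prefer code to prose, here is a minimal Python sketch of how a program might read the tag/value pairs and the single loop_ from a header like the one above. It is only an illustration, not a CIF parser. It assumes one value per tag on the same line and each loop_ row on a single line, and it uses `shlex` for quotes and '#' comments, which only approximates the CIF quoting rules. The name `parse_header` is hypothetical.

```python
import shlex

def parse_header(text):
    """Return (items, loop_tags, loop_rows) from a simple CBF-style ASCII header."""
    items, loop_tags, loop_rows = {}, [], []
    in_loop = False
    for line in text.splitlines():
        # comments=True drops everything from '#' onwards; quotes are honoured
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank line, comment line or '###_' identifier line
        first = tokens[0]
        if first == "loop_":
            in_loop = True
        elif first.startswith("data_"):
            items["data_block"] = first[len("data_"):]
        elif first.startswith("_"):
            if len(tokens) == 1 and in_loop and not loop_rows:
                loop_tags.append(first)      # a column name in the loop header
            elif len(tokens) >= 2:
                items[first] = tokens[1]     # a plain tag/value pair
        elif in_loop:
            # a row of loop values, matched to the column names in order
            loop_rows.append(dict(zip(loop_tags, tokens)))
    return items, loop_tags, loop_rows
```

All values are returned as strings; converting '768' or '100.5e-6' to numbers is left to the caller, since the type of each item would normally come from the dictionary.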



Here the file header is shown again, but this time with many comment
lines added to explain the format:


###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION 1.0

# This line starting with a '#' is a CIF and CBF comment line,
# but the first line with the three '#'s is a CBF identifier.
# The text '###_CRYSTALLOGRAPHIC_BINARY_FILE: VERSION' identifies
# the file as a CBF and must be present as the very first line of
# every CBF file. Following 'VERSION' is the version number of the
# file. A version 1.0 CBF should be readable by any program which
# fully supports the version 1.0 CBF definitions.

# Comment lines and white space (blanks and new lines) may appear
# anywhere outside the binary sections.
  
###_START_OF_HEADER

# The '###_START_OF_HEADER' identifier defines the start of an ASCII
# header section. This is where the details of the image and auxiliary
# information are defined.

# Data block for image 1
data_image_1

# 'data_' defines the start of a CIF (and CBF) data block. We've
# chosen to call this data block 'image_1', but this was an arbitrary
# choice. Within a data block a data item may only be used once.

# Creation date and time 
_audit.creation_datetime '1997-03-27T09:55.05' # ???? Is this correct ????

# Sample details
_chemical.name_common 'Protein X' # The apostrophes enclose the string
                                  # which contains a space

# Experimental details
_diffrn_measurement.method                   Oscillation
_diffrn_measurement.sample_detector_distance 0.15 # ???? New data name ????
                                                  # Needs to be defined

_diffrn_radiation_wavelength.wavelength      0.7653 #   (Angstroms)
_diffrn_source.source                        'ESRF BM-14'
_diffrn_detector.detector                    'ESRF Be XRII/CCD'

# Many more data items can be defined, but the above gives the idea
# of a useful minimum set (but not minimum in the sense of compulsory;
# the above data items are optional in a CIF or CBF).
 
# Define image storage mechanism

# ????? These can be looped items for multiple images, but I get the
# ????? impression from mmCIF examples that such data items can also be 
# ????? individually assigned. Is this correct ??????

_array_intensities.array_id           image_1
_array_structure.binary_id            1     # Proposed numerical identifier
                                            # to relate array definition to
                                            # binary section
_array_structure.encoding_type        unsigned_16_bit_integer
_array_structure.compression_type     none
_array_structure.byte_order           little_endian
_array_intensities.linearity          linear
_array_intensities.undefined_value    0
_array_intensities.overload_value     65535

# Here the size of the image and the ordering (rastering) of the
# data elements is defined. The CIF 'loop_' structure is used to
# define different dimensions. (It can be used for defining multiple
# images.)
loop_
_array_structure.array_id
_array_structure.index
_array_structure.dimension
_array_structure.precedence
_array_structure.direction
_array_element_size.size        # ???? Is this allowable. Here I'm
                                # mixing items from different categories
                                # inside the same loop, to avoid having 
                                # to define the indexes again. Putting
                                # this all in one loop seems best to me.
image_1    1      768    1    increasing    100.5e-6
image_1    2      512    2    decreasing     99.5e-6

# The 'array_id' identifies data items belonging to the same array. Here
# we have chosen the name 'image_1', but another name could have been
# used, so long as it's used consistently. The 'index' component refers 
# to the dimension being defined, and the 'dimension' component defines 
# the number of elements in that dimension. The 'precedence' component
# defines the precedence of rastering of the data. In this case the
# first dimension is the faster changing dimension. The 'direction'
# component tells us the direction in which the data rasters within a
# dimension. Here the data rasters faster from the minimum element towards
# the maximum element ('increasing') in the first dimension, and more
# slowly from the maximum element towards the minimum element in the
# second dimension. (This is the default rastering order.)


# The storage of the binary data is now fully defined.

# Further data items could be defined, but this header ends with the
# '###_END_OF_HEADER' identifier.

###_END_OF_HEADER

# Here comments or white space may be added e.g. to pad out the header
# so that the start of the binary data is on a word boundary

# The '###_START_OF_BIN' identifier is in fact 32 bytes long and contains
# bytes to separate the "ASCII" lines from the binary data, bytes to
# try to stop the listing of the header, bytes which define the binary
# identifier which should be set to 1 to match the 'binary_id' defined
# in the header, and bytes which define the length of the binary
# section. In this case the length of the binary section is simply
# 768*512*2 = 786432 bytes (or more, if for some reason the binary
# section is made deliberately bigger than the binary data stored).

###_START_OF_BIN







###_END_OF_BINARY

# The '###_END_OF_BINARY' identifier must occur starting at the first
# byte after the number of bytes defined in the start of binary identifier.
# This may be used to check data integrity. (Following the '###_END_OF_BINARY'
# identifier the file is in "ASCII" mode again, so these comment lines
# are allowed.)


# The '###_END_OF_CBF' identifier signals the end of the CBF file.

###_END_OF_CBF
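
The rastering description in the comments above can be made concrete with a small Python sketch. This is one possible reading of the loop_ rows, not a definitive one. It assumes that precedence 1 marks the fastest changing dimension, that 'decreasing' means the stored order runs from the highest element index down to the lowest, and that element indices are counted from 0. The name `offset_of` is hypothetical.

```python
# (dimension, precedence, direction) for index 1 and index 2, from the loop_ rows
DIMS = [(768, 1, "increasing"), (512, 2, "decreasing")]
BYTES_PER_ELEMENT = 2  # unsigned_16_bit_integer

def offset_of(coords, dims=DIMS, size=BYTES_PER_ELEMENT):
    """Byte offset, within the binary data, of the element at 0-based coords."""
    offset, stride = 0, 1
    # Visit the dimensions from fastest changing (precedence 1) to slowest
    order = sorted(range(len(dims)), key=lambda i: dims[i][1])
    for i in order:
        n, _, direction = dims[i]
        # For a 'decreasing' dimension the highest index is stored first
        pos = coords[i] if direction == "increasing" else n - 1 - coords[i]
        offset += pos * stride
        stride *= n
    return offset * size

# Minimum length of the binary section for this image
binary_length = 768 * 512 * BYTES_PER_ELEMENT
```

Under these assumptions the first element stored is the one at coordinates (0, 511), and moving one step along the first dimension advances the offset by 2 bytes, while moving one step down the second dimension advances it by 768*2 = 1536 bytes.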









