IUCr activities

CIF news 5

Most crystallographers are familiar with the CIF (Crystallographic Information File) dictionaries. Those working with small structures have used the core dictionary to write CIFs submitted to Acta Cryst. B, C or E. Those in the macromolecular field will be acquainted with the macromolecular dictionary used to submit protein structures to the Protein Data Bank, and those working with powders will know of the dictionary being adopted for the Powder Diffraction File.

Several groups have been developing dictionaries for other fields of crystallography. These are now in the process of being approved by COMCIFS, the IUCr committee charged with approving all changes and additions to the CIF standard. One of the more interesting dictionaries that has recently been approved is designed for storing the images produced by two-dimensional detectors. Synchrotron laboratories produce large numbers of these images for transfer to other laboratories for processing.

Two-dimensional detector images are produced as large binary files. Unfortunately, the CIF standard is restricted to ASCII characters since ASCII is the most robust standard. An image can only be stored in a CIF if it is converted to ASCII. There are several standards for conversion, the most familiar being MIME, the conversion used to prepare binary files as attachments for transmission by email. Because of the large number of images that a synchrotron source can generate, it was necessary to devise a standard as part of the CIF suite that allows binary image files to be transmitted without conversion. Such a file cannot be a CIF since a CIF must be written in ASCII. The solution is to define a binary file that looks as much like a CIF as possible.
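The binary-to-ASCII conversion described above can be illustrated with a minimal Python sketch. It assumes base64, the encoding commonly used for MIME email attachments; the choice of base64 and the 16-byte stand-in for a detector image are assumptions made for illustration and say nothing about which encodings the imgCIF dictionary itself specifies.

```python
import base64

# Stand-in for detector output: 16 raw bytes, half of them outside the ASCII range.
raw_image = bytes(range(0, 256, 16))

# Base64 maps every 3 bytes of input to 4 ASCII characters (with padding at the end).
ascii_text = base64.b64encode(raw_image).decode("ascii")

# Decoding recovers the original bytes exactly.
restored = base64.b64decode(ascii_text)

print(len(raw_image), len(ascii_text), restored == raw_image)
```

The encoded text is about one third longer than the raw data, and every image must pass through an encode and decode step, which suggests why a route that avoids conversion is attractive when images are produced in large numbers.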

The new dictionary defines two standards: an imgCIF format in which the image is first converted to ASCII characters, and a Crystallographic Binary File (CBF) format that is identical to the imgCIF except that the image field itself is written in binary. The two files map directly onto each other, the only conversion being of the image itself from binary to ASCII and vice versa. Even though everything in a CBF except the image is written in ASCII, the fact that the image is in binary makes the whole file a binary file and therefore not a CIF. This allows an image to be read and manipulated in the form of a binary CBF. The CBF can be converted to an imgCIF if, for example, it is to be archived, but in any case the additional information needed to characterise the image (details of the sample, the experimental arrangements and the mode in which the detector was read) can easily be extracted and incorporated into a CIF reporting the results of the study. The imgCIF is a fully conforming CIF, and in a CBF everything except the image conforms fully to the CIF standard.
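The one-to-one mapping between the two forms can be sketched with a toy model in Python. This is not the actual imgCIF/CBF layout, which is defined by the dictionary; the separator line, the function names and the two data names in the header are all hypothetical and exist only to show a shared ASCII header with an image field that is either raw binary or base64 text.

```python
import base64

MARKER = "--image--\n"  # hypothetical separator, not part of any real standard


def write_binary_form(header: str, image: bytes) -> bytes:
    """ASCII header followed by the raw image bytes (CBF-like in spirit)."""
    return header.encode("ascii") + MARKER.encode("ascii") + image


def write_ascii_form(header: str, image: bytes) -> str:
    """Same header, with the image encoded as base64 text (imgCIF-like in spirit)."""
    return header + MARKER + base64.b64encode(image).decode("ascii") + "\n"


def binary_to_ascii(blob: bytes) -> str:
    # Split at the first marker; this assumes the header never contains it.
    header, image = blob.split(MARKER.encode("ascii"), 1)
    return write_ascii_form(header.decode("ascii"), image)


def ascii_to_binary(text: str) -> bytes:
    header, encoded = text.split(MARKER, 1)
    # b64decode ignores the trailing newline by default.
    return write_binary_form(header, base64.b64decode(encoded))


# Made-up data names standing in for the sample and detector details.
header = "_example_sample_name lysozyme\n_example_detector_mode single\n"
image = bytes([0, 255, 128, 10, 13, 200])

binary_file = write_binary_form(header, image)
ascii_file = binary_to_ascii(binary_file)

print(ascii_file.isascii(), ascii_to_binary(ascii_file) == binary_file)
```

In this sketch only the image field changes between the two forms, so the header can be read from either file without touching the image.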

This project was initiated by Andy Hammersley and has been brought to completion by Paul Ellis, Bob Sweet, Herbert Bernstein and others. Software is being written to store and process images in imgCIF/CBF and synchrotron users can expect to find themselves using these files soon. The full text of the dictionary can be found at: www.bernstein-plus-sons.com/software/CBF/doc/cbfext98.html.

I. David Brown