IUCr activities

CIF news 4

The Crystallographic Information File (CIF) and its macromolecular extension (mmCIF) have become foundations for two key crystallographic resources, Acta Crystallographica C and the Protein Data Bank (PDB). The Powder Data File and the Inorganic Crystal Structure Database are also in the process of adopting CIF to represent experimental and structural information. The modelling and NMR communities plan to use CIF. CIF permits detailed description of each aspect of an experiment and its results in a computer-readable form.

Although the structure of CIF is essentially very simple, the details can prove confusing to new users. Three of the most frequently asked questions are:

1. How are CIF data names chosen?

Each data name in CIF has to be unique so that a computer knows exactly what the data value represents. This leads to names, such as _atom_site_fract_x, that are composed of several familiar words or their abbreviations. Since a data name cannot include blanks (these are used to terminate the name), the individual elements are linked by underscores. Any character string starting with an underscore could serve as a data name, but the names have been chosen so that users can see the underlying structure of the file: the names of related items start with the same string of characters. These relationships are coded into the dictionary in a rigorous way.
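The naming convention can be illustrated with a short Python sketch. It checks the two rules stated above (a leading underscore and no blanks) and groups names by their shared leading elements. The function names and the choice of two leading elements as the grouping key are assumptions made for illustration; in practice the relationships between items are defined in the CIF dictionary, not inferred by splitting strings.

```python
from collections import defaultdict


def is_valid_data_name(name: str) -> bool:
    """Check the two rules from the text: leading underscore, no blanks."""
    return (
        len(name) > 1
        and name.startswith("_")
        and not any(ch.isspace() for ch in name)
    )


def group_by_leading_elements(names, depth=2):
    """Group data names by their first `depth` underscore-separated elements.

    This is only a heuristic view of the naming pattern (an assumption of
    this sketch); the authoritative grouping lives in the CIF dictionary.
    """
    groups = defaultdict(list)
    for name in names:
        if not is_valid_data_name(name):
            raise ValueError(f"not a valid data name: {name!r}")
        elements = name[1:].split("_")
        key = "_" + "_".join(elements[:depth])
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    example_names = [
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_label",
        "_cell_length_a",
        "_cell_length_b",
    ]
    for prefix, members in group_by_leading_elements(example_names).items():
        print(prefix, members)
```

Grouping on the leading elements makes the visible structure explicit: the atom-site items fall together, as do the cell-length items.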

2. Are there user interfaces for CIF?

The web-based input tool used by the Protein Data Bank is actually an editor for mmCIF, the macromolecular extension of CIF. This tool allows users to type in the required information and provides simple pull-down menus for many of the data items. An editor for the core CIF used by Acta Crystallographica C is in preparation.

3. I am writing software and need an item not defined in any CIF dictionary. What can I do?

In the short term you can define local data names for items and include them in a CIF. The data item must follow the CIF syntax and must not duplicate an existing CIF data name. This can be achieved by incorporating a distinctive string (such as the name of the laboratory) into the data name. One of the CIF syntax rules is that a program should ignore any data item whose name it does not recognize. Thus the presence of a local data name will not invalidate the CIF and will not prevent it from being read by other programs.
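The "ignore what you do not recognize" rule might be sketched in Python as follows. The sketch assumes the file has already been split into (name, value) pairs; a real CIF reader must also handle loops, quoted strings and text fields, which are omitted here. The set of known names, the function name and the local name _mylab_detector_temperature are all hypothetical and chosen only for illustration.

```python
# Names this hypothetical program understands (a small subset for illustration).
KNOWN_NAMES = frozenset({"_cell_length_a", "_cell_length_b", "_cell_length_c"})


def read_known_items(pairs, known_names=KNOWN_NAMES):
    """Keep recognized data items and skip the rest without raising an error.

    `pairs` is an iterable of (data_name, value) tuples that some earlier,
    unspecified tokenizing step is assumed to have produced.
    """
    recognized = {}
    ignored = []
    for name, value in pairs:
        # CIF data names are case-insensitive, so compare in lower case.
        key = name.lower()
        if key in known_names:
            recognized[key] = value
        else:
            # Unrecognized (for example, local) names are skipped, not rejected.
            ignored.append(name)
    return recognized, ignored


if __name__ == "__main__":
    items = [
        ("_cell_length_a", "5.431"),
        ("_mylab_detector_temperature", "293"),  # hypothetical local data name
        ("_cell_length_b", "5.431"),
    ]
    recognized, ignored = read_known_items(items)
    print("recognized:", recognized)
    print("ignored:", ignored)
```

Returning the ignored names alongside the recognized items is a design choice of this sketch: it lets the calling program log what it skipped while still treating the file as valid.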

If your programs or files are distributed to other laboratories, you may wish to register a local prefix with COMCIFS, the committee responsible for the maintenance of the CIF standard. This prefix would appear as the first element in any data name defined in the local dictionary. If the prefix is registered, a user can then track down the local dictionary. In the future, applications will be able to locate a registered local dictionary on the web automatically and concatenate it transparently with the CIF dictionary currently being used. To register a local dictionary name, contact Brian McMahon at bm@iucr.org.
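As a rough illustration of how a program could apply such a prefix, the Python sketch below builds a local data name with the prefix as its first element and refuses names that collide with existing ones. The prefix "mylab", the function names and the small set of existing names are hypothetical; the sketch does not consult any real registry or dictionary.

```python
def make_local_name(prefix, *elements, existing_names=()):
    """Build a data name whose first element is the local prefix.

    Raises ValueError if any element is empty or contains blanks, or if the
    resulting name duplicates one in `existing_names` (compared without
    regard to case, since CIF data names are case-insensitive).
    """
    parts = [prefix, *elements]
    for part in parts:
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid name element: {part!r}")
    name = "_" + "_".join(parts)
    if name.lower() in {n.lower() for n in existing_names}:
        raise ValueError(f"duplicates an existing data name: {name}")
    return name


def has_local_prefix(name, prefix):
    """Return True if `name` begins with the given local prefix element."""
    return name.lower().startswith("_" + prefix.lower() + "_")


if __name__ == "__main__":
    existing = {"_cell_length_a", "_atom_site_fract_x"}
    local = make_local_name(
        "mylab", "detector", "temperature", existing_names=existing
    )
    print(local, has_local_prefix(local, "mylab"))
```

Because the prefix is always the first element, a reader can tell from the name alone which local dictionary to look for.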

If the new data item is likely to be of more general interest, it should be added to one of the CIF dictionaries. In this case, you should contact the appropriate Dictionary Management Group (David Brown at idbrown@mcmaster.ca for the core dictionary, Helen Berman at berman@rcsb.rutgers.edu for the macromolecular dictionary and Brian Toby at brian.toby@nist.gov for the powder dictionary). They will be able to advise on the procedure for creating a new CIF data name. Contact David Brown or Brian McMahon if you are not sure which dictionary is appropriate.

I.D. Brown