The CIF syntax
The archival facilities provided by the STAR File are general and open-ended. There is no restriction on the number of loop levels, the length of the file records (i.e. the lines of data) or the length of data names. Syntax of this generality is unlikely to be needed in crystallography. It was therefore considered reasonable to impose restrictions on the STAR syntax that simplify the software required to generate or access a CIF.
The CIF restrictions to the STAR File syntax are:
1. Lines (physical records) may not exceed 80 characters.
2. Data names and block codes may not exceed 32 characters. All data names and block codes are case-insensitive; for example, _ABS and _abs are treated identically.
3. In a STAR File, a data item may be of any data type. However, processing is simpler if data types are known in advance. The CIF Dictionary identifies whether each CIF data item is of type number or character. The character and text fields are considered interchangeable.
4. A data item is assumed to be a number if it starts with a digit "0"-"9", a plus sign "+", a minus sign "-" or a period ".", and it is not delimited by matching single or double quotes, or by semicolons as the first character of a line.
5. A number may be supplied as an integer, as a floating-point number, or in scientific notation. When a number is immediately followed by an integer in parentheses, that integer is taken to be the estimated standard deviation (e.s.d.) in the final digit(s) of the number. For example, 34.5, 3.45E1, 34.5(12) and 3.45E1(12) all represent the value 34.5; the last two also carry an e.s.d. of 1.2.
6. A data item is assumed to be of data type text if it extends over more than one line, i.e. it starts and ends with a semicolon as the first character of a line.
7. A data item is assumed to be of data type character if it is not a number or text.
8. Only one level of loop_ is permitted. Additional levels of repeated data must be stored as lists within a text field.
9. Many numeric fields contain data for which the units must be known. Each CIF data item has a default units code, which is stated in the CIF Dictionary. If a data item is not stored in its default units, the units code is appended to the data name. For example, the default unit for a crystal cell dimension is the Angstrom; if this data item must be given in picometres, the data name _cell_length_a is replaced by _cell_length_a_pm. Only the units defined in the CIF Dictionary are acceptable. The default units, except for the Angstrom, conform to the SI standard adopted by the IUCr, and they should be used whenever possible.
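Restrictions 1 and 2 are simple enough to check mechanically. The following is a minimal Python sketch, not part of any CIF software; the function names are hypothetical, and it assumes the 32-character limit is measured on the name exactly as written (including the leading underscore).

```python
# Limits taken from CIF restrictions 1 and 2.
MAX_LINE_LENGTH = 80
MAX_NAME_LENGTH = 32


def check_line_lengths(text: str) -> list[int]:
    """Return the 1-based numbers of lines longer than MAX_LINE_LENGTH."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > MAX_LINE_LENGTH
    ]


def check_name_length(name: str) -> bool:
    """True if a data name or block code is within the 32-character limit.

    Assumption: the length is counted on the name as written.
    """
    return len(name) <= MAX_NAME_LENGTH


def names_match(a: str, b: str) -> bool:
    """Compare data names case-insensitively, so _ABS matches _abs."""
    return a.casefold() == b.casefold()
```

Using `casefold()` rather than `lower()` is a small design choice; for the ASCII names used in CIF the two behave identically.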
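Restrictions 4, 6 and 7 together define how a reader can infer the type of a data value from its first characters and delimiters. A minimal Python sketch of that decision order follows; `classify` is a hypothetical name, and the quote handling is simplified (it only checks that the first and last characters are the same quote mark).

```python
# Characters that mark an unquoted value as a number (restriction 4).
NUMBER_START = set("0123456789+-.")


def classify(raw: str) -> str:
    """Classify a raw CIF data value as 'text', 'character' or 'number'.

    `raw` is the value exactly as it appears in the file, including any
    quote or semicolon delimiters.
    """
    lines = raw.split("\n")
    # Restriction 6: a multi-line value delimited by semicolons in column 1.
    if len(lines) > 1 and lines[0].startswith(";") and lines[-1].startswith(";"):
        return "text"
    # Values bounded by matching single or double quotes are never numbers.
    if len(raw) >= 2 and raw[0] in "'\"" and raw[-1] == raw[0]:
        return "character"
    # Restriction 4: an unquoted value starting with a digit, sign or period.
    if raw and raw[0] in NUMBER_START:
        return "number"
    # Restriction 7: anything else is character data.
    return "character"
```

For example, `classify("34.5(12)")` gives `"number"`, while the quoted form `classify("'34.5'")` gives `"character"`.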
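The e.s.d. convention in restriction 5 applies to the last digit(s) of the mantissa, so its magnitude depends on the number of decimal places and on any exponent. A minimal Python sketch, assuming the number has already been identified as numeric; `parse_number` is a hypothetical helper, and `Decimal` is used so that values such as 1.2 are represented exactly.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Mantissa, optional exponent, optional e.s.d. in parentheses.
_NUMBER = re.compile(
    r"^(?P<mantissa>[+-]?(?:\d+(?:\.\d*)?|\.\d+))"
    r"(?:[eE](?P<exponent>[+-]?\d+))?"
    r"(?:\((?P<esd>\d+)\))?$"
)


def parse_number(raw: str) -> Tuple[Decimal, Optional[Decimal]]:
    """Split a CIF number into its value and e.s.d. (None if absent)."""
    match = _NUMBER.match(raw)
    if match is None:
        raise ValueError(f"not a CIF number: {raw!r}")
    mantissa = match["mantissa"]
    exponent = int(match["exponent"] or 0)
    value = Decimal(mantissa).scaleb(exponent)
    if match["esd"] is None:
        return value, None
    # The e.s.d. counts units in the last digit(s) of the mantissa,
    # so shift it by the number of decimal places and by the exponent.
    decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    esd = Decimal(match["esd"]).scaleb(exponent - decimals)
    return value, esd
```

Applied to the four examples in restriction 5, this yields a value of 34.5 in every case, with an e.s.d. of 1.2 for `34.5(12)` and `3.45E1(12)`.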
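Restriction 9 means a reader must recognise a units code appended to a data name and convert the value back to the default units. The sketch below assumes a hypothetical, hand-written excerpt of the dictionary (`UNITS`) containing only the `_cell_length_a_pm` example from the text; a real program would take the permitted unit codes and conversion factors from the CIF Dictionary itself.

```python
# Hypothetical dictionary excerpt: for each data name, the permitted
# alternative unit codes and the factor converting them to the default
# unit (Angstrom for cell lengths; 1 pm = 0.01 Angstrom).
UNITS = {
    "_cell_length_a": {"pm": 0.01},
}


def to_default_units(name: str, value: float) -> tuple[str, float]:
    """Map a possibly unit-suffixed data name to its default-unit form."""
    name = name.lower()  # data names are case-insensitive (restriction 2)
    if name in UNITS:
        return name, value  # already in the default units
    base, _, code = name.rpartition("_")
    if base in UNITS and code in UNITS[base]:
        return base, value * UNITS[base][code]
    raise KeyError(f"unknown data name or unit code: {name!r}")
```

Rejecting unknown suffixes with an error, rather than passing them through, mirrors the rule that only units defined in the dictionary are acceptable.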
Each data item in a data block is identified by a unique data name. The currently accepted CIF data names are listed and defined in the CIF Dictionary (Core Version 1991). These are the IUCr "standard" data items currently accepted for the submission of machine-readable documents to the IUCr and to the crystallographic databases. The data items in the Core Dictionary are intended primarily for describing small-molecule and inorganic structures. Future extensions to this dictionary will define data items used in more specialised areas of crystallography, such as powder diffraction and macromolecular studies.
The Dictionary (Core Version 1991) is also available as an electronic file, cifdic.C91. This file has been constructed using the STAR Dictionary Definition Language (DDL) proposal of Cook (1991). Entries in this Dictionary may be examined using the online utility cman(1). The CIF application program cyclops(1) uses this dictionary to validate standard data names.
Frank H. Allen, Crystallographic Data Centre, University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, England
I. David Brown, Institute for Materials Research, McMaster University, Hamilton, Ontario L8S 4M1, Canada