Acta Cryst. (1991). A47, 655-685

The CIF syntax

The archival facilities provided by the STAR File process are general and open-ended. There is no restriction on the number of loop levels, on the length of the file records (i.e. the lines of data) or on the length of data names. Syntax of this generality is unlikely to be needed in crystallography, so it was considered reasonable to impose restrictions on the STAR File syntax that simplify the software required to generate or access a CIF. From a computing standpoint, the advantages of these restrictions were judged to outweigh the loss of generality in file attributes that are not critical to crystallography.

The CIF restrictions on the STAR File syntax are:

1. Lines may not exceed 80 characters.

2. Data names and block codes may not exceed 32 characters. All data names and block codes are case insensitive, i.e. _ABS and _abs are treated identically.

3. In a STAR File, a data item may be of any data type. However, processing is simpler if data types are known in advance, so the CIF Dictionary specifies whether each CIF data item is of type number or character. Character and text fields are considered interchangeable.

4. A data item is assumed to be a number if it starts with a digit `0'-`9', a plus `+', a minus `-' or a period `.', and is not delimited by matching single or double quotes or by semicolons as the first character on a line.

5. A number may be supplied as an integer, as a floating-point number, or in scientific notation. When a number is immediately followed by an integer in parentheses, that integer is taken to be the estimated standard deviation (e.s.d.) in the final digit(s) of the number. For example, 34.5 and 3.45E1 both represent 34.5 without an e.s.d., and 34.5(12) and 3.45E1(12) both represent 34.5 with an e.s.d. of 1.2.

6. A data item is assumed to be of data type text if it extends over more than one line, i.e. it starts and ends with a semicolon as the first character of a line.

7. A data item is assumed to be of data type character if it is not a number or text.

8. Only one level of loop_ is permitted. Additional levels of repeated data must be stored as lists within a text field.

9. Many numeric fields contain data whose units must be known. Each CIF data item has a default units code, which is stated in the CIF Dictionary. If a data item is not stored in its default units, the units code is appended to the data name. For example, the default units for a crystal cell dimension are ångströms; to include this data item in a CIF in picometres, the data name _cell_length_a is replaced by _cell_length_a_pm. Only units defined in the CIF Dictionary are acceptable. Apart from the ångström, the default units conform to the SI Standard adopted by the IUCr, and they should be used whenever possible.
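Several of these restrictions translate directly into simple checks in a CIF reader. The following Python sketch is illustrative only, not software from the CIF project: the function names (`check_line`, `check_name`, `same_name`, `classify`, `parse_number`, `to_default_units`) are hypothetical, values are assumed to be already tokenized (quotes kept, text fields flagged), and the units table is a hypothetical stand-in for the CIF Dictionary that covers only length items with ångström defaults.

```python
import re
from typing import Optional, Tuple

MAX_LINE = 80  # rule 1: maximum line length
MAX_NAME = 32  # rule 2: maximum length of a data name or block code

# Rule 5: integer, floating-point or scientific notation (group 1),
# optionally followed by an e.s.d. in parentheses (group 2).
NUMBER_RE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?$")

# Rule 9 (hypothetical table): units codes for length items and how many
# of each unit make up one angstrom, the default length unit.
LENGTH_UNITS = {"pm": 100.0, "nm": 0.1}


def check_line(line: str) -> bool:
    """Rule 1: lines may not exceed 80 characters."""
    return len(line) <= MAX_LINE


def check_name(name: str) -> bool:
    """Rule 2: data names and block codes may not exceed 32 characters."""
    return len(name) <= MAX_NAME


def same_name(a: str, b: str) -> bool:
    """Rule 2: names are compared case insensitively."""
    return a.lower() == b.lower()


def classify(token: str, text_field: bool = False) -> str:
    """Rules 4, 6 and 7: infer the data type of one raw value.

    `token` keeps any surrounding quotes; `text_field` is True when the
    value was delimited by semicolons at the start of a line (rule 6).
    """
    if text_field:
        return "text"
    if token and token[0] in "0123456789+-.":
        return "numb"  # starts like a number and is not quoted
    return "char"  # quoted, or anything else


def parse_number(token: str) -> Tuple[float, Optional[float]]:
    """Rule 5: return (value, esd); esd is None when none is given."""
    m = NUMBER_RE.match(token)
    if m is None:
        raise ValueError(f"not a CIF number: {token!r}")
    value = float(m.group(1))
    if m.group(2) is None:
        return value, None
    # The e.s.d. counts units in the last digit of the mantissa, scaled
    # by the exponent, so 3.45E1(12) has an e.s.d. of 12 * 10**(1 - 2).
    mantissa, _, exponent = m.group(1).lower().partition("e")
    decimals = len(mantissa.partition(".")[2])
    shift = int(exponent or "0") - decimals
    esd = int(m.group(2))
    return value, esd * 10.0 ** shift if shift >= 0 else esd / 10.0 ** -shift


def to_default_units(name: str, value: float) -> Tuple[str, float]:
    """Rule 9: drop a units suffix from a length item and convert to angstroms."""
    base, _, code = name.rpartition("_")
    if code.lower() in LENGTH_UNITS:
        return base, value / LENGTH_UNITS[code.lower()]
    return name, value  # no recognised units code: already in default units
```

For example, `parse_number("3.45E1(12)")` returns `(34.5, 1.2)`, matching the e.s.d. example in rule 5. Because `to_default_units` only looks at the last name component, a real reader would first have to confirm from the dictionary that the item is a length before applying it.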

Although CIF data names and block codes are restricted to 32 characters, this length is adequate for constructing self-explanatory names. Data names defined for use in a CIF are separated into components to represent an internal hierarchy of data categories. The concept of data name categories is not explicit in the STAR File process, but it arises naturally as part of data name design. Thus data names of the form _<category>_<topic>_<subtopic> provide for hierarchical classifications and are used throughout the CIF definitions. Sorting on the basis of hierarchical names generates a logical ordering for data names in the Dictionary.
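The effect of hierarchical names on sorting can be sketched in a few lines of Python. This is one plausible approach, not the procedure used to order the Dictionary: it assumes that every underscore marks a component boundary (real names may not always split this cleanly), `name_components`, `dictionary_order` and `by_category` are hypothetical helpers, and the example names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def name_components(name: str) -> List[str]:
    """Split _<category>_<topic>_<subtopic> into lower-case components.

    Assumes every underscore is a component boundary.
    """
    return name.lstrip("_").lower().split("_")


def dictionary_order(names: List[str]) -> List[str]:
    """Sort names component by component, ignoring case (rule 2)."""
    return sorted(names, key=name_components)


def by_category(names: List[str]) -> Dict[str, List[str]]:
    """Group the sorted names under their leading (category) component."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in dictionary_order(names):
        groups[name_components(name)[0]].append(name)
    return dict(groups)


names = ["_refln_index_h", "_cell_length_b", "_CELL_angle_alpha",
         "_cell_length_a", "_refln_F_squared_meas"]
for category, members in by_category(names).items():
    print(category, members)
```

Sorting on the component lists rather than on the raw strings keeps all members of a category, and then of a topic, adjacent regardless of letter case.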

Certain abbreviation conventions have been adopted in this paper, and in the CIF Dictionary, when referring to groups of data names. Use of only the _<category>_ or _<category>_<topic>_ components of a data name, while retaining the trailing underline character, refers to a category or subcategory of data names. For example, _refln_ refers to all data items which have data names starting with this text string. Another commonly used abbreviation replaces the leading components of a data name with an asterisk. This provides a convenient shorthand method for referring to specific members of a category of data names. For example, when discussing data items in the _chemical_formula_ category, one can refer simply to the *_moiety and *_sum items rather than the full data names. This abbreviation aids in the identification of individual data names.
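Both abbreviation conventions amount to prefix or suffix matching on data names. In this Python sketch, `expand` is a hypothetical helper, not part of any CIF software; the optional `context` argument stands for the category under discussion when the asterisk form is used, matching ignores case as CIF names do, and the list of names is illustrative.

```python
from typing import Iterable, List


def expand(abbrev: str, names: Iterable[str], context: str = "") -> List[str]:
    """Resolve an abbreviated reference against a list of data names.

    _refln_   trailing underline kept: all names with that prefix
    *_moiety  leading components replaced by '*': all names with that
              ending, optionally limited to names starting with `context`
    otherwise the argument is treated as a full data name.
    Comparisons ignore case.
    """
    a = abbrev.lower()
    ctx = context.lower()
    result = []
    for name in names:
        n = name.lower()
        if a.startswith("*"):
            hit = n.startswith(ctx) and n.endswith(a[1:])
        elif a.endswith("_"):
            hit = n.startswith(a)
        else:
            hit = n == a
        if hit:
            result.append(name)
    return result


names = ["_chemical_formula_moiety", "_chemical_formula_sum",
         "_refln_index_h", "_refln_index_k", "_reflns_number_total"]
print(expand("_refln_", names))
print(expand("*_sum", names, context="_chemical_formula_"))
```

Retaining the trailing underline matters here: `_refln_` selects `_refln_index_h` and `_refln_index_k` but not a name such as `_reflns_number_total`, which merely shares the leading letters.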


Copyright © 1991 International Union of Crystallography