(IUCr) CIF: APPENDIX I. CIF Dictionary (Core Version 1991)

Acta Cryst. (1991). A47, 655-685

APPENDIX I

CIF Dictionary (Core Version 1991)

1. Introduction

This version of the CIF Dictionary contains the detailed definitions of data names which are acceptable in submissions to the IUCr and to the crystallographic databases. Data names are considered to be case insensitive: they may be given in upper- or lower-case letters, or in any combination of upper and lower case. The data name definitions are ordered alphabetically by the data category; general notes on these categories are given in Section 2 of this Appendix.

Certain abbreviation conventions have been adopted in the CIF Dictionary when referring to groups of data names. Use of only the _<category>_ or _<category>_<topic>_ components of a data name, while retaining the trailing underline character, refers to a category or subcategory of data names. For example, _refln_ refers to all data items which have data names starting with this text string. Another commonly used abbreviation replaces the leading components of a data name with an asterisk. This provides a convenient shorthand method for referring to specific members of a category of data names. For example, when discussing data items in the _chemical_formula_ category, one can refer simply to the *_moiety and *_sum items rather than the full data names. This abbreviation aids in the identification of individual data names.
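The prefix and asterisk conventions can be shown in a short Python sketch, together with the case-insensitive matching of data names described above. The helper names `in_category` and `expand_abbreviation` are hypothetical and do not come from any CIF library. The sketch simply compares lower-cased strings.

```python
def in_category(data_name: str, prefix: str) -> bool:
    """True if data_name belongs to the (sub)category written as prefix, e.g. '_refln_'."""
    # Data names are case insensitive, so compare lower-cased forms.
    return data_name.lower().startswith(prefix.lower())


def expand_abbreviation(short: str, prefix: str) -> str:
    """Expand an abbreviation such as '*_moiety' using the category prefix it stands for."""
    if not short.startswith("*_"):
        raise ValueError("abbreviation must start with '*_'")
    # The prefix keeps its trailing underline, so drop the '*_' of the abbreviation.
    return prefix + short[2:]
```

The category prefix keeps its trailing underline. As a result, `_refln_` does not match `_reflns_number_total`, and the two reflection categories stay distinct.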

Literature references that are required for the definition of a data item are included in full within the Dictionary, in order that it can be distributed as a standalone document.

The CIF Dictionary contains information about the permitted units for numerical data items. Default units do not require any extensions to be appended to the data name. These defaults, except for the ångström unit, conform to the SI standard adopted by the IUCr. Default units should be used wherever possible; they must be used in submissions to Acta Crystallographica.

Simple typesetting conventions have been adopted for use with CIF data. These are listed in a table below. These conventions are particularly important in text submissions to Acta Crystallographica. The list will be extended as need arises and reported in future versions of the Dictionary and in the Notes for Authors which will be published annually in Acta. Typesetting signals are important in the free-text fields, such as _publ_section_, and also in fields such as _chemical_name_ or in the construction of atom labels for certain classes of compounds, e.g. amino acids and peptides.

Greek letters have been assigned a single-character ASCII alphabetic equivalent. As far as possible, this is the first letter of the fully spelled name of each Greek letter. The exceptions are marked * in the list below. Greek letter codes are preceded by a backslash `\'; lower-case Greek letters use the code in lower case, upper-case Greek letters use the code in upper case.

\a \A alpha
\b \B beta
\c \C chi
\d \D delta
\e \E epsilon
\f \F phi *
\g \G gamma
\h \H eta *
\i \I iota
\k \K kappa
\l \L lambda
\m \M mu
\n \N nu
\o \O omicron
\p \P pi
\q \Q theta *
\r \R rho
\s \S sigma
\t \T tau
\u \U upsilon
\w \W omega *
\x \X xi
\y \Y psi *
\z \Z zeta

superscripts Csp^3^ for Csp³

subscripts U~eq~ for U_eq

acute accent \'e for é

grave accent \`a for à

circumflex \^e for ê

cedilla \,c for ç

umlaut \"u for ü

degree 120\% for 120°

ångström 1.54\%A for 1.54 Å
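The Python sketch below shows a renderer for part of the typesetting table. It is an illustration only. The function name `render_cif_text` is hypothetical. It converts just the Greek-letter codes, `\%` (degree) and `\%A` (ångström), and leaves accents, superscripts and subscripts untouched.

```python
import re

# Codes from the Greek-letter table, in matching order with their lower-case letters.
_GREEK_CODES = "abcdefghiklmnopqrstuwxyz"
_GREEK_LOWER = "αβχδεφγηικλμνοπθρστυωξψζ"
_GREEK = dict(zip(_GREEK_CODES, _GREEK_LOWER))
# Upper-case codes map to upper-case Greek letters (\D -> Δ, \Q -> Θ, ...).
_GREEK.update(zip(_GREEK_CODES.upper(), _GREEK_LOWER.upper()))

# '\%A' must be tried before '\%' so that the ångström code is not read as a degree.
_CODE = re.compile(r"\\%A|\\%|\\([A-Za-z])")


def render_cif_text(text: str) -> str:
    """Replace Greek-letter, degree and ångström codes with Unicode characters."""
    def repl(m):
        token = m.group(0)
        if token == "\\%A":
            return "Å"
        if token == "\\%":
            return "°"
        # Letters that are not Greek codes (e.g. \j) are left as they are.
        return _GREEK.get(m.group(1), token)
    return _CODE.sub(repl, text)
```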


2. CIF data categories

`_audit_` data names

The _audit_ data items provide a record of the CIF creation and subsequent updating. These items usually precede all others in the CIF.

`_atom_` data names

The _atom_ data names are in two separate categories: those that describe atom sites in a crystal structure (i.e. _atom_site_ data names), and those that describe the properties of the atom types that occupy these sites (i.e. _atom_type_ data names).

The _atom_type_ data provide information on the chemical identity, scattering factors, atomic radii and so on. The _atom_site_ items describe specific information on atomic sites such as positional coordinates, atomic displacement parameters, magnetic moments and directions, and so on.

The _atom_type_ data are global. They apply to one or more atom sites. The link to the atom site data is provided through the data names _atom_type_symbol and _atom_site_type_symbol. These items provide the common character codes which identify atom types. Normally these codes are element symbols but they can include the oxidation state or any other information that uniquely identifies the atom types present in the structure.

If _atom_site_type_symbol is supplied in an atom site list, it must match one of the _atom_type_symbol codes. Alternatively, if _atom_site_type_symbol is not supplied, the leading characters of the _atom_site_label must match one of the _atom_type_symbol codes. Note that _atom_site_type_symbol takes precedence over _atom_site_label for the purpose of linking with atom type data. If the former is present, the latter need not contain an atom type code. The rules for specifying the _atom_site_label are given in Section 4.

When several atom species share the same site, as is commonly found in mineral structures, two different approaches may be used. In the first, atom types are defined separately with unique symbol codes. A multiply occupied atom site is then specified as two or more atom sites with the same coordinates but different _atom_site_type_symbol (or _atom_site_label) codes. With this approach the _atom_site_occupancy values must add up to unity or less. In the second, an _atom_type_symbol is specified to identify the properties of the combined atomic species sharing the site. In this case only a single entry for each atom site is needed.
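The linking rule for atom types can be sketched in Python. The type symbol is used first; otherwise the start of the label is used. `resolve_atom_type` is a hypothetical helper. Where no _atom_site_type_symbol is given, it picks the longest _atom_type_symbol that begins the label. This is an assumption made here so that, for example, `Cu2+1` links to `Cu2+` rather than `Cu`.

```python
from __future__ import annotations

from typing import Optional


def resolve_atom_type(label: str, type_symbol: Optional[str],
                      atom_types: set[str]) -> str:
    """Return the _atom_type_symbol that an atom site links to."""
    if type_symbol is not None:
        # _atom_site_type_symbol takes precedence over the label.
        if type_symbol not in atom_types:
            raise ValueError(f"{type_symbol!r} is not a declared _atom_type_symbol")
        return type_symbol
    # Otherwise the leading characters of the label must match a type code.
    # Assumption: the longest matching code wins.
    matches = [t for t in atom_types if label.startswith(t)]
    if not matches:
        raise ValueError(f"no _atom_type_symbol matches the start of {label!r}")
    return max(matches, key=len)
```

A plain prefix match can mislead when a longer element symbol is not declared: a label `Co1` would match a declared `C`. The component 0 rule of Section 4 is stricter.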

`_cell_` data names

These data specify the cell parameters, together with the method of measurement, experimental conditions, etc.

`_chemical_` data names

These data specify the composition and chemical properties of the compound. The formula data items must agree with those that specify the density, unit-cell and Z values.

The following rules apply to the construction of the data items _chemical_formula_analytical, *_structural and *_sum. For the data item *_moiety the formula construction is broken up into residues or moieties, i.e. groups of atoms that form a molecular unit or molecular ion. The rules given below apply within each moiety but different requirements apply to the way that moieties are connected (see _chemical_formula_moiety).

1. Only recognized element symbols may be used. The symbol D is used for deuterium.

2. Each element symbol is followed (without a space) by an integer or decimal `count' number. A count of `1' may be omitted.

3. A space or parenthesis must separate each element symbol and its count from the next element symbol.

4. Where a group of elements is enclosed in parentheses, the multiplier for the group must follow the closing parenthesis. That is, all element and group multipliers are assumed to be printed as subscripted numbers. An exception to this rule exists for *_moiety formulae, where pre- and post-multipliers are permitted for molecular units.

5. Unless the elements are ordered in a manner that corresponds to their chemical structure, as in _chemical_formula_structural, the order of the elements should be: C, H, followed by the other elements (including deuterium) in alphabetical order of their symbol. This is the Hill system used by Chemical Abstracts. This ordering is used in *_analytical, *_sum and within the molecular units of *_moiety.
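Rules 1 to 5 can be applied mechanically when writing *_sum. The Python sketch below uses the hypothetical name `hill_formula` and builds the string from a mapping of element symbols to counts. It does not validate element symbols (rule 1) or handle grouped parentheses (rule 4). For compounds without carbon it follows the Hill system convention of listing all elements, including H, alphabetically.

```python
from __future__ import annotations


def hill_formula(counts: dict[str, float]) -> str:
    """Format element counts as a space-separated, Hill-ordered formula string."""
    def term(symbol: str) -> str:
        n = counts[symbol]
        if n == 1:
            return symbol  # rule 2: a count of 1 may be omitted
        # rule 2: integer or decimal count, written directly after the symbol
        return symbol + (str(int(n)) if float(n).is_integer() else str(n))

    symbols = [s for s, n in counts.items() if n > 0]
    # rule 5: C first, then H, then the rest (including D) alphabetically.
    # Without carbon, the Hill system lists every element alphabetically.
    head = [s for s in ("C", "H") if s in symbols] if "C" in symbols else []
    tail = sorted(s for s in symbols if s not in head)
    return " ".join(term(s) for s in head + tail)  # rule 3: space separators
```

For example, `hill_formula({"O": 6, "C": 6, "H": 12})` gives `C6 H12 O6`.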

`_chemical_conn_` data names

The _chemical_conn_ data items specify the 2D chemical structure of the molecular species. They allow a 2D chemical diagram to be reconstructed for use in a publication or in a database search for structural and substructural relationships.

The chemical connectivity specification uses two related lists of looped data. These are the atom list and the bond list.

The atom data items provide information about the chemical properties of the atoms in the structure. In cases where crystallographic and molecular symmetry elements coincide, the atom list must also contain symmetry-generated atoms, so that the _chemical_conn_ data items always describe a complete chemical entity. The bond data items specify the connections between the atoms in the atom list and the nature of the chemical bond between these sites.
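The two related lists can be combined into a connectivity graph, which is what a diagram generator or substructure search would work from. The Python sketch below is hypothetical. The function name and input layout are invented for illustration, and the bond type code `sing` is assumed from the _chemical_conn_bond_type definition. It checks that every bond refers to atoms in the atom list.

```python
from __future__ import annotations


def build_connectivity(atoms: dict[int, str],
                       bonds: list[tuple[int, int, str]]) -> dict[int, list[tuple[int, str]]]:
    """Turn an atom list and a bond list into an adjacency list.

    atoms maps an atom number to its element type; bonds holds
    (atom_1, atom_2, bond_type) tuples that refer to those numbers.
    """
    graph: dict[int, list[tuple[int, str]]] = {n: [] for n in atoms}
    for a1, a2, bond_type in bonds:
        # Every bond must connect two atoms present in the atom list.
        for n in (a1, a2):
            if n not in atoms:
                raise ValueError(f"bond {a1}-{a2} refers to atom {n}, "
                                 "which is missing from the atom list")
        graph[a1].append((a2, bond_type))
        graph[a2].append((a1, bond_type))
    return graph
```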

`_computing_` data names

These items identify the computer programs used in the crystal structure analysis.

`_database_` data names

These codes are assigned by database managers and should only appear in a CIF if they originate from this source.

`_diffrn_` data names

These items record details of the diffraction data and its measurement.

`_exptl_` data names

These items record experimental measurements on the crystal, such as shape, size, density etc.

`_geom_` data names

These data items provide information on the molecular and crystal geometry, as calculated from the contents of the _atom_, _cell_ and _symmetry_ data. Geometry data is therefore redundant in that it can be calculated from other more fundamental quantities in the CIF. It serves, however, the dual purpose of providing a check on the correctness of both sets of data, and of enabling the most important geometric data to be identified for publication by setting the *_publ_flag.
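Geometry data can be derived from the _cell_ and _atom_site_ data. For example, the distance between two sites given in fractional coordinates follows from the real-space metric tensor G, with d² = Δxᵀ G Δx. The Python sketch below uses hypothetical function names. For simplicity it applies no symmetry operations or lattice translations, so both sites must already be in the required positions.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Real-space metric tensor G from cell lengths and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def interatomic_distance(cell, xyz1, xyz2):
    """Distance between two sites given in fractional coordinates.

    cell is (a, b, c, alpha, beta, gamma); no symmetry is applied.
    """
    g = metric_tensor(*cell)
    d = [q - p for p, q in zip(xyz1, xyz2)]
    # d^2 = d^T G d
    return math.sqrt(sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3)))
```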

Geometry data types that are not defined explicitly in the CIF Dictionary may be entered as _geom_special_details.

`_journal_` data names

These are the bookkeeping entries used by journal staff when processing a CIF submitted for publication. Normally the creator of a CIF will not specify these data. The data names are not defined in the Dictionary because they are for journal use only.

`_publ_` data names

These items are used when submitting a manuscript to a journal for publication.

`_refine_` data names

These items describe the structure refinement parameters.

`_refln_` and `_reflns_` data names

These names specify the reflection data used to determine the _atom_ data items. They exist in two categories: _refln_ and _reflns_ items. The _reflns_ data specify the parameters that apply to all reflections. The _refln_ data refer to individual reflections and must be included in looped lists. The _reflns_ data are not looped.
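The difference between looped _refln_ items and unlooped _reflns_ items is visible as soon as the data are written out. In the Python sketch below, the function name and input layout are hypothetical. The data names _refln_index_h, _refln_index_k, _refln_index_l, _refln_F_meas and _reflns_number_total are assumed from the core dictionary and should be checked against Section 5.

```python
def reflection_cif_text(rows):
    """rows: iterable of (h, k, l, F_meas) tuples (hypothetical input layout)."""
    rows = list(rows)
    # _reflns_ items describe the whole set, so they are written once, unlooped.
    out = [f"_reflns_number_total {len(rows)}", "", "loop_"]
    # _refln_ items describe single reflections, so they must be looped.
    out += ["_refln_index_h", "_refln_index_k", "_refln_index_l", "_refln_F_meas"]
    out += [f"{h:4d}{k:4d}{l:4d}{f:10.2f}" for h, k, l, f in rows]
    return "\n".join(out)
```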

`_symmetry_` data names

These items specify the space-group symmetry.

3. Standard codes table

Recognized codes are provided for specific data items. Definitions of the codes are included within the CIF data name descriptions of Section 5. Only the codes shown there may be used. If one of these codes does not adequately identify the condition of a parameter, include this information in another data item (e.g. in *_special_details fields). For convenience we list here those data names for which standard codes are provided.

_atom_site_calc_flag
_atom_site_refinement_flags
_atom_site_thermal_displace_type
_atom_sites_solution_hydrogens
_atom_sites_solution_primary
_atom_sites_solution_secondary
_chemical_conn_bond_type
_diffrn_refln_scan_mode
_diffrn_refln_scan_mode_backgd
_exptl_absorpt_correction_type
_refine_ls_hydrogen_treatment
_refine_ls_matrix_type
_refine_ls_structure_factor_coef
_refine_ls_weighting_scheme
_refln_observed_status
_refln_refinement_status
_symmetry_cell_setting
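Software that reads these items can reject non-standard codes early. The Python sketch below is hypothetical. The set of _symmetry_cell_setting codes is assumed for illustration and should be checked against the definition in Section 5. The case-insensitive comparison is also an assumption.

```python
# Assumed codes for _symmetry_cell_setting, for illustration only; the
# authoritative list is the definition of this data name in Section 5.
CELL_SETTING_CODES = frozenset({
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "rhombohedral", "trigonal", "hexagonal", "cubic",
})


def check_standard_code(data_name: str, value: str, permitted: frozenset) -> str:
    """Return the normalised code, or raise if value is not a standard code."""
    code = value.strip().lower()  # assumption: codes compare case-insensitively
    if code not in permitted:
        raise ValueError(
            f"{value!r} is not a standard code for {data_name}; record the "
            "condition in another data item, e.g. a *_special_details field")
    return code
```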

4. Atom label definition

The _atom_site_label is the unique identifier of a specific site in the crystal structure which contains a particular atom type or combination of atom types. There may be more than one _atom_site_label referring to the same position in the crystal structure. This is one approach to specifying shared atom sites. The other is to specify a single site containing a mixture of atom types in a fixed proportion defined by _atom_type_description.
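Section 2 states that, when several labels share one position, their occupancies must add up to unity or less. This constraint can be checked automatically. The Python sketch below is hypothetical. It groups sites whose rounded fractional coordinates are identical and does not recognise positions related by symmetry.

```python
from __future__ import annotations

from collections import defaultdict


def check_shared_sites(sites: list[tuple[str, tuple[float, float, float], float]],
                       digits: int = 4, tol: float = 1e-4) -> dict:
    """Group atom sites by position and check occupancy sums of shared positions.

    sites: (label, (x, y, z), occupancy) tuples. Returns {position: total occupancy}
    for every position occupied by more than one label.
    """
    groups = defaultdict(list)
    for label, xyz, occupancy in sites:
        # Round fractional coordinates so that equal positions compare equal.
        key = tuple(round(c, digits) for c in xyz)
        groups[key].append((label, occupancy))
    shared = {}
    for position, members in groups.items():
        if len(members) < 2:
            continue
        total = sum(occ for _, occ in members)
        if total > 1 + tol:
            labels = ", ".join(lbl for lbl, _ in members)
            raise ValueError(f"occupancies of {labels} at {position} sum to {total:.4f}")
        shared[position] = total
    return shared
```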

The _atom_site_label may be constructed from up to seven distinct components, 0 to 6. These components are concatenated in sequential order from left to right. The _atom_site_label must contain a component 0 code. All other components are optional. Components 0 and 1 are concatenated; all other components are joined by an underline `_' character. These underlines must be included up to the highest-order component present (i.e. if a lower-order component is omitted the `_' separator must still be inserted in order to maintain the component ranking). An underline character can never be used within a component code itself.
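The construction rules can be expressed as a small Python function. The name `build_atom_site_label` is hypothetical. Absent optional components are passed as empty strings. Trailing empty components are dropped, so underlines are written only up to the highest-order component present.

```python
from __future__ import annotations


def build_atom_site_label(components: list[str]) -> str:
    """Assemble an _atom_site_label from components 0-6 (sketch)."""
    if not components or not components[0]:
        raise ValueError("component 0 is mandatory")
    if len(components) > 7:
        raise ValueError("an atom label has at most seven components (0 to 6)")
    for comp in components:
        if " " in comp or "_" in comp:
            raise ValueError(f"blank or underline not allowed in component {comp!r}")
    comp0 = components[0]
    for i, ch in enumerate(comp0):
        # In component 0 a digit may only denote an oxidation state.
        if ch.isdigit() and comp0[i + 1:i + 2] not in ("+", "-"):
            raise ValueError("a digit in component 0 must be followed by '+' or '-'")
    comps = list(components)
    # Separators are needed only up to the highest-order component present.
    while len(comps) > 2 and comps[-1] == "":
        comps.pop()
    comp1 = comps[1] if len(comps) > 1 else ""
    if comp1 and (not comp1[0].isdigit() or comp1[1:2] in ("+", "-")):
        raise ValueError("component 1 must start with a digit not followed by '+' or '-'")
    # Components 0 and 1 are concatenated; higher components are joined by '_'.
    return "_".join([comp0 + comp1, *comps[2:]])
```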

For most applications component 0 of the atom label is a code that identifies the `type' of atom, or atoms, at the atomic site. It must therefore match one of the specified _atom_type_symbol codes in the _atom_type_ list. However, if the data item _atom_site_type_symbol is also specified, component 0 of the atom label is not used to identify the atom type and it may contain any code which is consistent with the construction rules given below. In other words, the _atom_site_type_symbol, if specified, takes precedence over the _atom_site_label_component_0 code in the role of linking the _atom_site_ list to the _atom_type_ list.

The _atom_site_label construction is flexible, visually decipherable and well suited to computer applications. The components can be easily identified and stripped with a single pass, from left to right, along the label string. Note that the underline separators are only used if higher-order components exist. If intermediate components are not used they may be omitted provided the underline separators are inserted. For example the label `C233__ggg' is acceptable and decodes as the components 0: `C', 1: `233', 2: ` ', and 3: `ggg'. There is no requirement that the same number of components be used in each label.

Components of _atom_site_label

Component 0: [optionally identical to an _atom_type_symbol] (mandatory)

A character string containing any character except a blank or an underline, with the proviso that each digit `0'-`9' be used only to designate an oxidation state and, as such, must be followed by a plus `+' or a minus `-' character. It is recommended that the element symbols be used when applicable. Permissible codes are:

  Cu  Cu2+  dummy  Fe3+Ni2+  S-  H*  H(SDS)

Component 1: [atom number code] (optional)

This string may contain any alphanumeric character except a blank or an underline `_', but the first character must be a digit `0'-`9' and the second character may not be a plus `+' or a minus `-'. It is intended primarily to differentiate sites containing the same atom type, but can be used for any purpose whatsoever. This string is concatenated directly with component 0. Examples of combined component 0 and 1 codes follow. In each case the boundary between component 0 and component 1 is shown as `|' in parentheses: C1 (C|1), C103g28 (C|103g28), Fe3+17b (Fe3+|17b), H*251 (H*|251), boron2a (boron|2a), Ni22+ (Ni|22+), Ni2+2 (Ni2+|2), Fe2+Ni2+2 (Fe2+Ni2+|2).

An underline character is inserted if components beyond 1 are included in the label.

Component 2: [identifier code] (optional)

This string may contain any character except a blank or underline. It is intended primarily to identify specific structural information in a macromolecular fragment, but may be used for any other purpose as well.

An underline character is inserted if components beyond 2 are included in the label.

Components 3-6: [residue, sequence number, chain-order, alternate codes] (optional)

These strings may contain any character except a blank or an underline.

Underline characters are inserted after each component, 3 to 5, included in the label.
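The single left-to-right pass described above can be sketched in Python. `split_atom_site_label` is a hypothetical name. The rule used to find the end of component 0 is inferred from the component 0 and component 1 definitions: a digit belongs to component 0 only when it is followed by `+` or `-`.

```python
from __future__ import annotations


def split_atom_site_label(label: str) -> list[str]:
    """Split an _atom_site_label into components [0, 1, 2, ...] (sketch).

    Component 1 is returned as '' when absent; skipped intermediate
    components (e.g. in 'C233__ggg') also come back as ''.
    """
    if not label or any(c.isspace() for c in label):
        raise ValueError("an atom label must be non-empty and contain no blanks")
    head, *rest = label.split("_")  # components 2-6 follow underlines
    if len(rest) > 5:
        raise ValueError("an atom label has at most seven components (0 to 6)")
    # A digit stays in component 0 only if it is an oxidation state
    # (followed by '+' or '-'); any other digit starts component 1.
    i = 0
    while i < len(head):
        if head[i].isdigit():
            if head[i + 1:i + 2] in ("+", "-"):
                i += 2
                continue
            break
        i += 1
    if i == 0:
        raise ValueError("component 0 is mandatory")
    return [head[:i], head[i:], *rest]
```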

5. CIF data name definitions

The CIF definition of each data item in this dictionary contains: (a) an identifying data name, (b) a data type code, (c) a description, (d) optional parameters, and (e) optional example(s).

(a) The data name appears at the top of the definition in a bold typewriter face.

(b) The data type code appears to the right of the data name in italics and is bounded by parentheses. The possible type codes are `char' and `numb'. The `char' code signals that the data item may be represented by either a single-line character string bounded by matching blanks, single quotes or double quotes, or multi-line text bounded by a semicolon as the first character of the bounding lines. The `numb' code indicates that the data item is a number in integer, decimal or scientific notation.

(c) The description of the data item appears immediately below the data name in roman type. The description indicates the purpose of the data item and its relationship to other data items. References to the original definition of the data item are provided where appropriate.

(d) Parameters which specify the way in which the data item may be used follow the description, in a smaller roman typeface. These parameters appear in the definition as standard descriptive phrases. The meanings of these phrases are given below.

``Appearance in list: [yes|no|both].'' specifies whether the data item may be included in a repeated list of data items (that is, as a member of data item(s) preceded by a `loop_' command). Note that `both' refers to a data item which is normally a single value but in special circumstances may also appear in a looped list.
``If looped, [_data_name] must be present in the same list.'' specifies another data item that must appear in the same looped list in order that the currently defined item may be correctly accessed.
``Where no value is given, the assumed value is `[*]'.'' specifies the string `*' which is assumed to be the default entry when this data item is absent from the CIF.
``The permitted range is [min]->[max].'' specifies the minimum and maximum numbers permitted for this data item.
``E.s.d. expected: [yes|no].'' specifies whether an estimated standard deviation value, bounded by parentheses, is expected to be concatenated to a numerical data item.
``Default e.s.d. value: [n].'' specifies the assumed estimated standard deviation `n' when a value is not appended to the data item.
``The units extensions are: ` ' ([default units] *1.0) `[_ext]' ([alternative units ] [*|/|+][con]).'' specifies permitted units for dimensioned quantities. The first entry gives the default units which do not require a data name extension or a conversion factor. The second and succeeding entries specify the data name extension, the alternative units and the factor `con' needed to convert these units into the default. The symbol `*' signals a multiplication; the symbol `/' a division; and the `+' an addition of the value `con'.

(e) Examples of a data item may follow the parameter information in a typewriter face. Note that commas are used to separate different examples of the data item. A string containing blanks and bounded by quotes represents a single example. Sometimes an explanation of the example is provided in parentheses.
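The `char` and `numb` conventions of (b) and the units and e.s.d. parameters of (d) can be sketched in Python. All function names are hypothetical.

- `format_char_value` is simplified and does not check reserved words such as loop_ or data_.
- `parse_numb` reads an e.s.d. such as `5.959(1)` as applying to the last digits of the number.
- `to_default_units` applies a conversion `[*|/|+]con` as described for the units extensions.

The picometre factor `/100` used below is illustrative and is not quoted from a definition.

```python
from __future__ import annotations

from typing import Optional


def format_char_value(text: str) -> str:
    """Choose delimiters for a 'char' value (simplified sketch)."""
    if "\n" in text:
        # Multi-line text: bounded by lines that begin with a semicolon.
        return ";" + text + "\n;"
    if text and not any(c.isspace() for c in text) and text[0] not in "'\";_#":
        # A single word is bounded by blanks and needs no quotes.
        return text
    # Otherwise quote the string. A quote only closes the string when it is
    # followed by a blank (or ends the value), so avoid those cases.
    for q in ("'", '"'):
        if q + " " not in text and not text.endswith(q):
            return q + text + q
    return ";" + text + "\n;"


def parse_numb(text: str) -> tuple[float, Optional[float]]:
    """Split a 'numb' value such as '5.959(1)' into (value, e.s.d.)."""
    number, sep, rest = text.partition("(")
    value = float(number)
    if not sep:
        return value, None
    if not rest.endswith(")"):
        raise ValueError(f"malformed e.s.d. in {text!r}")
    # The e.s.d. refers to the last digits of the number: 5.959(1) -> 0.001.
    mantissa, _, exponent = number.lower().partition("e")
    decimals = len(mantissa.split(".", 1)[1]) if "." in mantissa else 0
    return value, int(rest[:-1]) * 10.0 ** (int(exponent or 0) - decimals)


def to_default_units(value: float, esd: Optional[float],
                     op: str, con: float) -> tuple[float, Optional[float]]:
    """Convert a value in alternative units to the default units via [*|/|+]con."""
    if op == "*":
        return value * con, None if esd is None else esd * con
    if op == "/":
        return value / con, None if esd is None else esd / con
    if op == "+":
        return value + con, esd  # an offset leaves the e.s.d. unchanged
    raise ValueError(f"unknown conversion operator {op!r}")
```

For a length of 595.9(1) pm, `to_default_units(*parse_numb("595.9(1)"), "/", 100.0)` gives approximately 5.959 Å with an e.s.d. of 0.001 Å. A multiplicative or divisive factor scales the e.s.d. with the value, whereas an additive offset leaves it unchanged.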
