A draft specification for CIF is now available for community review at
the URL

This page provides links to two documents, describing respectively the
syntactic and semantic components of the "Crystallographic Information
File". To focus subsequent discussion on this list, I would invite you
first to review and comment on the syntax specification at
since this describes the syntax rules that all parsers must follow; some at
least of the semantic content can be tailored to the needs of individual
applications. However, the semantics document is also posted for

One point should be made clearly: this specification is for an extended
version of CIF, not yet formally adopted by COMCIFS. The only significant
extensions to the existing standard are the relaxation of the line-length
limit from 80 to 2048 characters, and the introduction of matching
square brackets as additional delimiters for string values containing white
space. It should be easy enough to work backwards from this specification
to a complete specification for the existing standard (version 1.0). We have
done things in this somewhat inverted way because no two existing CIF
parsers behave identically in their handling of the more subtle permitted
syntactic features. Every existing CIF parser will therefore need to be
examined, and in principle modified, if it is to be fully compliant with
version 1.0; this is an opportune time to signal the additional changes
that would be necessary for version 1.1.

Note further that the extensions affect only the reading of CIFs;
applications that write CIFs will not need to be changed at all (provided
that they currently write valid CIFs): a CIF that is valid against version
1.0 will necessarily be valid against version 1.1.

Some general comments:

CIF is intended as an archival and portable format. For this reason, the
description of certain syntactic features has been constructed with care to
try to avoid machine or operating-system dependencies. This is particularly
the case with the discussion regarding end-of-line delimiters. Here an
attempt has been made to reconcile the practical handling of files which are 
transported or shared across common operating systems such as Unix, MacOS
and MSWindows with the more general formulation that is required to support
files on mainframe or elderly record-oriented OS architectures.
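
As an illustration of the end-of-line point, a reading application might
normalise the three common conventions before any further analysis. The
following is only a sketch, assuming the file contents are already in memory
as a string; the name split_cif_lines is hypothetical and is not taken from
the specification, and record-oriented systems would need quite different
handling.

```python
import re

# Hypothetical helper, not part of the draft specification.
# Assumption: only the LF (Unix), CR (classic MacOS) and CR LF (MSWindows)
# conventions are handled; record-oriented systems are out of scope here.
_EOL = re.compile(r"\r\n|\r|\n")  # CR LF is tried first so it counts once


def split_cif_lines(text):
    """Split text into lines, treating CR LF, CR and LF alike."""
    lines = _EOL.split(text)
    # A trailing terminator leaves a final empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

A file with mixed terminators then yields the same list of lines whichever
system it was last saved on.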

There are similar concerns with the specification of character sets. Despite
the growing utility of Unicode, maximum portability across platforms is
achieved by specifying a very precise (if restrictive) set of characters. In 
this document, they are expressed by reference to the ASCII character set,
but the wording is such as to permit use of the same characters under an
EBCDIC or other encoding scheme.
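
To make the character-set restriction concrete, a checker could flag any
character outside the permitted set. This is a sketch under an assumption:
the set used below (printable ASCII 32 to 126 plus tab, line feed and
carriage return) is my illustration and should be checked against the
syntax document itself; find_disallowed is a hypothetical name.

```python
# Assumption (to be verified against the syntax document): the permitted
# characters are printable ASCII 32-126 plus HT, LF and CR.
ALLOWED = frozenset(chr(c) for c in range(32, 127)) | {"\t", "\n", "\r"}


def find_disallowed(text):
    """Return (offset, character) pairs for characters outside ALLOWED."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in ALLOWED]
```

Because the test works on decoded characters rather than raw bytes, the same
check applies to a file that was stored under EBCDIC or another encoding,
once it has been decoded.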

There has also been extended internal discussion on the line-length
limit. The view of COMCIFS is that it would be desirable in principle to
drop any limit on the length of a text line, but that practical
implementation limits in certain systems still argue in favour of a finite
length. The limit of 2048 is arbitrary, but is intended to address the most
common reason for violation of the existing 80-character limit, which is
manual editing in a GUI window (especially using a proportional font) that
typically overruns by only a few characters. Applications will still be
encouraged to write within or near to 80-character lines where possible.
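
The two thresholds might be applied as follows. This is a sketch only: it
assumes that length is counted in characters excluding the line terminator,
and the function name and the split into "errors" and "warnings" are my own
choices rather than anything laid down in the draft.

```python
MAX_LINE = 2048   # proposed limit in this draft
PREFERRED = 80    # existing limit, still encouraged for writers


def check_line_lengths(lines):
    """Return (errors, warnings) as lists of 1-based line numbers.

    Assumption: each item in `lines` has had its terminator removed.
    """
    errors = [n for n, line in enumerate(lines, 1) if len(line) > MAX_LINE]
    warnings = [
        n for n, line in enumerate(lines, 1)
        if PREFERRED < len(line) <= MAX_LINE
    ]
    return errors, warnings
```

A reader would reject only the lines reported as errors; a writer might use
the warnings to keep its own output within or near 80 characters.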

Such considerations will rarely trouble a developer on a single platform;
but applications that expect to handle files under different machine and OS
architectures will need to shoulder the responsibility of managing any
necessary underlying record or byte manipulation to preserve the integrity
of the files on the target systems.

A formal grammar is presented for CIF using BNF-style notation. CIF, however,
has a context-sensitive grammar which is not amenable to description purely
in terms of a BNF. The specification therefore contains commentary and
prescriptions for lexical analysis that must be read and implemented very
carefully. This also accounts for the extended nature of certain of the
language productions, where context-sensitive handling is achieved by
declaring elements on both the left-hand and right-hand sides of productions
that must match.

Where differences of substance exist between this formulation and Nick's
previously posted BNF, the differences may legitimately be described and
discussed on this list. However, the intention is to move towards formal
adoption of this specification as the reference standard.

The numbering of paragraphs is for ease of labelling and has no deeper
purpose. If this draft needs to be changed during this cycle of review I
shall add or delete paragraphs without disturbing the existing numbering.
Paragraphs in smaller font are intended to provide additional commentary to
the main text, but again no deep significance should be attached to the use
of small or larger type.

Brian McMahon                                             tel: +44 1244 342878
Research and Development Officer                          fax: +44 1244 314888
International Union of Crystallography                  e-mail:  bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England                         bm@iucr.ac.uk

