E0723

WHAT IS A CIF DICTIONARY? Sydney R Hall, Crystallography Centre, University of Western Australia, Nedlands 6907, Australia. (syd@crystal.uwa.edu.au)

In a CIF, each data item is identified by a tag (i.e. a unique data name) and specified as a text string (i.e. a data value) following the tag. To access a CIF data item one must know about the data name and value in advance, a common requirement for accessing data in any format! The important difference with CIF data is that this information resides in an electronic dictionary which is both human-readable and machine-parsable. This talk will introduce the CIF dictionary concepts.
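As a minimal sketch of the tag-and-value idea, the Python fragment below reads one-line tag-value pairs into a mapping. It assumes a heavily simplified subset of CIF (no loops, no multi-line text fields, no data block headers); the function name `read_simple_items` and the sample values are purely illustrative.

```python
def read_simple_items(text):
    """Parse one-line 'tag value' pairs from a simplified CIF fragment.

    Assumption: only single-line items are handled; loops, multi-line
    text fields, and data block headers are skipped for brevity.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip blanks, comments, and anything that is not a tag
        parts = line.split(None, 1)  # split into tag and the rest of the line
        if len(parts) != 2:
            continue  # a tag with no value on the same line
        tag, value = parts[0], parts[1].strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[tag] = value
    return items


# Illustrative sample data, not taken from any real structure.
sample = """
_cell_length_a   10.234
_cell_length_b   11.502
_symmetry_space_group_name_H-M  'P 21/c'
"""

items = read_simple_items(sample)
print(items["_cell_length_a"])
```

Every value is kept as a text string here; interpreting it as a number, a code, or free text is exactly the knowledge that the dictionary is meant to supply.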

Data will be described as "data objects". A data item is not just a "number" or a "text string" but an entity with properties, or "attributes", which collectively define its uniqueness and its relationship to other items. For example, a common data item in crystallography is the "calculated structure factor Fc". How does one go about defining Fc? It turns out to be a non-trivial task! The most obvious approach might be to use the Fourier transform expression and link its definition to the atom types and atomic parameters. This introduces quite complex relationships with other data items, and these in turn depend on data such as the diffraction measurements. In fact, strictly speaking, Fc is related to almost every crystallographic measurement!

A common reaction to such a detailed definition is that "it's over the top"! But is it? If one wanted to compare calculated and measured structure factors but only had access to the structural data, then one must know this information. If a data dictionary contained the "complete" description of Fc, and this description were understood by an accessing tool, then missing Fc values could be generated automatically. Fc is a "dependent" data item directly related to other crystallographic information. Other data items are not as dependent. For example, the crystal cell dimensions are often referred to as "primitive" data because they cannot be easily derived from other data (other than diffraction angles and indices, that is!). These aspects will be explained in the talk.
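The distinction between dependent and primitive items can be sketched in Python. The example below uses the unit-cell volume as a stand-in for Fc, because its dependence on the cell lengths and angles is a short closed formula. The `DEFINITIONS` table, its `depends_on` and `method` keys, and the `get_item` helper are hypothetical constructs for illustration; they are not part of any published dictionary language.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Hypothetical dictionary: each dependent item lists the items it needs
# and a method that derives it from them.
DEFINITIONS = {
    "_cell_volume": {
        "depends_on": [
            "_cell_length_a", "_cell_length_b", "_cell_length_c",
            "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
        ],
        "method": cell_volume,
    },
}


def get_item(data, tag):
    """Return a stored value, or derive it if the dictionary says how."""
    if tag in data:
        return data[tag]
    definition = DEFINITIONS.get(tag)
    if definition is None:
        # No derivation known: treated here as a missing primitive item.
        raise KeyError(f"{tag} is missing and cannot be derived")
    args = [get_item(data, dep) for dep in definition["depends_on"]]
    return definition["method"](*args)


# Illustrative cell; the volume itself is not stored in the data.
data = {
    "_cell_length_a": 10.0, "_cell_length_b": 12.0, "_cell_length_c": 15.0,
    "_cell_angle_alpha": 90.0, "_cell_angle_beta": 90.0, "_cell_angle_gamma": 90.0,
}
print(get_item(data, "_cell_volume"))
```

A request for a missing primitive item, such as a cell length, fails in this sketch because no definition says how to derive it, whereas the missing volume is generated on demand.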

The second part of this talk will describe the format and syntax of the electronic dictionary. This is referred to as the "dictionary definition language" (DDL). The DDL1 specifications have been published[1], and a newer version, DDL2[2], is already in use for the mmCIF dictionary. A DDL dictionary file contains definition data which conforms to the STAR File syntax[3]. Each definition is composed of a sequence of DDL data items referred to as DDL attributes. The attributes are the vocabulary of the dictionary language, and, individually and collectively, they provide the semantic tools of the dictionary. The design and rationale of the DDL will be described.
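To illustrate how a definition built from attributes can be used by software, the Python sketch below holds one definition as a mapping of attribute names to values and checks a data value against it. The attribute names are modelled on DDL1-style names, but the definition text, the range convention ("min:max" with either side optional), and the `check_value` helper are assumptions made for this example rather than a rendering of the published specification.

```python
# One definition expressed as attribute/value pairs (illustrative only).
DEFINITION = {
    "_name": "_cell_length_a",
    "_type": "numb",
    "_enumeration_range": "0.0:",  # assumed form: lower bound, no upper bound
    "_units": "A",
    "_definition": "Unit-cell length a.",
}


def check_value(definition, raw):
    """Return True if the text value satisfies the definition's type and range.

    Assumption: numeric values are plain decimals; appended standard
    uncertainties such as 10.234(2) are not handled in this sketch.
    """
    if definition.get("_type") == "numb":
        try:
            value = float(raw)
        except ValueError:
            return False  # not a number at all
        low, _, high = definition.get("_enumeration_range", ":").partition(":")
        if low and value < float(low):
            return False
        if high and value > float(high):
            return False
    return True


print(check_value(DEFINITION, "10.234"))
print(check_value(DEFINITION, "-1.0"))
```

Because the rules live in the definition rather than in the checking code, the same `check_value` function would serve any numeric item whose attributes are supplied in this form.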

1 Hall, S.R.; Cook, A.P.F. STAR Dictionary Definition Language: Initial Specifications. J. Chem. Inf. Comput. Sci. 1995, 35, 819-825.

2 Westbrook, J.D. STAR Dictionary Definition Language: Extended Syntax. To be presented in CIF-II.

3 Hall, S.R.; Spadaccini, N. The STAR File: Detailed Specifications. J. Chem. Inf. Comput. Sci. 1994, 34, 505-508.