- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Annual Report for 2005
- From: David Brown <idbrown@mcmaster.ca>
- Date: Thu, 20 Apr 2006 11:26:41 -0400
- Reply-To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
COMCIFS Annual Report for 2005 to the IUCr Executive Committee
This year marks fifteen years since the Union adopted CIF
(Crystallographic Information Framework, formerly Crystallographic
Information File) as a standard for submission of crystal structure
reports to the Union journals. Much has happened in that time and the
IUCr Congress in Florence provided an opportunity for COMCIFS to take
stock of the project and plan its future directions.
The most notable achievement of the past fifteen years has been the
preparation of an impressive array of CIF dictionaries that provide
data-names and definitions for the two thousand or so crystallographic
terms that can appear in CIFs. No other discipline has a comparable set
of dictionaries with such a wide community acceptance. These
dictionaries are used in conjunction with the STAR file syntax as the
format for the considerable archive of CIF-based structure reports. In
the field of small-cell crystallography CIF is now widely accepted as
the standard for the submission of structure reports to many scientific
journals, and for their archiving and downloading. In the
macromolecular field CIF is used to archive the Protein Data Bank, but
it does not yet have as wide community acceptance, most protein
structure laboratories preferring to stay with the familiar, if
inadequate, PDB format, and the macromolecular data centres favouring
the use of XML.
XML is a markup language with many functional similarities to the STAR
file structure used by CIF. Although a recent arrival, its development
by the information technology community has earned it widespread
acceptance in many scientific communities. It is more flexible than
CIF, though this is not necessarily an advantage in an established field
like crystallography. It allows users to develop their own semantics
and define concepts in ways that may not be compatible with those
defined by other users. Although XML users have access to an extensive
suite of programs to manipulate their files, unless they agree on the
semantics, i.e., the definitions and organization of the concepts of
their discipline, they are unable to communicate with each other. CIF's
suite of dictionaries provides widely accepted semantics for
crystallography, which can be translated into an XML format for the
benefit of XML users, though the reverse process is only possible if the
XML file is written in a form designed to be compatible with CIF.
COMCIFS is working to ensure that the information contained in CIFs and
CIF dictionaries is available in XML format. Some conversion programs
are already available and more work is planned.
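As a rough illustration of what such a conversion involves, the following
Python sketch wraps a few CIF name-value pairs in XML. It is not one of the
conversion programs mentioned above: the function name cif_items_to_xml and
the element names "datablock" and "item" are hypothetical and do not follow
any agreed schema. Only the data-names are taken from the core CIF
dictionary.

```python
import xml.etree.ElementTree as ET


def cif_items_to_xml(block_name, items):
    """Wrap simple CIF name-value pairs in a hypothetical XML layout."""
    # "datablock" and "item" are illustrative element names, not a standard.
    root = ET.Element("datablock", {"name": block_name})
    for data_name, value in items.items():
        # Keep the CIF data-name as an attribute so that its dictionary
        # definition can still be looked up from the XML side.
        item = ET.SubElement(root, "item", {"name": data_name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = cif_items_to_xml(
    "example",
    {"_cell_length_a": "5.43", "_cell_length_b": "5.43"},
)
print(xml_text)
```

Because each data-name is carried over unchanged, the meaning of every value
remains the one given in the CIF dictionary, which is the point made above
about agreeing on semantics.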
Our goal is to enable CIFs to be read by generic programs that obtain
all their crystallographic knowledge directly from the CIF
dictionaries. This requires that all CIFs rigorously conform to the
standard. In the early days this standard was not strictly enforced so
as to avoid discouraging those who found CIF strange and unfamiliar, but
over the years the degree of conformity has been steadily increased and
the CIF standard itself has evolved in subtle ways as we became more
aware of the possibilities inherent in the STAR syntax. Thus after
preparing the coreCIF dictionary as a STAR file using the Dictionary
Definition Language 1 (DDL1) it was decided that the macromolecular CIF
dictionary should use advanced features that were only available in
DDL2. The result was two incompatible CIF dialects, CIF1 and CIF2,
using dictionaries based on DDL1 and DDL2 respectively. This required
different programs for each dialect, or a duplication of effort to
ensure that a single program could read both. While this decision made
sense at the time, it has returned to haunt us as we strive to ensure
that we retain compatibility between the CIF1 and CIF2 definitions even
as the dictionaries evolve independently.
The problem of CIF dialects was discussed in Florence at the closed
COMCIFS meeting. Here we developed a consensus that we should move
towards a new dictionary language, DDL3, with corresponding CIF3
dictionaries. Programs designed to work with CIF3 dictionaries would be
fully back-compatible and able to read any file written in either CIF1
or CIF2. A prototype has already been tested and an early approval of
DDL3 will allow the conversion of the existing CIF1 and CIF2
dictionaries to CIF3. The opportunity is being taken to incorporate
advanced features that were unimagined fifteen years ago. One of these
is the development of a hierarchy of crystallographic concepts that
would add flexibility and allow the dictionaries to evolve in parallel.
Another innovation is the introduction of algorithms that instruct a
program how the value of an item can be calculated on the fly from other
items present in a CIF. These algorithms are computer-readable
definitions that will enhance the ability of CIF dictionaries to serve
as machine-readable repositories of crystallographic knowledge.
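The idea can be sketched in Python, assuming a CIF data block has already
been read into a plain mapping of data-names to values. The table
DERIVATIONS and the function get_item are hypothetical names used only for
illustration, and the code is not written in the method language of DDL3.
The example derives _cell_volume from the cell lengths and angles using the
standard formula for a general triclinic cell.

```python
import math


def cell_volume(cif):
    """Compute the unit-cell volume from cell lengths and angles (degrees)."""
    a = float(cif["_cell_length_a"])
    b = float(cif["_cell_length_b"])
    c = float(cif["_cell_length_c"])
    ca, cb, cg = (
        math.cos(math.radians(float(cif[name])))
        for name in ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
    )
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Hypothetical stand-in for methods stored in a dictionary: each data-name
# maps to a function that derives its value from other items.
DERIVATIONS = {"_cell_volume": cell_volume}


def get_item(cif, name):
    """Return a stored value if present, otherwise derive it on the fly."""
    if name in cif:
        return float(cif[name])
    if name in DERIVATIONS:
        return DERIVATIONS[name](cif)
    raise KeyError(name)


cif = {
    "_cell_length_a": "2.0",
    "_cell_length_b": "2.0",
    "_cell_length_c": "2.0",
    "_cell_angle_alpha": "90",
    "_cell_angle_beta": "90",
    "_cell_angle_gamma": "90",
}
volume = get_item(cif, "_cell_volume")
```

The sketch ignores standard uncertainties written in parentheses, such as
5.43(1), which a real reader would need to strip before conversion.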
While these activities help to keep CIF at the forefront of information
technology, COMCIFS is also concerned not to abandon those who find
themselves still challenged by the demands of checkCIF. From the
beginning we knew that we would need a suite of tools to assist in
preparing CIFs. The last couple of years have seen the appearance of a
number of such programs, e.g., enCIFer, publCIF and CIFedit, that use
the appropriate CIF dictionaries to assist users in writing fully
conformant CIFs. PublCIF has been developed by the IUCr editorial
office and is well-tuned to the publication requirements for small-cell
structures. It will continue to be developed to handle macromolecular
structure reports that are accompanied by structural data in mmCIF
format, as the editorial production processes develop to handle such
articles efficiently. Other tools are under development in an
IUCr-sponsored project to upgrade some older CIF software to strict
compliance with the latest CIF specifications. This project includes
updates to vcif, a simple syntax checker, and to CIFtbx, a Fortran
library; and the provision of a utility to manage the relaxation of the
line and data name length restrictions in CIF version 1.1. As the
existing dictionaries are converted to DDL3 we will encourage the
preparation of CIF3-level programs that will be able to read any CIF
whether written as CIF1, CIF2 or CIF3. We expect, however, that the
existing dictionaries will continue in use until the advantages of CIF3
become sufficiently apparent that users voluntarily convert.
Among the routine business transacted during the course of the year was
the preparation of new terms of reference expanding the mandate of
COMCIFS to ensure that crystallographic information in digital form is
compatible with standards in neighbouring fields. These terms were
subsequently approved by the Executive Committee. COMCIFS also formally
adopted responsibility for the maintenance of the DDL1 dictionary which
had no organization designated to authorize and approve necessary
changes. Finally, a complete documentation of CIF concepts and
associated data dictionaries has been completed as Volume G of the IUCr
International Tables series.
I. David Brown
Chair