Report of IUCr Representative to CODATA 2000-2002

  • To: Multiple recipients of list <epc-l@iucr.org>
  • Subject: Report of IUCr Representative to CODATA 2000-2002
  • From: Brian McMahon <bm@iucr.org>
  • Date: Tue, 18 Jun 2002 16:13:42 +0100 (BST)

The Scientific Union Representatives to CODATA are required to submit
a report of the data-related activities of their Unions in the period
between CODATA Congresses. I attach my current draft in case anyone
is interested or wishes to comment (or point out anything vital that I
have overlooked).

Best wishes

                           Report to CODATA
                         of Activities of the
               International Union of Crystallography (IUCr)

The International Union of Crystallography (IUCr) is a scientific union
adhering to the International Council for Science (ICSU). Its objectives
are to promote international cooperation in crystallography and to
contribute to all aspects of crystallography, to promote international
publication of crystallographic research, to facilitate standardization
of methods, units, nomenclatures and symbols, and to form a focus for
the relations of crystallography to other sciences.

Crystallographic Databases
Several independent databases exist that store and manage the results of
crystal structure determinations. Among the most important are 
 - the Cambridge Structural Database for organic and metal-organic
   small-molecule structures and oligonucleotides (CSD);
 - the Protein Data Bank for protein and nucleic acid structures (PDB);
 - the Inorganic Crystal Structure Database for inorganic materials (ICSD);
 - the Metals Crystallographic Data File for metals (CRYSTMET).

Other crystallographic databases store non-structural data, including:
 - the NIST Crystal Data collection of unit-cell parameters;
 - the NIST Biological Macromolecule Crystallization Database and the
   NASA Archive for Protein Crystal Growth Data;
 - the Powder Diffraction File.

These databases are curated by independent organisations, but the IUCr
monitors their development through a standing Database Committee (CCD) that
reports directly to the Union's Executive Committee. Among the changes
noted during the period by the CCD are:
 - Consolidation of the Protein Data Bank under the management of the
   Research Collaboratory for Structural Biology, with centres at Rutgers
   University, San Diego Supercomputer Center and NIST, Washington DC,
   USA. H. M. Berman is the Director of the RCSB-PDB.
 - Release of new database access and visualisation software within the
   Cambridge Structural Database system. The CSD includes over 250,000
   structures as of October 2001.
 - The Metals and Alloys Data File (CRYSTMET) was brought fully up to date
   in 1999, and Toth Inc. of Canada offers search and visualisation software
   that also operates on data from the Inorganic Crystal Structure Database.
 - H. Behrens has retired as Head of the Inorganic Crystal Structure
   Database and has been succeeded by P. Luksch. There is continuing 
   collaboration with NIST.
 - R. Jenkins has retired as Executive Director of the International Centre
   for Diffraction Data and is succeeded by T. Fawcett. ICDD has released
   its powder files (PDF-4) in relational format, and is collaborating with
   the Cambridge Crystallographic Data Centre to generate a database of
   calculated powder patterns from CSD contents.

A special issue of the IUCr journal Acta Crystallographica was published
in 2002 that describes the current operation of the crystallographic
databases and a selection of research applications.

Data Exchange
Development continues on the Crystallographic Information File (CIF),
the standard file format for archiving and exchanging crystallographic data
developed and adopted by the IUCr in 1991. This exchange standard comprises
printable ASCII files conforming to a simple grammar and populated with
identifiers defined in external machine-readable dictionaries. The
identifiers provide universal labels for well defined items of data, ranging
from atomic coordinates to entire journal research articles. Six
dictionaries (collections of terms specific to particular areas of
crystallography) are now available, covering
 - core data items in small-molecule structural crystallography (coreCIF),
   published 1991, updated 1997, 1999 and 2001;
 - powder diffraction (pdCIF), published 1997;
 - macromolecular structure determination and secondary structure
   characterisation (mmCIF), published 1997 and updated 2000;
 - image-plate data, annotation and analysis (imgCIF), published 2000;
 - crystallographic symmetry specification (symmCIF), published 2001;
 - modulated incommensurate structures (msCIF), published 2002.
Except for the powder dictionary, all these have been published or updated
since the last CODATA Congress in Autumn 2000.
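
The tag-and-value structure described above can be illustrated with a short
Python sketch. It is not an IUCr tool and covers only a small subset of the
CIF grammar: single-line items inside a data block, with optional quotes
around the value. Loops and multi-line text fields are deliberately ignored.
The data names are taken from coreCIF; the function name and the sample
values are assumptions made for illustration.

```python
# Minimal sketch: read single-line "data name / value" pairs from CIF text.
# Assumption: no loop_ constructs and no semicolon-delimited text fields.

SAMPLE_CIF = """\
data_example
# illustrative values only
_chemical_formula_sum   'Cl Na'
_cell_length_a          5.640
_cell_length_b          5.640
_cell_length_c          5.640
"""


def parse_simple_cif(text):
    """Return {block_name: {data_name: value}} with values kept as strings."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            # A data block header opens a new set of items.
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # value on a later line: outside this sketch
            name, value = parts[0], parts[1].strip()
            # Remove one pair of matching quotes around the value, if any.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            current[name] = value
    return blocks


if __name__ == "__main__":
    for block, items in parse_simple_cif(SAMPLE_CIF).items():
        print(block, items)
```

Values are left as strings because the meaning and type of each data name
are defined in the external dictionary, not in the file itself.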

Working groups exist to define data names in other relevant areas of the
subject. Dictionaries currently under development cover the fields of
 - small-angle scattering;
 - magnetic structures;
 - electron density studies.
Coordination of the content of these dictionaries and approval for public
adoption is the responsibility of the IUCr Committee for the Maintenance of
the CIF Standard (COMCIFS).

The coreCIF format is used by the Cambridge Crystallographic Data Centre 
to import structural data from journals and as an export format; mmCIF
provides the data capture and exchange format for files in the Protein
Data Bank. The PDB has recently released software capable of constructing
mmCIF data sets for deposition from legacy software applications, a move
designed to facilitate the direct deposition of such results.

coreCIF is the mandatory submission mechanism for structure reports of
small-molecule and inorganic compounds published in the IUCr journals Acta
Crystallographica Sections C (Crystal Structure Communications) and E
(Structure Reports Online) and is the mandatory mechanism for deposit of
supplementary structural data in other IUCr journals.  Several chemistry
journals from other publishers also require deposits in CIF format.

The IUCr journals staff are working with the PDB to develop an analogous
deposit and publication workflow for high-throughput structural genomics

CIF is a domain-specific data format, but follows good practice in separating
form (the file syntax) from content (the external dictionaries that define
the meanings of the data names or tags used within the file). Such a design
means that interoperability with other data exchange mechanisms is possible
wherever a mechanical syntax conversion can be performed. The data names
in CIF dictionaries have been used as models for a CORBA description of
macromolecular structure adopted by the Object Management Group, and for
portions of the content of the XML-based Chemical Markup Language
(P. Murray-Rust), a candidate molecular description language under
consideration by IUPAC.
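
As a rough illustration of such a syntax conversion, the Python sketch below
re-expresses a set of CIF data items as XML while leaving the data names
untouched. The element and attribute names ("datablock", "item", "name") are
hypothetical choices for this example and do not follow Chemical Markup
Language or any other published schema; the sample values are illustrative.

```python
# Minimal sketch: carry CIF data names over into a different file syntax.
# Assumption: items are already available as a {data_name: value} mapping.
import xml.etree.ElementTree as ET


def items_to_xml(block_name, items):
    """Serialise one data block as XML, keeping each data name as a label."""
    root = ET.Element("datablock", {"name": block_name})
    for data_name, value in items.items():
        item = ET.SubElement(root, "item", {"name": data_name})
        item.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example_items = {
        "_cell_length_a": "5.640",
        "_chemical_formula_sum": "Cl Na",
    }
    print(items_to_xml("example", example_items))
```

Only the container changes: the data names, and therefore the dictionary
definitions that give them meaning, are the same in both representations.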

A particular requirement in interoperable systems is the establishment of
unique identifiers that can locate the same object as stored in different
databases. The IUCr has been involved with the requirements for an
identifier of crystalline phases needed by the IUPAC-CODATA Task Group
on Standardisation of Physicochemical Property Electronic Datafiles
(IUCOSPED; H. Kehiaian), and with the development of a chemical identifier
being undertaken by IUPAC.

An interesting step forward in securing interoperability between data
sources would be the establishment of a metadata standard identifying the
topic areas covered by a scientific or technical database. A well-defined
standard would facilitate the distributed querying of disparate databases.

Data Validation
All structural data sets published in IUCr journals are checked for internal
consistency by software capable of reading CIF submission or deposit files
directly. The results of such checks are made available to the referees of
papers submitted to IUCr journals, and can form the basis for rejection or
a request for revision of the text or even re-refinement of a structure
submitted for publication. The criteria for assessing structural data are
published on the IUCr web pages. Different journals handle the quality
assessment criteria in different ways. In the case of Acta Crystallographica
Section E: Structure Reports Online, which was launched as an online-only
journal in 2001, the output from the validation software is posted as an
accompaniment to each paper on the web.
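
One simple kind of internal consistency check can be sketched in Python:
recompute the unit-cell volume from the reported cell lengths and angles and
compare it with the reported volume. This is an illustration only and does
not reproduce the criteria or software used by the IUCr journals; the
function names and the tolerance are assumptions.

```python
# Minimal sketch: compare a reported cell volume with the value recomputed
# from the cell parameters (lengths in angstroms, angles in degrees).
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def volume_is_consistent(cell, reported_volume, rel_tol=1e-3):
    """True if the reported volume agrees with the recomputed one.

    rel_tol is an arbitrary illustrative tolerance, not a published criterion.
    """
    return math.isclose(cell_volume(**cell), reported_volume, rel_tol=rel_tol)


if __name__ == "__main__":
    cell = {"a": 5.64, "b": 5.64, "c": 5.64,
            "alpha": 90.0, "beta": 90.0, "gamma": 90.0}
    print(volume_is_consistent(cell, 179.41))  # cubic cell: 5.64**3
    print(volume_is_consistent(cell, 190.0))
```

In a CIF the inputs would correspond to the coreCIF items _cell_length_a,
_cell_length_b, _cell_length_c, _cell_angle_alpha, _cell_angle_beta,
_cell_angle_gamma and _cell_volume.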

The IUCr offers similar data validation services tailored according to the
individual requirements of journals from other publishers, and encourages
uptake of this service as a way to improve the overall quality of structural
data in the scientific literature and associated databases.

Electronic Publishing
The IUCr publishes six primary research journals in crystallography, and a
seventh covering the technology, instrumentation and uses of synchrotron
radiation. All are published online and, with the exception of Acta
Crystallographica Section E: Structure Reports Online, also in print.
Crystallography Journals Online is located at http://journals.iucr.org.  A
major digitisation project completed in late 2001 made available PDF page
images of all articles published in IUCr journals since 1948.  Papers
published since 1999 are also, for the most part, available as
navigable HTML versions with internal hyperlinking and links to other
journals. The online service also offers access to published data in CIF
format, allowing visualisation and import into structure refinement
programs; hypertext access to the experimental data sets used in structure
determination experiments; and many other benefits to authors and readers
(e.g. e-mail alerting of tables of contents, manuscript status enquiries,
downloads of proofs and offprints).

Bibliographic data on published articles is uploaded to the CrossRef
publishers' hyperlinking service, allowing references to IUCr journals to be
resolved into HTML pointers to the location of those articles.

Similar links exist from articles to the associated data sets deposited in
some of the structural databases. In the case of macromolecular structures,
links to PDB files exist over the web. For small-molecule entries in the CSD 
the link is currently to a summary page that includes the reference code
needed to access a local copy of the CSD. It is hoped that in future web
links may be established to this database. A pilot web linkage to inorganic
structures in the ICSD is under active development. In each case, reciprocal
links exist from the respective database entries to their associated
articles in Crystallography Journals Online.

While many of these journal/database linkages are at present negotiated and
set up through bipartite contacts, it would be useful to consider the
establishment of a data-centric linking organisation playing a role similar
to CrossRef in the publishing field. Perhaps such a project could be
considered by CODATA.

Web of Information
The web site http://www.iucr.org continues to host news items, bulletin
boards, conference diaries, employment notices, directories of laboratories,
research facilities and individual crystallographers, links to educational
and commercial resources, book notices, project reports and many other items.
The World Directory of Crystallographers has been completely re-engineered
in the last year, and is actively being updated by crystallographers themselves
to record their contact details and professional and research interests.
The Union's Committee for Electronic Publishing, Dissemination and Archiving
of Information acts as the editorial board for the web site.

Long-Term Preservation of Digital Content
The undersigned and H.D. Flack, IUCr Representative to ICSTI, participated
during 2000 in the ICSTI review of the Open Archival Information System (OAIS)
reference model. Elements of this reference model were incorporated in the
drafting of a policy document on the archiving of the Union's electronic
journal content, which was subsequently approved by the IUCr Executive
Committee. As yet the policy is incompletely implemented, but the drafting
exercise underlined the necessity of proper provision for long-term
preservation and access.

Although the IUCr has direct control only over its own publications, it
intends to construct a registry of crystallographic databases that make
adequate provision for long-term preservation and access as identified in
the OAIS model. Such a registry in the field of crystallography would be
complementary to the registry of electronic physics archives envisaged by
IUPAP (http://publish.aps.org/IUPAP/ltaddp_report.html). It is suggested
that CODATA could usefully identify such initiatives on the part of its
other members, and act as a higher-level registry of registries. The
collation of self-certified OAIS-compliant data providers would raise the
profile of this important issue, and would place on any participating
organisation the onus of publicly declaring and defending their archiving

Brian McMahon

18 June 2002
