Data activities in crystallography

Crystallography is a science that consistently applies best practice to the management of research data, whether raw or processed experimental data or the derived structural models stored in curated databases.

Data policy

The IUCr Committee on Data (CommDat) reports and advises on crystallographic data policy, and works closely with IUCr Commissions and with external bodies such as the International Science Council's Committee on Data for Science and Technology (CODATA). Past statements on research data policy (often in the context of scientific publications) include:

Data exchange

Since 1991 the IUCr has promoted the Crystallographic Information Framework (CIF), a family of controlled vocabularies, data-name definitions and relationships, and file formats designed to facilitate efficient and accurate interchange and archiving of crystallographic data together with all relevant experimental and annotational metadata.
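As a rough illustration of how a CIF file pairs standardized data names with values, the following Python sketch reads single-line items from one data block. It is a minimal sketch and not a conforming CIF parser: loops, multi-line text fields and save frames are deliberately ignored. The function names `parse_simple_cif` and `numeric_value` are hypothetical, and the sample values are invented for illustration; only the data names are taken from the core CIF dictionary.

```python
import re

# Illustrative sample only: the data names come from the core CIF dictionary,
# but the values are made up and do not describe a real structure.
SAMPLE_CIF = """\
data_example
# comment lines start with a hash
_chemical_formula_sum 'C6 H12 O6'
_cell_length_a 5.4307(2)
_cell_length_b 5.4307(2)
_cell_length_c 13.102(1)
_cell_angle_alpha 90
"""


def parse_simple_cif(text):
    """Return (block_name, items) for single-line `_name value` pairs.

    Assumption: one data block, no loop_ constructs, no multi-line
    (semicolon-delimited) text fields. Real CIF files need a full parser.
    """
    block_name = None
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            block_name = line[5:]
            continue
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # value is on a following line; not handled here
            name, value = parts
            value = value.strip()
            # Strip matching single or double quotes around the value.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            # CIF data names are case-insensitive, so normalise to lower case.
            items[name.lower()] = value
    return block_name, items


def numeric_value(value):
    """Split a CIF number such as '5.4307(2)' into (5.4307, '2').

    The standard-uncertainty digits are returned as a string, or None
    when no parenthesised uncertainty is present.
    """
    match = re.fullmatch(r"([-+]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?", value)
    if match is None:
        raise ValueError(f"not a simple CIF number: {value!r}")
    return float(match.group(1)), match.group(2)


block, items = parse_simple_cif(SAMPLE_CIF)
a_value, a_su = numeric_value(items["_cell_length_a"])
print(block, a_value, a_su)
```

Keeping the standard uncertainty as the original digit string avoids guessing its scale; a fuller implementation would convert it using the number of decimal places in the value.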

Data validation

The IUCr sponsors checkCIF, a validation service for structural data published in IUCr journals. It also strongly supports the initiatives of curated structural databases to improve the quality of deposited structures.

Initiatives in raw data validation are focusing on complete metadata associated with diffraction images to allow verifiable reuse of primary data sets (see for example the DDDWG Workshop on Metadata for raw data from X-ray diffraction and other structural techniques).

Data publication

For every published small-unit-cell structure, IUCr Journals make freely available machine-readable files describing the crystal and molecular structure, including intramolecular geometry, intermolecular interactions and atomic displacement parameters. Also available are the processed experimental data (structure factors or Rietveld profiles) used in the determination of the structure. For macromolecular structures, links are provided to the structural and experimental data deposited in the Worldwide Protein Data Bank. Authors are also encouraged to link to their raw data sets if these are available from a public repository with a persistent identifier [see e.g. Helliwell, J. R. et al. (2019). Findable Accessible Interoperable Re-usable (FAIR) diffraction data are coming to protein crystallography. IUCrJ 6, 341-343].

Reference data sets of use to crystallographers are published in International Tables for Crystallography. A small number of reference data sets are also available on this server.