
Feature article
IUCr Data Matters!
Figure adapted from W. L. Bragg, “The Structure of Some Crystals as Indicated by Their Diffraction of X-rays”
CommDat is the IUCr standing committee that provides a coordinating and advisory role regarding data. It follows up on the work of the IUCr Diffraction Data Deposition Working Group (DDDWG, 2011–2017), which explored the deposition of primary diffraction images and mechanisms enabling their retrieval by others for purposes such as reanalysis, software and methods development, validation and review. This working group recommended that authors should provide a permanent and prominent link to the raw data sets underpinning their article, thus promoting open science.
CommDat has enthusiastically taken up a prominent role in integrating data-related issues into the programme of the IUCr 2026 Congress in Calgary by sponsoring, co-sponsoring or supporting a range of Commissions. As a result, ‘data’ is a dominant theme running through the conference, from management and formats, through open science, to applications of AI/ML.
Before the Congress even gets underway, there is the chance to attend an innovative summer school! The inaugural ‘AI in Crystallography’ Summer School is being held in Lethbridge, close to Calgary (8–11 August). Its format combines informative lectures with hands-on exercises that will cover everything from fundamental AI, through data preparation, to training and applying models.
There is a record number of workshops at this Congress, many of them relating to data. In particular, CommDat is organising a one-day workshop entitled ‘From Open Data to Global Open Science’ (Workshop 012). Is there a multitude of open science initiatives, or is open science a clearly structured landscape? This is your chance to learn more! Open Science aims to make research results and the underlying data openly available to the larger community via data sharing. This requires the data to adhere to the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles. The workshop will explore how, while Open Data systems are developing at national and regional levels, there is a need for more coordination at the disciplinary (i.e. crystallography!) level.
Furthermore, some of the conference microsymposia address Open Science issues and the FAIR data principles. For open science to thrive, open data need to be produced, managed, and made available in a coordinated manner. This is driving a growing need to present standardised data and metadata accurately, following modern best practices and adhering to the FAIR principles. These topics will be covered in MS-027: Coordinated Efforts Towards Open Data for Global Open Science, and MS-045: Open data, standards, and repositories for small- and wide-angle X-ray and neutron scattering data in the era of big data and AI advancements.
There is also a range of other data-related workshops and microsymposia to attend, including Workshop 007: Enhancing PDB Deposition and Validation, Workshop 009: Reproducibility in Practice, Workshop 016: Mastering the CSD; Enabling PDB deposition of Integrative/Hybrid methods structures, Workshop 019: DIALS for processing diffraction images, MS-076: Integrative Structural Chemistry and Biology: The Power of Combining Data, MS-058: Best practices for X-ray, neutron and electron data processing and reuse, and MS-059: Using crystallographic databases for data exploration and exploitation.
The CommDat Keynote, entitled ‘Coordinated efforts towards open data for global open science’, will be given by Bridget Murphy from Kiel University (also DESY and DAPHNE4NFDI). In addition, there is a Keynote by Suzanna Ward from CCDC entitled ‘The Cambridge Structural Database: A catalyst for solving scientific challenges’.
The use of Artificial Intelligence and Machine Learning techniques in crystallography has a longer history than one might think. A comprehensive overview is provided in two recent IUCr Journals virtual issues: “Machine learning in crystallography and structural science” by Simon Billinge and Thomas Proffen (2024), and “Artificial neural networks: application in X-ray photon science and crystallography” by Tomas Ekeberg (2024). A more global application of AI and ML can only be achieved if data are archived and made available in a FAIR manner. Think of AlphaFold2 (Jumper et al., 2021), which is based on the more than 100,000 unique protein structures available in the PDB at that time. Learn about the latest developments in MS-013: Big data and crystallography, MS-050: Machine learning in low-resolution crystallographic computing, MS-102: Large-scale structure determination initiatives in the era of AI and ML, MS-124: Applications of ML and AI in biological small-angle scattering, MS-130: AI & ML in crystallography and scattering, and MS-134: Crystallography and X-ray Absorption Spectroscopy Machine Learning methods for the development of materials. Cutting-edge developments in data-driven approaches for X-ray and neutron science, with a focus on innovations in data acquisition, processing, and analysis, will be addressed in MS-128: Data-Driven Science: Current Status and Outlook.
Crystallography has a long tradition of sharing derived models and structure factors, and of pioneering validation methods such as checkCIF and PDB validation reports. Significant progress has also been made in sharing raw diffraction data and in deploying checkCIF for raw data tools (see Raw Data Letters below). The latest advances will be presented in MS-012: Next-generation validation metrics for macromolecular structures, MS-040: Advanced data analysis and refinement for electron diffraction data, and MS-098: Data quality and error analysis for diffuse scattering.
Fully in keeping with this data-sharing tradition, our peer-reviewed open-access data publication, IUCrData, has launched a new section, Raw Data Letters, a collaborative innovation between IUCr Journals and CommDat. Quoting from the editorial: “the new section will publish short descriptions of crystallographic raw data sets from X-ray, neutron or electron diffraction experiments, in the biological, chemical, materials science or physics fields, and provide a persistent link to the location of the raw data ... Raw Data Letters support Open Science policies: no research data should be lost, but should be made available to the research community according to the FAIR principles, for which the correctness and completeness of the metadata are crucial, and these will be central to the reviewing process.” A publication in Raw Data Letters describes interesting features of the data that were observed but not accounted for in an already published structure, or may even describe data for an unsolvable structure that would challenge method developers. In the paper, a persistent link (DOI) must be provided to the location of the raw data in a permanent repository (see Figure for the front page of the first Raw Data Letter showing the links in the grey box).
However, raw data come in many different formats and with varying metadata quality. To ensure reusability, metadata should be accurate and as complete as possible, and at least sufficient for reprocessing. For single-crystal data, a list of core metadata has been prescribed (Kroon-Batenburg et al., 2022), and their inclusion is a key requirement for correct reprocessing of the data, even more so if automation is a concern. This list is a superset of the NeXus/HDF5 NXmx Gold Standard that was developed by Bernstein et al. (2020). Metadata are captured in imgCIF format (Bernstein & Hammersley, 2005), which is well known to the community via its CBF variant, and already includes the appropriate data names for the core metadata list.
To support this method of publication, the imgCIF_Creator tool (Kluyver et al., 2026) has been developed to read a collection of raw data files, produce a description of the associated metadata, including references to the locations of the raw data files, and write these to an ‘imgCIF’ file. The imgCIF file provides data in CIF format, which is both machine-readable and human-readable, with a comprehensive set of tags (defined in the imgCIF dictionary) for describing detector geometries and other experimental parameters, thus facilitating FAIR ‘re-usability’. A checkCIF for raw data (Hester, 2026) has been established to validate the completeness and correctness of the metadata. The workflow of the IUCrData submission system, with the imgCIF_Creator and checkCIF for raw data tools, is shown in the Figure. The archived raw data, imgCIF metadata, and ultimately the published Raw Data Letter, are linked via DOIs.
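To give a flavour of what a metadata completeness check involves, here is a minimal Python sketch. It is not the checkCIF for raw data tool and does not use its code: the function names are hypothetical, the three required tags are only an illustrative subset of imgCIF data names (the prescribed core list is given in Kroon-Batenburg et al., 2022), and the reader is a deliberately naive line scanner rather than a full CIF parser. It only tests whether a data name is present, not whether its value is correct.

```python
# Illustrative subset of imgCIF data names; NOT the prescribed core metadata list.
REQUIRED_TAGS = (
    "_diffrn_radiation_wavelength.wavelength",
    "_diffrn_detector.id",
    "_array_structure_list.dimension",
)


def data_names(cif_text):
    """Collect the data names that appear in a CIF-like text.

    Naive scan: any line whose first token starts with '_' is treated as a
    data name (this covers both 'name value' pairs and loop_ headers).
    Lines inside semicolon-delimited text fields are skipped.
    """
    names = set()
    in_text_field = False
    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A ';' in the first column opens or closes a multi-line text field.
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        stripped = line.strip()
        if stripped.startswith("_"):
            names.add(stripped.split()[0].lower())
    return names


def missing_metadata(cif_text, required=REQUIRED_TAGS):
    """Return the required data names that are absent, in the order given."""
    present = data_names(cif_text)
    return [tag for tag in required if tag.lower() not in present]


# Made-up example content, for demonstration only.
EXAMPLE = """\
data_example
_diffrn_radiation_wavelength.id          1
_diffrn_radiation_wavelength.wavelength  0.9795
loop_
_array_structure_list.array_id
_array_structure_list.index
_array_structure_list.dimension
IMAGE 1 2463
IMAGE 2 2527
"""

if __name__ == "__main__":
    print(missing_metadata(EXAMPLE))  # prints ['_diffrn_detector.id']
```

A real validator would also need to check values, units and the internal consistency of the described geometry, which is why a dictionary-aware tool is used in the submission workflow.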
Recently, it seems that everywhere we look, in both our professional and personal lives, data is the latest trend and a currency that drives everything we do. Clearly, this is also the case in the IUCr, at our Congress and with our latest journal section! However, let us not forget that over a century ago, one of our founding fathers was already thinking in this way: in his foundational 1913 article, Lawrence Bragg (Bragg, 1913) published his raw diffraction images, Laue diffraction photographs of his alkali halide crystals, from which he showed how he deduced the “first crystal structure”!

References
Billinge, S. J. L. & Proffen, T. (2024). Acta Cryst. A80, 139–145.
Ekeberg, T. (2024). J. Appl. Cryst. 57, 1–2.
Hester, J. R. (2026). https://github.com/jamesrhester/imgCIFHandler.jl
Kroon-Batenburg, L. M. J., Helliwell, J. R. & Hester, J. R. (2022). IUCrData, 7, x220821.
Copyright © - Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.






