IUCr journals

Research news

Why keep the raw data?


The increasingly popular subject of raw diffraction data deposition is examined in a Topical Review in IUCrJ [Kroon-Batenburg, Helliwell, McMahon & Terwilliger (2017). IUCrJ, 4, 87-99; https://doi.org/10.1107/S2052252516018315]. Building on the 2015 workshop organised by the IUCr Diffraction Data Deposition Working Group (DDDWG), the authors bring the story up to date with accounts of new subject-specific and institutional data repositories, and of growing policy pressures on research data management such as the European Open Science initiative.

The article is, however, more than just a workshop report or a survey of evolving policy. It seeks to inform the cost-benefit arguments over diffraction data deposition with examples from real front-line research. Kroon-Batenburg and Helliwell, for instance, have collaborated on studies of protein binding of the chemotherapeutic agent cisplatin, and have made all 34 of their raw data sets available through the University of Manchester Data Library. Some of these data sets have since been reanalysed, resulting in a fresh understanding of cisplatin-lysozyme models.

The prospect of extracting further information from archived primary data sets in this way (either through the insights of fresh pairs of eyes or through subsequent improvements in analysis software) has implications for structural databases. It supports the idea of continuously improving published studies, for example macromolecular structure models (an approach long championed by Terwilliger).

It is not only in the field of macromolecular structure determination that these considerations are important. One of the greatest challenges to reusing raw data is the need for complete metadata to accompany each data set, so that it can later be interpreted and fully evaluated.

Various IUCr Commissions are actively publishing summaries of the essential metadata that need to be captured alongside experimental data sets. The article reviews these initiatives and their relationship to the IUCr's standard for data characterization (CIF, the Crystallographic Information Framework), and again gives practical pointers to the metadata that should accompany diffraction data sets.

While there are encouraging signs that the scientific community is taking a more informed interest in data management and its scientific potential, fresh challenges are being thrown up by the latest generation of instrumentation, capable of generating vast amounts of data at an incredible rate. It may not be possible to archive or even thoroughly analyse all the data that are being produced. However, this article will help to supply a deep understanding of the reasons why society should invest effort and resources in extracting the greatest possible value from the data deluge, in crystallography as in any science.

Further reading: a Scientific Commentary on the above article and a related Editorial have been published in the same issue of IUCrJ [Grabowski & Minor (2017). IUCrJ, 4, 3-4; https://doi.org/10.1107/S2052252516020364 and Baker (2017). IUCrJ, 4, 1-2; https://doi.org/10.1107/S2052252516020340, respectively].

A step to understanding polymorphs

Crystal structure of the Z′ = 56 polymorph of 1,3,5-tris(4-carboxyphenyl)benzene (overhead view of the hexagonal sheets, shifted with respect to each other); reproduced with permission from C. A. Zentner et al. (2015). Chem. Commun. 51, 11642-11645. Copyright Royal Society of Chemistry.

In a paper published in Acta Cryst. B [(2016). 72, 807-821; https://doi.org/10.1107/S2052520616017297], Carol Brock of the University of Kentucky looks at some of the organizing principles behind crystal structures with high Z′, where Z′ is, loosely, the number of symmetry-independent molecules in the asymmetric unit. This study lies at the very heart of understanding and being able to control the properties of molecular structures. The pharmaceutical and agrochemical industries attach great importance to understanding crystal structure. The solid form impinges directly on properties such as solubility, bioavailability, processing characteristics, bulk density, dissolution rate, permeability and surface electrostatic charge, so it is imperative to have a clear understanding of the molecular-level make-up of a material and how this affects its properties. This study illustrates that the high-Z′ phenomenon, like polymorphism itself, has many root causes, but that careful study of each structure allows the organizing principles to be identified in most cases.

Brock leaves the door open for future research, saying 'very few structures are so complex that it is difficult to understand how the crystals could have formed'. This comprehensive survey, in conjunction with a groundswell of work by a number of groups on this increasingly intriguing problem over the past 20 years, shows that there is no one-size-fits-all explanation and that the details of each structure are uniquely tied to the chemical details of the molecules that comprise it. The search goes on, but perhaps we are now at least beginning to know how to formulate the question.

Taken from a Scientific Commentary by Jonathan W. Steed [Acta Cryst. (2016). B72, 805-806; https://doi.org/10.1107/S2052520616018734].