Big data leads the way for structural chemistry

[CSD growth]
Growth of the CSD since 1972. The red bar shows structures added annually. Date: 6 June 2019.

On 6 June 2019, the Cambridge Crystallographic Data Centre (CCDC) announced a huge milestone for structural chemistry with the addition of the millionth structure into the Cambridge Structural Database (CSD).

The CSD is the world's repository of highly curated experimentally determined organic and metal–organic crystal structures. It is used globally by scientists in over 70 countries to understand how molecules behave and interact in three dimensions in the solid form and ultimately how this affects physical properties.

Interest in ‘Big Data’ continues to grow at a time when machine learning and automation are changing the way the pharmaceutical, agrochemical and many other industries work. Against that background, reaching such a significant milestone is a huge achievement for the CCDC and for the wider scientific community that contributes to and relies on this resource.

Large volumes of data such as this enable scientists to draw fuller answers from a more complete and diverse body of information, giving greater confidence in the insights drawn from the data. Furthermore, CCDC’s focus on ensuring the integrity of the data within the CSD through stringent quality assurance and control steps gives scientists added confidence that they are obtaining the highest quality information to inform their research.

This rich data resource, alongside advanced search, 3D data mining, analysis and visualisation software from CCDC, enables scientists from both industry and academia to further their research and predict new outcomes. In addition, knowledge derived from the CSD underpins computational chemistry and molecular modelling and is relied on by industry for the development and manufacturing of new drugs and within academia to teach chemistry.

Jürgen Harter, CEO of CCDC, commented ‘This is truly an important milestone not only for CCDC but also for the wider scientific community. In addition to the value that lies in large sets of data like this to help scientists inform their research and decision making, we also pride ourselves on the high quality of the data, a result of the hard work of our expert in-house database team. Maintaining a policy of strict data interrogation ensures the value of the plentiful insights that can be drawn from the CSD, avoiding misinformation that can lead to wasted time, resources and ultimately cost.’

The 1,000,000th structure is an N-heterocycle produced by a chalcogen-bonding catalyst activating multiple reaction steps sequentially. In the paper, the authors describe a class of extraordinary chalcogen-bonding catalysts that enable the assembly of discrete small molecules, leading to the construction of N-heterocycles in a highly efficient manner. The structure was determined by Yao Wang and co-authors from Shandong University in China and published in the Journal of the American Chemical Society (JACS).

‘We’d like to congratulate Yao Wang and all of his co-authors for publishing the millionth structure, and we are so grateful to the 350,000-plus scientists from around the world who have contributed their data, enabling us to reach this milestone and placing the CSD as the go-to resource for structural information within the scientific community’, said Suzanna Ward, Head of the CSD.

Dr Wang commented ‘We are delighted to hear that our structure [1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one; CSD Refcode XOPCAJ] is the millionth structure to enter the CSD! We have used the CSD for over 10 years because it is an excellent platform to report new crystal structures and an outstanding database to find inspiring chemical structures. It is a valuable resource to us and to many other scientists around the world so we are very proud to be associated with this milestone for the community.’

Peter Stang, Editor-in-Chief, JACS, said ‘We are delighted to hear that the millionth structure in the CSD was published in JACS. We know our readers value the CSD as a trusted repository of structural data and some of our authors have demonstrated how this rich resource can accelerate scientific research. Our continued collaboration with the CCDC helps make this wealth of data more accessible to the community as well as helping us ensure the integrity of data published in our journals and we are proud to be associated with such a significant milestone in structural chemistry.’

When asked what’s next for the CSD, Dr Harter commented that although the use of the CSD in the pharmaceutical and agrochemical industries is already well established, it is now fast becoming a fundamental resource for research into new materials such as batteries, paints, pigments and dyes, and in particular for the development of gas storage frameworks and tailored catalysts. As environmental contamination and sustainability become increasingly important, there is considerable potential on a global scale.

CCDC has noted a consistent rise in deposits from research taking place in China over recent years. ‘It is an exciting time for life science and materials development research with markets such as China leading the way in scientific discovery. We are excited to see what insights we obtain from this market going forward,’ Dr Harter commented.

CCDC also has plans to further draw on insights and trends from the data to inform the direction of future research across different industries.

This report is taken from a press release issued by the CCDC. For more information, see here or contact Lucy White. See also Suzanna Ward's article in this issue of the IUCr Newsletter.

26 June 2019

Copyright © - All Rights Reserved - International Union of Crystallography