A million thanks
With the recent inclusion of the one millionth structure in the Cambridge Structural Database (CSD), we wanted to take this opportunity to thank you, the crystallographic community. I also thought it would be fun to reflect on how the structures published in 1965 compare with those published on the day we reached a million!
The sharing of one million organic and metal–organic crystal structures is an achievement of which we should all be extremely proud. Crystallographers worldwide have contributed to this achievement and without your efforts to share and publish your data this milestone would not have been possible.
The sharing of the millionth structure comes 54 years after work on the database first began. Its beginnings can be traced back to 1965 and to J.D. Bernal and Olga Kennard, who had the vision and foresight that the collective use of data would lead to the discovery of new knowledge. That vision has certainly come to fruition today. Since then, many people at the Cambridge Crystallographic Data Centre (CCDC), past and present, have contributed to the growth and development of the CSD; scientists worldwide have deposited their data; and we have been guided by our Board of Trustees, who are also members of the community.
If we look at the structures published this year to date, we can see that the crystallographers responsible for the data hail from over 70 different countries. So far this year China occupies the top spot with over 25% of the structures. It is followed by Germany, the USA, India, Japan, the UK, Russia, France, Australia and Spain, which together make up the top 10 countries publishing data so far this year.
We can also look at which crystallographers have published the most data over the years. Over 750 authors have each published more than 500 structures in the CSD. Currently Brian Skelton from the University of Western Australia is our most prolific author, with over 5,800 structures. He is closely followed by Allan H. White, then Arnie Rheingold, Peter Jones, Mike Hursthouse, Alex Slawin, Frank Fronczek, Judith Howard, Hoong-Kun Fun and Seik Weng Ng. We owe our sincere thanks to these authors for their contributions and to the other 350,000+ authors who have structures in the CSD. Without your efforts the CSD would not be the valuable resource it is today!
Given the prevalence of structures from China, it is perhaps unsurprising that the millionth structure (an N-heterocycle) came from Yao Wang and co-workers at Shandong University, China.
On the same day that we reached one million structures, over 550 other structures were also published in the CSD. These were determined by over 170 different crystallographers, so hundreds of you were tantalisingly close to being the author of the millionth! Again, if we look at where in the world these structures came from, we can see that China led the way, but the top 10 on #CSD1Million day saw China followed by Russia, the UK, Germany, the USA, Japan, India, Spain, France and Canada. So it looks like crystallographers in some countries were slightly better than others at timing their publications to be in with a chance! Special mention should also go to Azerbaijan. According to our records, only one structure from a crystallographer in Azerbaijan has been published in the CSD this year, and it was on CSD one million day, so you nearly timed it perfectly!
We also saw a rise in the number of CSD Communications published around CSD one million, as some of you tried to increase your chances of being the millionth author. In fact, 12% of the structures published on the day we reached a million were CSD Communications. Unfortunately for our CSD Communication authors, the millionth structure didn’t happen to be from this pool, but these otherwise unpublished structures are still a valuable resource for the community and you get a full CCDC citation, so don’t let that put you off publishing more!
For the curious among you, we will be publishing a series of blogs on our website in the coming months highlighting some of the other structures that were published on CSD one million day. Keep an eye on our website and social media posts if you want to see if you got close!
Intriguingly, the number of structures published on the day we reached one million is comparable to the number published during all of 1965, when the CSD was first established. In 1965 a total of 655 structures were published, and on CSD one million day a total of 588 were published in the CSD. This is an amazing feat and shows just how far crystallography and the CSD have come in the intervening 54 years. As we love looking at the data in the CSD, we thought we would see how the structures published in 1965 compare with the structures published in 2019 on CSD one million day. Given the small number of structures we are comparing, be warned: this is purely for fun and just gives a snapshot!
In 1965, for the structures reporting a value, the average R factor was 6.6%, compared with an average of 5.4% on CSD one million day. However, for nearly half of the structures from 1965 no R factor is available, so this may not be a true reflection of the change. We can also look at the study temperature: on CSD one million day the average study temperature was 190 K, compared with 260 K in 1965, showing the rise in low-temperature studies since the CSD was first established. With so much data encapsulated in one million structures, there really are endless statistics and insights to be derived from the data, some more meaningful than others! If you are interested in learning more, we now have a new CSD Statistics and Insights section on our website.
I would like to finish by thanking you for sharing over one million organic and metal–organic structures. We hope that you enjoy looking at the individual structures as much as we do at the CCDC. We also hope you can continue to gain new insights and knowledge from this tremendous wealth of data. We look forward to continuing to celebrate this achievement at the ACA, ECA, Fall ACS and AsCA meetings later this year.