
Celebrating CIF in practice

Serena C. Tarantino, Michele Zema and Brian McMahon

The first Crystallographic Information Fiesta (CIFiesta), an international crystallography school of the Commission on Teaching of the Italian Crystallographic Association (AIC), was celebrated from 29 August to 3 September 2019 in Naples.

A fiesta is 'an event marked by festivities or celebration' (lexico.com), often characterised by energy, enthusiasm and joyful passion. So how on earth does it come to be appended to CIF, the acronym for that most dry of topics, the Crystallographic Information Framework? The School took on the challenge of using CIF to demonstrate the centrality of good practice in data management to the proper conduct of structural science. Wide in scope - from the collection and evaluation of experimental data through the publication of research findings to the dissemination of structural results in crystallographic databases - the School aimed to teach young researchers how (and why) to abide by its motto - 'know your data ... trust your data ... share your data'.

The venue was Naples, that gorgeous Mediterranean city in the shadow of Mount Vesuvius. The lecturers and tutors were from the local Institute of Biostructure and Bioimaging of the CNR, the wider ranks of the AIC Commission on Teaching, major structural databases, and representatives of no fewer than seven IUCr bodies.* The students (47% female and 53% male) were from locations as diverse as Italy, Albania, Algeria, Belgium, Benin, Brazil, Jordan, Morocco, Peru, Poland, Russia, Switzerland, UK and Uruguay.

To provide context, Simon Hodson, Executive Director of CODATA, gave an opening keynote describing CODATA's efforts to promote Open Science through the FAIR principles (that data should be findable, accessible, interoperable and reusable). Increasingly, grant proposals require an account of the scientist's proposed data management plan, and the growing demands from policymakers and from journals that scientific results should be reproducible highlight the responsibility of the scientist to analyse data critically, make it available to peer reviewers and subsequent researchers, and ensure that methodology and computational methods are published or described in full. Loes Kroon-Batenburg demonstrated how crystallography was fortunate in handling well-defined and accurately measured data of many types (raw, processed and derived), yet pointed out the opportunities for coming to wrong or imperfect conclusions - through error, inexperience or, in the worst case, dishonesty. The importance of permitting critical reanalysis of experimental data in any assessment of a scientific result was emphasised in a video contribution by John Helliwell, Chair of the IUCr Committee on Data, who also highlighted the benefits of archiving raw experimental data towards this end. Brian McMahon, of COMCIFS and CommDat, then gave a historical account of the development of CIF, the IUCr's data characterisation and exchange standard, and showed how the large and growing number of CIF dictionaries covered all aspects of crystallographic research. It was under the umbrella of data definitions covered by the CIF programme that the students would spend the rest of the School solving, refining and ultimately publishing structures.

[group photo] A collage of images from the School. Clockwise from top left: Simon Hodson delivers the opening keynote lecture; Luigi Vitagliano opens the structure solution tutorial to the macromolecular stream; Tom Blanton delivers the ICDD database tutorial; Loes Kroon-Batenburg delivers a plenary lecture; students grapple with a tutorial exercise; a team of students presents the results of their tutorial.

Uniquely (to our knowledge) the School had the ambition to generate primary research publications through collaborative work involving tutors and students on unpublished data sets. Following plenary lectures on data collection, evaluation and interpretation (James Hester, Loes Kroon-Batenburg), symmetry (Mois I. Aroyo) and the CIF paradigm (Brian McMahon, James Hester), the students split into two groups for tutorial sessions, one looking at a zinc coordination polymer, the other at conformational aspects of Thermotoga maritima arginine-binding protein (TmArgBP). In both groups, unpublished data sets were examined and critically assessed.

The small-unit-cell-parameter tutorial was designed by Michele Zema and Serena C. Tarantino in such a way that three teams of students each worked on a different XRD dataset that had been previously collected by Chiara Massera on the same crystal under different conditions. The coordination polymer structure was solved and refined using the software package Olex2, with expert guidance from the package maintainer Horst Puschmann, with Larry Falvello completing the team of tutors. The data analysis of the macromolecular structure, planned and coordinated by Luigi Vitagliano, Luciana Esposito and Rita Berisio, was conducted using EVAL15, again with the expert help of its author, Loes Kroon-Batenburg.  

As with any real data set, limitations, imperfections and peculiarities provided opportunities to deepen the students' understanding of the actual physical structures giving rise to the experimental data. The highlight of the School was the students' presentations, in groups, of their own analyses and interpretations of the data and the structural models derived from it. Following the successful structure determinations, students had the opportunity to work with some of the IUCr journals' publication tools in the course of preparing the articles for submission, notably checkCIF for the pre-submission validation of the polymer, and publBio, a CIF-related authoring tool for the protein structure. The articles were accepted for publication in IUCrData and Acta Cryst. F, respectively [1, 2], in the case of the polymeric structure just in time for the Closing Ceremony.

[publication accepted] At the Closing Ceremony, Michele Zema announces that the first CIFiesta article is accepted for publication in IUCrData. The reaction of the students is enthusiastic.

Following the challenging, stimulating and perhaps exhausting tutorial work (with students working late into a Sunday evening on their presentations), the final section of the School consisted of further hands-on tutorials and lectures on depositing data in, and using, three major structural databases (Cambridge Structural Database, Protein Data Bank and the Powder Diffraction File). Once again there was a strong focus on data quality, for it is only in having large quantities of trustworthy results that the databases can fulfil their objective of driving further scientific discovery on the basis of their accumulated knowledge.

The aim of any School is to teach its students to become better scientists. In this respect, we have tried to provide a rigour and quality of teaching that is only to be expected from the AIC and IUCr. However, we have also tried to provide a slightly different perspective from most established Schools, one that links together the traditional hierarchy of components of understanding: data, information, knowledge and wisdom. In the School prospectus, we wrote that 'crystallographic information is the component that bridges the gap between raw experimental data and the global knowledge bases represented by the curated structural databases. It is typically thought of in terms of publications in the scientific literature, but in fact is more than that. It includes the characterisation and understanding of the raw data collected in an experiment, an account of the methods applied to process and analyse those data to infer a structure, and a quantification of the extent to which features of the model should be trusted or treated with caution. In an era when crystal structure determination is considered routine and is often largely automatic, a critical appreciation of the quality of the results is increasingly important. This school will teach students to respect their raw data, extract the most reliable information they can, and disseminate that information in a complete and verifiable manner. In this way they will contribute to the sum total of scientific knowledge with rigour and integrity.' The prospectus ended with the disclaimer 'crystallographic wisdom is outside the scope of this course'. 
Watching the students progress through the course of the week, from their first exposure to the huge amount of care that must go into proper interpretation of experimental data (something often overlooked in the easy operation of modern instrumentation), through their growing awareness of what made one data set more reliable and usable than another, to their confident presentation of their structural investigations, we wonder if that last sentence should be omitted from future editions of such a School!

And did the event live up to its promise of a celebration? Let's hear what the students had to say ...

"When I arrived I knew nothing and when I left I knew enough!"
"CIFiesta was a great experience. I learned so many things. It was a pleasure."
"A fantastic learning experience, spanning basic to high-level theory. I really enjoyed the school."
"Stressful but fantastic experience!"
"The experience was positive in all aspects. The choice of alternating theory and practice was really effective! The printed booklet with all slides should be exported to all schools!"
"I really appreciated the tutorials, and even more the informal atmosphere which plays a great role in growing interesting discussions."
"The availability of school computers with pre-installed programs helps to pay full attention to the tutorial, not to handling software and hardware. Besides, the helping hand of several tutors' assistants contribute to the success of tutorials."
"I found the AICS2019 CIFiesta very successful, the academic activities were so rich and the computer-based tutorials have been very interactive. My experience with the school was AMAZING and I wish to take part in the next coming AIC schools."
"The school was excellent, the level was just right for me and for many of the students I have spoken with, practical sessions were short but super useful. All lecturers managed to generate an excellent environment of comfortability for discussions even though there was such an age (and ergo expertise) gap. Overall I loved it, so congratulations!!"
"This school was very useful for me. I learned many new things and deepened aspects that I already knew. The origin of students from different countries has made it even more fascinating. Congratulations on the organization."
"AICS2019 CIFiesta is very well organised. Thanks to the organising committee. I thank the IUCr for the financial support to allow my participation. I promise to be an expert in crystallography."

[articles] The first pages of the two articles published in IUCrData and Acta Cryst. F based on the activities of the tutorial sessions (small-unit-cell-parameter and macromolecular streams) conducted at the CIFiesta.

[1] L. Falvello, P. Lotti, C. Massera, S. C. Tarantino, M. Zema, H. Puschmann, M. Y. Agbahoungbata, J. Andreo, S. A. Sahadevan, A. Bismuto, G. Bonfant, S. A. S. Bonou, C. Carraro, M. D. Zotti, A. di Biase, R. Fantini, I. Ferraboschi, J. M. F. Custodio, M. Frigerio, G. Gallo, S. Gjyli, M. Goudjil, F. Igoa, E. Kahveci, M. Kalienko, S. Lorenzon, L. Macera, J. J. M. Fajardo, E. Nushi, S. Ouaatta, E. Parisi, L. Pasqualetto, E. Pesko, G. Pierri, R. Pinalli, R. Poppe, A. Santoro, E. Smirnova, S. Sorbara, L. Tensi and G. Tusha (2019). IUCrData, 4, x191222.
[2] G. Smaldone, A. Ruggiero, N. Balasco, A. Abuhammad, I. Autiero, D. Caruso, D. Esposito, G. Ferraro, E. L. M. Gelardi, M. Moreira, M. Quareshy, M. Romano, A. Saaret, I. Selvam, F. Squeglia, R. Troisi, L. M. J. Kroon-Batenburg, L. Esposito, R. Berisio and L. Vitagliano (2019). Acta Cryst. F75, 707-713.


* Lecturers included representatives of COMCIFS (the IUCr Committee on the Maintenance of the CIF Standard), CommDat (the Committee on Data), and the Commissions on Journals, International Tables, Mathematical and Theoretical Crystallography, Magnetic Structures and (ex officio) Crystallographic Nomenclature.

14 September 2019