Crystallographic data

International FAIR Convergence Symposium 2020

CODATA/GoFair 2020

This conference was jointly organised by CODATA and GO FAIR. It focussed on cross-domain data integration, with a view to helping tackle the cross-disciplinary global challenges described by the UN's Sustainable Development Goals. The weblink for the conference is https://codata.org/events/conferences/international-fair-convergence-symposium-convened-by-codata-and-go-fair-22-23-october-2020-paris-france/. The dates were moved from October to late November/early December. The weblink had presumably already been registered, so it still shows the original dates. The format was virtual because of the COVID-19 pandemic, which the World Health Organisation declared in March 2020. There were four primary themes: i. Crisis reduction and response (learning from COVID-19); ii. FAIR specifications; iii. FAIR society; and iv. Data stewardship (training and career opportunities).

The opening plenary session, of 90 minutes, was entitled "International Open Science in 2020". There were three 15-minute presentations, given by:
- Shamila Nair-Bedouelle (Assistant Director General for Natural Sciences, UNESCO);
- Peter Gluckman (President Elect of the International Science Council, based in New Zealand);
- Jean-Claude Burgelman (Professor of Open Science Policies and Practices, Free University of Brussels).

The common theme of these talks was that open science should apply to publications, to data/metadata and to research workflows. Two additional panellists joined the discussion that followed: Geoffrey Boulton, Past President of CODATA, and Ana Persic, Acting Chief of Science Policy and Partnerships at UNESCO. The session had 195 attendees. Three points were emphasised in the debate from the audience. The first was the pros and cons of FAIR data not including data quality. The second was the global north/south divide. The third was that many early-career researchers seem to wish to keep their data closed.

As IUCr Representative to CODATA I made two points via the 'chat'. i. I submitted the weblink for our IUCr Response to Open Data in a Big Data World (https://www.iucr.org/news/press-releases/open-data). We presented it at International Data Week 2016 in Denver, where we focussed on the importance of data quality in crystallography. ii. I pointed out that central synchrotron/laser X-ray and neutron facilities increasingly have raw data access policies. Even so, they allow a three-year embargo period to protect the proposing team and, very importantly, their PhD students.

I did not make a third point, because over the last ten years I have found science policy makers simply not receptive to it. Funded research typically involves at best 20% of all research proposals made, i.e. 80% or more fail to get funded. When such unfunded research is finally done and published, it therefore relies on journal subscriptions, even if it must then lie behind a paywall.
[I explain the issue in depth in Chapters 33 and 34 of my book Skills for a Scientific Life (CRC Press, 2017). There I also propose a solution: an open access publications fund for unfunded research.]

The poster session comprised about forty posters. Most were on technical topics such as data stewardship and interoperability. One particularly interesting poster described how Edinburgh University supports its academic and research staff. The support covers preparing data management plans and archiving their own research data. Through discussion with the poster presenter, I learned of the University's data training programme (https://mantra.edina.ac.uk).

On day 2 there was a Plenary Session on Open Science Clouds in Europe, China (Global), Africa and Australia. These were all works in progress. Interestingly, delegate Bob Hanisch of NIST expressed a wish for a USA Open Science Cloud. Next came another Plenary session, on Cross Domain Integration, which included the CODATA effort on the Digital Representation of Units (DRUM). This led participants to share several weblinks. One topic was access governed by consent rules, i.e. FAIR does not equal open (see e.g. https://www.mitpressjournals.org/doi/full/10.1162/dint_a_00027).

A view of FAIR from the banking sector is given in The Annodata Framework: Putting FAIR data into practice (Deutsche Bundesbank Technical Report 2019-03).

Bob Hanisch, as Chairman of the CODATA DRUM Working Group, made an open call to the Scientific Unions to put forward use cases, particularly difficult ones. The IUCr Representative to the DRUM Working Group is Prof. Carol Brock, Chair of the IUCr Nomenclature and Units Committee. Simon Cox of the Australian CSIRO showed an example from the International Union of Geological Sciences. In it, the geological time scale (aeons), published in PDF format, was converted into a digital representation.

On day 2 there was also an excellent session on FAIR workflows. The full slide pack of all presentations is here.

This session on FAIR workflows unfortunately clashed with the FAIR Implementation Profiles (FIP) session. However, I had attended the preparatory FIP workshops the previous month. They were rather intricate. They may be useful for communities that are just beginning to archive their data. They seemed of little obvious relevance, though, to crystallography, with its mature databases: CSD, PDB, ICDD, COD and ICSD. These are recently summarised in Bruno, I., Gražulis, S., Helliwell, J. R., Kabekkodu, S. N., McMahon, B. and Westbrook, J. (2017), Crystallography and Databases. Data Science Journal, 16, p. 38.

Nevertheless, I think the topic of FAIR implementation is relevant to our IUCr Commissions. They are implementing the IUCr Diffraction Data Deposition Working Group (DDDWG) Final Report recommendations on raw data archiving opportunities. The IUCr Executive Committee could use the FIP terminology to ask the IUCr Commissions how their individual raw data FIPs are progressing.

The final session, on day 4, was a Plenary on disaster risk reduction, with a special focus on data sharing in tackling COVID-19. It followed on nicely from the day 3 session on cross-domain integration of patient data. My input to that session was that we must move beyond overviews to the actual contents of individual patients' record cards. These contents are needed to understand fully the pharmaceutical chemical data of particular compounds, i.e. drugs. The record cards will also need to state each patient's COVID-19 vaccination details clearly. Cross-domain integration must also involve integrating data and records country by country. This is potentially complex, as the USA National Academy of Sciences Workshop on the Future of Data Science emphasised. From the USA's point of view, one concern there was the EU's GDPR (General Data Protection Regulation) legislation. Furthermore, China's firewall barriers extend not only to internet coverage, which is restricted, but apparently also to data.

The whole event was well organised and interesting. It raised potentially useful new ideas we could take up within the IUCr, as indicated above, especially FAIR workflows. Video presentations have been posted on the CODATA Vimeo Channel.

Emeritus Professor John R Helliwell DSc
IUCr Representative to CODATA