MX raw image data formats, metadata and validation
14 August 2021 Online, as from Prague, Czech Republic
A tutorial workshop was held under the auspices of the IUCr Committee on Data to introduce the community to the Gold Standard for macromolecular diffraction data.
Despite drastic changes in arrangements necessitated by the pandemic, including a decision by the United States to severely discourage physical travel to Prague and difficulties for scientists from the UK and other countries in arranging travel there, the move from a hybrid to a purely virtual meeting was successful.
The workshop had twenty-four participants in total, with approximately eighteen of them active at most times.
Aaron Brewster of LBL presented truly impressive progress in the use of the Gold Standard NeXus/HDF5 data format in high-data-rate processing for XFEL experiments worldwide in his talk on "Using the Gold Standard for data archival at kilohertz speeds". Herbert J. Bernstein presented a brief tutorial on the Gold Standard in "MX raw data formats and the Gold Standard". Max Burian and Diego Gaemperle spoke of DECTRIS's efforts to adopt the Gold Standard and raised the wonderful possibility of making their software open source in their presentation, "Stream2 and FileWriter2". Filip Leonarski presented an impressive talk on "Jungfraujoch: A Data Acquisition and On-the-fly Analysis System for High Data Rate Macromolecular Crystallography", in which he raised the question of whether the decision to use LZ4 needs to be revisited and suggested consideration of Zstandard (https://github.com/facebook/zstd), the LZ77-family compression algorithm developed at Facebook. Natalie Johnson reprised her talk from last year on "Synchrotron Data in the CSD".
There were three facilities reports: one on NSLS-II, prepared by Dale Kreitler of BNL and given by Herbert J. Bernstein; one on the Australian Synchrotron, given by Daniel Eriksson; and one on MAX IV, given by Oskar Aurelius.
There was then vigorous discussion for over an hour. The major points raised were:
- Clemens Vonrhein asked if it would be possible to adopt and enforce axis naming conventions in the Gold Standard. After considerable debate, it was decided to treat this first as a data validation issue, in which software would advise users of unfortunate or inconsistent choices of axis names. It was also decided to contribute to the nascent IUCr effort on raw data validation, with suggestions and offers of support to be channeled from HDRMX to IUCr CommDat via Brian McMahon.
- Diego Gaemperle asked for a clear statement of the administrative path DECTRIS (or anybody) should follow in raising issues, asking questions, and making contributions to the Gold Standard. The answer provided was to use the GitHub issue system for the nexusformat/definitions repository. Most of the Gold Standard is incorporated into the nexusformat/definitions NXmx application definition. Some important details are in the various base classes, such as NXtransformations and NXdetector. For larger issues, the interested parties should work in a fork of the main repository and, when ready, prepare a pull request containing the changes proposed in that fork.
- Jie Nan asked, on her own behalf and Filip Leonarski's, if it would be possible to extend the Gold Standard to incorporate tags to support un-indexed diffraction spot data, i.e. lists of full or partial reflections given against centroid pixel coordinates instead of just against [h, k, l]. Jie Nan, Filip Leonarski and Herbert Bernstein will form a small working group, which all interested parties are welcome to join, to make a formal proposal for the necessary tags in both CBF and NeXus for this useful idea. Aaron Brewster has joined the group. The task is basically to clarify the definitions already in NeXus NXreflections so that they are unambiguously tied to the image axis definitions and to equivalent CIF _diffrn_refln... tags. This is a small but necessary cleanup effort. It needs to be done carefully to avoid the axis exchanges and flips for beam centers that have at times delayed the processing of diffraction images. A first strawman proposal should, it is hoped, be ready by 22 August and a formal proposal by 1 October. The proposal will be submitted for consideration in NeXus via the route discussed in the prior point. Herbert Bernstein will take care of subsequent transmission to COMCIFS for incorporation in the relevant CIF dictionaries.
- The compression issue was raised, and DECTRIS agreed to consider offering Zstandard as an optional alternative to LZ4 in their software, since the change would have no impact on higher level software and Zstandard is already a well-supported HDF5 plugin. The possibility of using a chunk size smaller than a full image, perhaps as small as a module, was also raised, because it similarly does not impact higher level software and might help with some of the performance issues raised by Aaron Brewster in his talk and in subsequent discussions about the cost of data motion.
- To gather more information on needed improvements in compression, a working group on the topic will be organized. Filip Leonarski and Herbert J. Bernstein have expressed interest.
- In his review of this report, John Helliwell noted, "Interoperability has to be defined precisely such as interoperability by different crystallography users. i.e. not between crystallography and marine science, say. This type of interoperability is what CODATA understands by the term, in general."
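The chunk-size suggestion raised in the discussion can be illustrated with a little arithmetic. The following Python sketch is not taken from any of the talks. It assumes a hypothetical 4096 x 4096 pixel frame tiled by 512 x 1024 pixel modules with no gaps, and the function names `chunks_touched` and `pixels_decompressed` are illustrative only. It counts how many pixels must be decompressed to read one module-sized region when each chunk holds a full image versus a single module.

```python
def chunks_touched(roi, chunk_shape):
    """Count the chunks of one frame that overlap a rectangular region.

    roi is (row_start, row_stop, col_start, col_stop), stops exclusive.
    chunk_shape is (chunk_rows, chunk_cols); chunks tile the frame from (0, 0).
    """
    r0, r1, c0, c1 = roi
    ch, cw = chunk_shape
    rows = (r1 - 1) // ch - r0 // ch + 1   # chunk rows spanned by the region
    cols = (c1 - 1) // cw - c0 // cw + 1   # chunk columns spanned by the region
    return rows * cols


def pixels_decompressed(roi, chunk_shape):
    """Pixels that must be decompressed to read roi (whole chunks only)."""
    ch, cw = chunk_shape
    return chunks_touched(roi, chunk_shape) * ch * cw


# Hypothetical geometry, chosen for illustration only (not a real detector).
FRAME = (4096, 4096)    # chunk shape when one chunk holds a full image
MODULE = (512, 1024)    # chunk shape when one chunk holds a single module

# Region covering exactly one module: second module row, second module column.
one_module = (512, 1024, 1024, 2048)

full_image_cost = pixels_decompressed(one_module, FRAME)
per_module_cost = pixels_decompressed(one_module, MODULE)

print("full-image chunks:", full_image_cost, "pixels")
print("module chunks:    ", per_module_cost, "pixels")
print("ratio:            ", full_image_cost // per_module_cost)
```

Under these assumed dimensions, reading one module from a full-image chunk decompresses 32 times as many pixels as reading it from a module-sized chunk. In HDF5 the chunk shape is a per-dataset storage property (for example, the `chunks` argument of h5py's `create_dataset`), which is consistent with the observation that such a change need not affect higher level software.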