Discussion List Archives


Re: [dddwg] To contribute to Open Science using Zenodo I found quite straightforward

Dear All:

Our Rovinj DDDWG discussions centered around getting some sort of standard set of metadata for single images, and now a few years later we find ourselves talking about standard metadata for complete experiments! As Graeme says, this is covered comprehensively by imgCIF, which can describe a set of images from quite complex experiments involving multiple detectors consisting of multiple modules of varying geometry and scan directions, multiple and coupled sample rotation axes, and probably a few other 'multiples' as well. As far as I can tell, nxMX is gradually expanding towards the same level of coverage (I see that multiple detector modules have recently(?) been defined in nxMX). The problem with imgCIF has been (and Herbert or Graeme might have more insight into this) that the community and detector manufacturers were looking for something that described a single (binary) image, and imgCIF appeared too complex (thus imgCBF). Additionally, simply concatenating 720 images into a text file, as imgCIF envisages, is perhaps not the most efficient file structure for retrieving the data, and I assume that binary formats such as HDF5 would work better.

Now, it is always possible to split a data transfer specification into the metadata component and the file-format-specific component (as scientific facts do not depend on the format in which they are expressed). It follows that the good metadata work of imgCIF can be distilled from the imgCIF standard and appropriate locations created in nxMX (or any other format containing the appropriate data primitives). My forthcoming paper that grew out of the Rovinj workshop demonstrates how this would be done for imgCIF and an earlier version of nxMX - the software demonstration is on Zenodo via Github, although it may be a little opaque without the paper to refer to (http://doi.org/10.5281/zenodo.154459).

Given the above, what the DDDWG could usefully do is to create a plain old list of format-independent metadata terms, with (human-readable) definitions meeting certain simple criteria outlined in the above paper. This list could be rapidly created from the imgCIF and nxMX standards. The nxMX effort, and indeed imgCIF, could then pick and choose from this list and note in their own specifications which of the canonical terms a given data location corresponds to. This is a two-way process, as the nxMX effort could feed back new metadata terms as well. My own version of such a list for the purposes of the above software demonstration (so not complete, alas) is at line 239 of https://github.com/jamesrhester/PyFormatTransformer/blob/0.9/FormatTransformer/TransformManager.py.txt - the 'value' entries in the 'key-value' table are the generic names that I am talking about.
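As a rough illustration of the idea in the paragraph above, a canonical term list and the per-format notes pointing at it might be sketched as follows. This is a minimal sketch only: the term names, definitions, and data locations are illustrative placeholders that have not been checked against the imgCIF or nxMX specifications, and the function names are hypothetical and are not taken from PyFormatTransformer.

```python
# Canonical, format-independent terms with human-readable definitions.
# ASSUMPTION: all names, definitions and locations here are illustrative
# placeholders, not entries copied from imgCIF, nxMX or PyFormatTransformer.
CANONICAL_TERMS = {
    "incident wavelength": "Wavelength of the radiation incident on the sample.",
    "detector distance": "Distance from the sample to the detector surface.",
}

# Each format notes which canonical term a given data location corresponds to.
FORMAT_LOCATIONS = {
    "imgCIF": {
        "_diffrn_radiation_wavelength.wavelength": "incident wavelength",
        "_diffrn_measurement.sample_detector_distance": "detector distance",
    },
    "nxMX": {
        "/entry/instrument/beam/incident_wavelength": "incident wavelength",
        "/entry/instrument/detector/distance": "detector distance",
    },
}


def translate(location, source, target):
    """Return the target-format location(s) holding the same canonical term."""
    term = FORMAT_LOCATIONS[source][location]
    return [loc for loc, t in FORMAT_LOCATIONS[target].items() if t == term]


def unmapped_terms(fmt):
    """Return the canonical terms for which a format has no location yet."""
    used = set(FORMAT_LOCATIONS[fmt].values())
    return sorted(set(CANONICAL_TERMS) - used)


if __name__ == "__main__":
    print(translate("_diffrn_radiation_wavelength.wavelength", "imgCIF", "nxMX"))
    print(unmapped_terms("nxMX"))
```

Because both formats refer to the same canonical names, neither needs to know about the other: translating a location is a lookup through the shared term, and a format that introduces a new term simply adds it to the canonical list.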

The human-readable definitions corresponding to the above list are embedded in a CIF dictionary, but it is a matter of a few minutes to write software to bring the names and the human-readable definitions together. If this group is interested, I can probably whip something up and post it.
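The joining step mentioned above could look something like the sketch below. It assumes the definitions have already been read out of the CIF dictionary into a plain Python dict keyed by generic name (a real script would use a CIF parser for that step, which is omitted here); the table contents and function names are hypothetical.

```python
def term_list(key_value_table, definitions):
    """Pair each generic name (the 'value' entries) with its definition."""
    names = sorted(set(key_value_table.values()))
    return [(name, definitions.get(name, "(no definition found)")) for name in names]


def format_term_list(pairs):
    """Render the (name, definition) pairs as a plain-text list."""
    return "\n".join(f"{name}: {text}" for name, text in pairs)


if __name__ == "__main__":
    # ASSUMPTION: placeholder entries standing in for the real key-value
    # table and for definitions extracted from the CIF dictionary.
    table = {
        "format_specific_name_a": "incident wavelength",
        "format_specific_name_b": "detector distance",
    }
    defs = {"incident wavelength": "Wavelength of the incident radiation."}
    print(format_term_list(term_list(table, defs)))
```

Names that have no matching definition are flagged rather than dropped, so gaps in the dictionary show up in the generated list.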

James.

On 28 September 2016 at 02:18, <Graeme.Winter@diamond.ac.uk> wrote:
Kamil

If you take a look at the NeXus nxMX format you should see that most of this is solved. Indeed it was resolved 20 years ago but no one used imgCIF...

That said the list of software which will natively process this data is not long...

A zip of images does actually work, though I agree it is a far less elegant solution

Best wishes Graeme

On 27 Sep 2016, at 17:12, Kamil Dziubek <rumianek@amu.edu.pl> wrote:

Simon -

- thanks for the update. Your example may be just the model solution for software managers at commercial diffractometer companies on how to format diffraction data for the purpose of data sharing (and it could actually encourage them to add an 'export button' to data collection/reduction programs!) We just need to be sure that your data can be easily imported by anyone, even without access to the commercial software you were using for data analysis. Not only the image format is important, but also the description of the axes.

I am afraid I do not remember whether a standard setting of goniometer geometry was unanimously decided. I remember the paper(s) by Loes and John, but here the situation is more complicated - multiple runs, description and rotation sense of the axes, the difference between kappa-axis geometry and Eulerian cradle machines, etc.

Best wishes,
Kamil



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
dddwg mailing list
dddwg@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/dddwg
