[Imgcif-l] Adding references to external files to imgCIF

  • To: The Crystallographic Binary File and its imgCIF application to image data <imgcif-l@iucr.org>
  • Subject: [Imgcif-l] Adding references to external files to imgCIF
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 13 Feb 2019 16:15:45 +1100
Dear All,
Recent Commdat discussion revealed a desire to reference external images from within an imgCIF file. This would allow the metadata for a dataset to be held within a single imgCIF file, while the frames themselves remain separate. This avoids the impracticality of navigating through an enormous multi-frame imgCIF file in order to extract a relatively compact amount of information.
As a starting proposal, I suggest we extend the _array_data category with the following three datanames:
(1) _array_data.external_format
    A value drawn from an enumerated list of formats (e.g. "SMV", "HDF5", "Bruker"). The definition for each enumerated value would explain how to interpret _array_data.internal_path.

(2) _array_data.location_uri
    A URI for the file containing the image. A relative URI is resolved relative to the location of the imgCIF file.

(3) _array_data.internal_path
    A format-specific string describing the location of the frame within the file identified by _array_data.location_uri, interpreted according to the value given in _array_data.external_format.
So for a multi-frame HDF5 file buried in a subdirectory of the location referenced with a DOI, with appropriate definitions of the path notation:

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
_array_data.internal_path
1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
...

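To make the internal-path idea concrete, here is a rough Python sketch that splits a value such as the one in the loop above. The proposal leaves the path notation to be defined, so the layout assumed here (file path, a colon, an HDF5 dataset path, then a frame index in square brackets) is only a guess read off the example, and the names InternalPath and parse_internal_path are hypothetical.

```python
import re
from typing import NamedTuple


class InternalPath(NamedTuple):
    file_path: str  # path of the file below the location URI
    dataset: str    # HDF5 dataset path inside that file
    frame: int      # frame index within the dataset


# ASSUMED notation (not defined by the proposal):
#   <file path>:<HDF5 dataset path>[<frame index>]
_PATTERN = re.compile(r"(?P<file>[^:]+):(?P<dataset>/.*?)\[(?P<frame>\d+)\]")


def parse_internal_path(value: str) -> InternalPath:
    """Split an internal path written in the assumed notation."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised internal path: {value!r}")
    return InternalPath(match["file"], match["dataset"], int(match["frame"]))


if __name__ == "__main__":
    print(parse_internal_path(
        "directory/run/masterfilename:/entry1/detector/data[0]"))
```

A real definition of each _array_data.external_format value would replace the regular expression with whatever notation is finally agreed.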
Or for a bunch of single-frame files generated by an ADSC detector in the same directory as the imgCIF file:

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
1 1 ADSC ./tartaric.001
1 2 ADSC ./tartaric.002
1 3 ADSC ./tartaric.003
...

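As an illustration of the rule that a relative location is taken relative to the imgCIF file, the following Python sketch resolves _array_data.location_uri values with the standard-library urljoin. The function name resolve_location and the example.org address are made up for the example; how a DOI would then be dereferenced is outside the sketch.

```python
from urllib.parse import urljoin


def resolve_location(imgcif_uri: str, location_uri: str) -> str:
    """Resolve a location_uri value against the URI of the imgCIF file.

    Relative references are joined to the imgCIF file's location;
    absolute URIs (for example a doi: reference) are returned unchanged.
    """
    return urljoin(imgcif_uri, location_uri)


if __name__ == "__main__":
    # Hypothetical location of the imgCIF file holding the metadata.
    base = "https://example.org/data/run1/meta.cif"
    for value in ("./tartaric.001", "doi:x.y.z"):
        print(value, "->", resolve_location(base, value))
```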
The imgCIF data items describing the structure of the data array would refer to the data after it has been provided by the format. The form in which it is provided should be specified in the definition of each value of "_array_data.external_format". So, for example, the various compression methods in HDF5 would be invisible if the data as returned are specified to be an array of Reals.
From the point of view of initial data validation, it would be sufficient to check that all referenced files are accessible, and that the provided locations exist.
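A minimal Python sketch of the first half of such a check, covering only relative references to local files, might look like the following. The name find_missing and its exists parameter are hypothetical; references that carry a URI scheme (such as a DOI) are skipped rather than fetched, and the existence of internal locations within each file is not checked.

```python
import os
from urllib.parse import urlparse


def find_missing(location_uris, imgcif_dir, exists=os.path.exists):
    """Return the relative location URIs that do not resolve to a file.

    location_uris: values of _array_data.location_uri
    imgcif_dir:    directory containing the imgCIF file
    exists:        predicate used to test a path (replaceable for testing)
    """
    missing = []
    for uri in location_uris:
        if urlparse(uri).scheme:
            continue  # remote or DOI reference: not checked in this sketch
        path = os.path.normpath(os.path.join(imgcif_dir, uri))
        if not exists(path):
            missing.append(uri)
    return missing


if __name__ == "__main__":
    frames = ["./tartaric.001", "./tartaric.002", "./tartaric.003"]
    print(find_missing(frames, "."))
```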
-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

