[Imgcif-l] Adding references to external files to imgCIF

  • To: The Crystallographic Binary File and its imgCIF application to image data<imgcif-l@iucr.org>
  • Subject: [Imgcif-l] Adding references to external files to imgCIF
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 13 Feb 2019 16:15:45 +1100
Dear All,

Recent Commdat discussion revealed a desire to reference external images
from within an imgCIF file. This would allow the metadata for a dataset to
be held within a single imgCIF file, while the frames themselves remain
separate. This avoids the impracticality of navigating through an enormous
multi-frame imgCIF file in order to extract a relatively compact amount of
information.

As a starting proposal, I suggest we extend the _array_data category with
the following three datanames:

(1) _array_data.external_format: A value drawn from an enumerated list of
formats (e.g. "SMV", "HDF5", "Bruker"). The definition for each enumerated
value would explain how to interpret _array_data.internal_path.
(2) _array_data.location_uri: A URI for the file containing the image. A
relative URI is resolved relative to the location of the imgCIF file.
(3) _array_data.internal_path: A format-specific string describing the
location of the frame within the file identified by
_array_data.location_uri, interpreted according to the value given in
_array_data.external_format.
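
To illustrate the "relative to the location of the imgCIF file" rule for
_array_data.location_uri, here is a minimal Python sketch. It assumes that
ordinary URI reference resolution (as done by the standard library's
urljoin) is what is wanted; the function name and the example.org address
are hypothetical and not part of the proposal.

```python
from urllib.parse import urljoin


def resolve_location(imgcif_uri: str, location_uri: str) -> str:
    """Resolve a location_uri value against the URI of the imgCIF file.

    Relative references are joined to the imgCIF file's own location;
    absolute URIs (for example a doi: reference) are returned unchanged.
    """
    return urljoin(imgcif_uri, location_uri)


if __name__ == "__main__":
    base = "https://example.org/data/run1/meta.cif"  # hypothetical location
    print(resolve_location(base, "./tartaric.001"))
    print(resolve_location(base, "doi:x.y.z"))
```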

So for a multi-frame HDF5 file buried in a subdirectory of the location
referenced with a DOI, with appropriate definitions of the path notation:

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
_array_data.internal_path
1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
...
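
The path notation used in the rows above is not yet defined anywhere. As a
sketch only, assuming it means "file path below the location, then a colon,
then an HDF5 dataset path, then a frame index in square brackets", a reader
might split an _array_data.internal_path value like this in Python. The
names FrameRef and parse_internal_path are hypothetical.

```python
import re
from typing import NamedTuple


class FrameRef(NamedTuple):
    file_path: str  # path of the file below location_uri
    dataset: str    # dataset path inside the HDF5 file
    index: int      # frame index within that dataset


# Assumed notation, inferred from the example rows only:
#   <file path>:<dataset path>[<frame index>]
_PATTERN = re.compile(
    r"^(?P<file>[^:]+):(?P<dataset>/[^\[\]]+)\[(?P<index>\d+)\]$"
)


def parse_internal_path(value: str) -> FrameRef:
    """Split an internal_path string into file, dataset and frame index."""
    m = _PATTERN.match(value)
    if m is None:
        raise ValueError(f"unrecognised internal_path: {value!r}")
    return FrameRef(m["file"], m["dataset"], int(m["index"]))


if __name__ == "__main__":
    print(parse_internal_path(
        "directory/run/masterfilename:/entry1/detector/data[0]"
    ))
```

The actual definition attached to each _array_data.external_format value
would take precedence over anything guessed here.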

Or for a bunch of single-frame files generated by an ADSC detector in the
same directory as the imgCIF file:

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
1 1 ADSC ./tartaric.001
1 2 ADSC ./tartaric.002
1 3 ADSC ./tartaric.003
...

The imgCIF data items describing the structure of the data array would
refer to the data after it has been provided by the format. The form in
which it is provided should be specified in the definition of each value of
"_array_data.external_format".  So, for example, the various compression
methods in HDF5 would be invisible if the data as returned are specified to
be an array of Reals.

From the point of view of initial data validation, it would be sufficient
to check that all referenced files are accessible, and that the provided
locations exist.
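
A minimal Python sketch of such an accessibility check, covering only
relative references to local files: absolute URIs such as doi: references
are skipped here because checking them needs network access. The function
name and the file names in the usage example are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse


def missing_local_frames(imgcif_path, location_uris):
    """Return the relative location_uri values that do not point to a file.

    Relative references are taken relative to the directory holding the
    imgCIF file. Values with a URI scheme (doi:, https:, ...) are not
    checked by this sketch.
    """
    base = Path(imgcif_path).parent
    missing = []
    for uri in location_uris:
        parsed = urlparse(uri)
        if parsed.scheme:
            continue  # absolute URI: outside the scope of this local check
        if not (base / parsed.path).is_file():
            missing.append(uri)
    return missing


if __name__ == "__main__":
    uris = ["./tartaric.001", "./tartaric.002", "doi:x.y.z"]
    print(missing_local_frames("meta.cif", uris))
```

Checking that the location given by _array_data.internal_path exists
inside each file would additionally need a format-specific reader.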

Thoughts?
James.

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l