Re: [Imgcif-l] Adding references to external files to imgCIF

OK, I've drafted up some definitions (just the human-readable part for now)
for you all to peruse.  Please look at
https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or
there.

all the best,
James.

On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:

> Thanks for the support Herbert. Does anybody have any concerns or
> improvements to the data names that I sent originally? If not, I guess I
> will write up some formal dictionary definitions for your consideration.
> 
> James.
> 
> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com>
> wrote:
> 
>> Dear Colleagues,
>>
>>   Since 2012 NIAC and COMCIFS have worked cooperatively to make
>> imgCIF/CBF and NeXus/HDF5 fully interoperable.  This is very
>> far along, e.g. with NeXus/HDF5 NXtransformations having been added to
>> NeXus/HDF5 to carry the same information as imgCIF/CBF AXIS.
>> What James has suggested will allow imgCIF/CBF to carry the same dataset
>> structure information as is conveyed in the external links of
>> an Eiger dataset, which divides the collected data into a master file
>> with the metadata and a set of datafiles.  This structural division
>> may not be important for some smaller datasets with only a few hundred to
>> a few thousand frames, but can be very important in
>> handling datasets with more frames than that, such as those encountered in
>> serial crystallography.  Even for the smaller datasets this approach can
>> help to solve a problem for archives and facilities that need to store
>> metadata in a relational database while the data itself has been parked in
>> raw file systems, non-relational databases, zenodo, etc.  As with almost
>> all of CIF, imgCIF/CBF metadata maps very easily and directly
>> into relational tables, while putting NeXus/HDF5 metadata into a
>> relational database first requires exactly the same sort of transformations
>> as we have already designed to map NeXus/HDF5 metadata into imgCIF/CBF.
>> To me it seems that James' suggestion is not a reinvention
>> of this particular wheel, but may be an important step in avoiding
>> reinvention of the wheel.  This may avoid a lot of unnecessary
>> transformation of huge quantities of raw data in serial crystallography
>> while making the metadata more accessible.
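
To illustrate the point about relational tables, here is a minimal Python sketch that loads the single-frame ADSC rows from James's example (quoted further down) into a table. The table layout, the choice of (array_id, binary_id) as the key, and the use of SQLite are illustrative assumptions, not part of any imgCIF specification.

```python
import sqlite3

# Rows taken from the ADSC example in the thread:
# array_id, binary_id, external_format, location_uri
rows = [
    ("1", "1", "ADSC", "./tartaric.001"),
    ("1", "2", "ADSC", "./tartaric.002"),
    ("1", "3", "ADSC", "./tartaric.003"),
]

conn = sqlite3.connect(":memory:")
# One column per proposed data name; the composite key is an assumption.
conn.execute(
    """
    CREATE TABLE array_data (
        array_id        TEXT NOT NULL,
        binary_id       TEXT NOT NULL,
        external_format TEXT,
        location_uri    TEXT,
        PRIMARY KEY (array_id, binary_id)
    )
    """
)
conn.executemany("INSERT INTO array_data VALUES (?, ?, ?, ?)", rows)

# Look up where the frames of array 1 live without touching the raw data.
located = conn.execute(
    "SELECT location_uri FROM array_data WHERE array_id = ? ORDER BY binary_id",
    ("1",),
).fetchall()
print(located)
```

Only the metadata enters the database here; the frames stay wherever location_uri says they are.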
>>
>>   I would suggest giving James' suggestion serious consideration.
>>
>>   Regards,
>>     Herbert
>>
>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com>
>> wrote:
>>
>>> Dear Graeme,
>>>
>>> The context of this is the idea that a single imgCIF file could be
>>> generated from a collection of raw image files (in whatever format,
>>> whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain
>>> the metadata pertaining to that collection. In such a situation, some
>>> way of referring to the raw frames from within the imgCIF file is
>>> required.
>>>
>>> I agree that a perfectly reasonable approach is not to generate any new
>>> file at all, and simply to access the metadata directly in whatever
>>> format happens to be there. This was my initial impulse as well and it
>>> took me a while to understand that the actual proposal was to create an
>>> imgCIF file, rather than just use imgCIF datanames for specification
>>> purposes.  From a semantic point of view both amount to the same thing
>>> so my only real motivation here is to add an image linking facility to
>>> imgCIF so that the "generate a summary metadata file" approach is
>>> possible.
>>>
>>> Could we just copy the HDF5 way of referring to objects in other HDF5
>>> files as a quick solution?
>>>
>>> all the best,
>>> James.
>>>
>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <
>>> Graeme.Winter@diamond.ac.uk> wrote:
>>>
>>> > Dear James,
>>> >
>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>>> > perhaps with specific semantics - and there is already a (complete?)
>>> > mapping from imgCIF to HDF5 / NeXus
>>> >
>>> > Have I missed something? No offence meant, trying to understand the
>>> > shape of the problem you are trying to solve
>>> >
>>> > Thanks & best wishes Graeme
>>> >
>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>>> > >
>>> > > Dear All,
>>> > >
>>> > > Recent Commdat discussion revealed a desire to reference external
>>> > > images from within an imgCIF file. This would allow the metadata for a
>>> > > dataset to be held within a single imgCIF file, while the frames
>>> > > themselves remain separate. This avoids the impracticality of
>>> > > navigating through an enormous multi-frame imgCIF file in order to
>>> > > extract a relatively compact amount of information.
>>> > >
>>> > > As a starting proposal, I suggest we extend the _array_data category
>>> > > with the following three datanames:
>>> > >
>>> > > (1) _array_data.external_format
>>> > >     A value drawn from an enumerated list of formats (e.g. "SMV", "HDF5",
>>> > >     "Bruker"). The definition for each enumerated value would explain
>>> > >     how to interpret _array_data.internal_path.
>>> > > (2) _array_data.location_uri
>>> > >     A URI for the file containing the image. A relative URI is relative
>>> > >     to the location of the imgCIF file.
>>> > > (3) _array_data.internal_path
>>> > >     A format-specific string describing the location of the frame within
>>> > >     the file identified by _array_data.location_uri, interpreted
>>> > >     according to the value given in _array_data.external_format.
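
A minimal Python sketch of how a reader might hold one such row and resolve a relative location against the imgCIF file's own URI. The ExternalFrame class, its method name, and the file:///data/run7/summary.cif base URI are hypothetical, introduced only for illustration; the thread does not define any resolution algorithm beyond "relative to the location of the imgCIF file".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass(frozen=True)
class ExternalFrame:
    """One row of the proposed _array_data extension (hypothetical helper)."""

    array_id: str
    binary_id: str
    external_format: str                 # e.g. "SMV", "HDF5", "Bruker"
    location_uri: str                    # absolute or relative URI
    internal_path: Optional[str] = None  # format-specific, may be absent

    def resolved_uri(self, imgcif_uri: str) -> str:
        """Resolve location_uri against the URI of the imgCIF file itself."""
        return urljoin(imgcif_uri, self.location_uri)


base = "file:///data/run7/summary.cif"  # assumed location of the imgCIF file

relative = ExternalFrame("1", "1", "ADSC", "./tartaric.001")
print(relative.resolved_uri(base))

# A URI with its own scheme (such as the doi: placeholder used in this
# thread) is returned unchanged rather than joined to the base.
absolute = ExternalFrame("1", "1", "NXMX", "doi:x.y.z")
print(absolute.resolved_uri(base))
```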
>>> > >
>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the
>>> > > location referenced with a DOI, with appropriate definitions of the
>>> > > path notation:
>>> > >
>>> > > loop_
>>> > > _array_data.array_id
>>> > > _array_data.binary_id
>>> > > _array_data.external_format
>>> > > _array_data.location_uri
>>> > > _array_data.internal_path
>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>>> > > ...
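
The path notation in this example is left to "appropriate definitions", so the following Python sketch only shows one way such a string could be split. It assumes the form file-path, colon, HDF5 object path, optional frame index in square brackets; that grammar, the InternalPath type, and parse_internal_path are hypothetical and not defined anywhere in the proposal.

```python
import re
from typing import NamedTuple, Optional


class InternalPath(NamedTuple):
    file_path: str              # path below the location given by location_uri
    object_path: str            # path of the dataset inside the HDF5 file
    frame_index: Optional[int]  # index of the frame within the dataset, if any


# Assumed grammar: <file path>:<object path>[<frame index>]
_PATTERN = re.compile(
    r"^(?P<file>[^:]+):(?P<obj>/[^\[\]]*)(?:\[(?P<idx>\d+)\])?$"
)


def parse_internal_path(value: str) -> InternalPath:
    """Split an internal_path string written in the assumed notation."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised internal_path: {value!r}")
    idx = match.group("idx")
    return InternalPath(
        file_path=match.group("file"),
        object_path=match.group("obj"),
        frame_index=int(idx) if idx is not None else None,
    )


parsed = parse_internal_path(
    "directory/run/masterfilename:/entry1/detector/data[0]"
)
print(parsed)
```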
>>> > >
>>> > > Or for a bunch of single-frame files generated by an ADSC detector in
>>> > > the same directory as the imgCIF file:
>>> > >
>>> > > loop_
>>> > > _array_data.array_id
>>> > > _array_data.binary_id
>>> > > _array_data.external_format
>>> > > _array_data.location_uri
>>> > > 1 1 ADSC ./tartaric.001
>>> > > 1 2 ADSC ./tartaric.002
>>> > > 1 3 ADSC ./tartaric.003
>>> > > ...
>>> > >
>>> > > The imgCIF data items describing the structure of the data array would
>>> > > refer to the data after it has been provided by the format. The form
>>> > > in which it is provided should be specified in the definition of each
>>> > > value of "_array_data.external_format".  So, for example, the various
>>> > > compression methods in HDF5 would be invisible if the data as returned
>>> > > are specified to be an array of Reals.
>>> > >
>>> > > From the point of view of initial data validation, it would be
>>> > > sufficient to check that all referenced files are accessible, and that
>>> > > the provided locations exist.
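
The first half of that check could be sketched in Python as follows, for the simple case where every location is a plain path relative to the imgCIF file (as in the ADSC example). The function name missing_local_files and the file name summary.cif are hypothetical. Remote locations such as DOIs, and checking that an internal_path exists inside a file, are outside the scope of this sketch.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_local_files(imgcif_path: Path, location_uris: Iterable[str]) -> List[str]:
    """Return the relative location_uri values that do not name an existing file.

    Assumption: every value is a plain relative path, resolved against the
    directory that holds the imgCIF file.
    """
    base = imgcif_path.parent
    return [uri for uri in location_uris if not (base / uri).is_file()]


# Demonstration in a throwaway directory holding only two of three frames.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("tartaric.001", "tartaric.002"):
        (root / name).touch()
    uris = ["./tartaric.001", "./tartaric.002", "./tartaric.003"]
    missing = missing_local_files(root / "summary.cif", uris)
    print(missing)
```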
>>> > >
>>> > > Thoughts?
>>> > > James.
>>> > >
>>> > > --
>>> > > T +61 (02) 9717 9907
>>> > > F +61 (02) 9717 3145
>>> > > M +61 (04) 0249 4148
>>> > > _______________________________________________
>>> > > imgcif-l mailing list
>>> > > imgcif-l@iucr.org
>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
>>> >
>>> >
>>>
>>>
>>
>

