Re: [Imgcif-l] Adding references to external files to imgCIF

OK, I've drafted up some definitions (just the human-readable part for now) for you all to peruse.  Please look at https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or there.
all the best,
James.
On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
> Thanks for the support Herbert. Does anybody have any concerns or
> improvements to the data names that I sent originally? If not, I guess I
> will write up some formal dictionary definitions for your consideration.
>
> James.
>
> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com>
> wrote:
>
>> Dear Colleagues,
>>
>>   Since 2012 NIAC and COMCIFS have worked cooperatively to make
>> imgCIF/CBF and NeXus/HDF5 fully interoperable.  This is very
>> far along, e.g. with NeXus/HDF5 NXtransformations having been added to
>> NeXus/HDF5 to carry the same information as imgCIF/CBF AXIS.
>> What James has suggested will allow imgCIF/CBF to carry the same dataset
>> structure information as is conveyed in the external links of
>> an Eiger dataset, which divides the collected data into a master file
>> with the metadata and a set of datafiles.  This structural division
>> may not be important for some smaller datasets with only a few hundred to
>> a few thousand frames, but can be very important in
>> handling datasets with more frames than that, as are encountered in
>> serial crystallography.  Even for the smaller datasets this approach can
>> help to solve a problem for archives and facilities that need to store
>> metadata in a relational database while the data itself has been parked in
>> raw file systems, non-relational databases, zenodo, etc.  As with almost
>> all of CIF, imgCIF/CBF metadata maps very easily and directly
>> into relational tables, while putting NeXus/HDF5 metadata into a
>> relational database first requires exactly the same sort of transformations
>> as we have already designed to map NeXus/HDF5 metadata into imgCIF/CBF.
>> To me it seems that James' suggestion is not a reinvention
>> of this particular wheel, but may be an important step in avoiding
>> reinvention of the wheel.
>> This may avoid a lot of unnecessary transformation
>> of huge quantities of raw data in serial crystallography while making the
>> metadata more accessible.
>>
>>   I would suggest giving James' suggestion serious consideration.
>>
>>   Regards,
>>     Herbert
>>
>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com>
>> wrote:
>>
>>> Dear Graeme,
>>>
>>> The context of this is the idea that a single imgCIF file could be
>>> generated from a collection of raw image files (in whatever format,
>>> whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain the
>>> metadata pertaining to that collection. In such a situation, some way of
>>> referring to the raw frames from within the imgCIF file is required.
>>>
>>> I agree that a perfectly reasonable approach is not to generate any new
>>> file at all, and simply to access the metadata directly in whatever format
>>> happens to be there. This was my initial impulse as well and it took me a
>>> while to understand that the actual proposal was to create an imgCIF file,
>>> rather than just use imgCIF datanames for specification purposes.  From a
>>> semantic point of view both amount to the same thing so my only real
>>> motivation here is to add an image linking facility to imgCIF so that the
>>> "generate a summary metadata file" approach is possible.
>>>
>>> Could we just copy the HDF5 way of referring to objects in other HDF5
>>> files as a quick solution?
>>>
>>> all the best,
>>> James.
>>>
>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk
>>> <Graeme.Winter@diamond.ac.uk> wrote:
>>>
>>> > Dear James,
>>> >
>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>>> > perhaps with specific semantics - and there is already a (complete?)
>>> > mapping from imgCIF to HDF5 / NeXus
>>> >
>>> > Have I missed something?
>>> > No offence meant, trying to understand the shape
>>> > of the problem you are trying to solve
>>> >
>>> > Thanks & best wishes Graeme
>>> >
>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>>> > >
>>> > > Dear All,
>>> > >
>>> > > Recent Commdat discussion revealed a desire to reference external images
>>> > > from within an imgCIF file. This would allow the metadata for a dataset to
>>> > > be held within a single imgCIF file, while the frames themselves remain
>>> > > separate. This avoids the impracticality of navigating through an enormous
>>> > > multi-frame imgCIF file in order to extract a relatively compact amount of
>>> > > information.
>>> > >
>>> > > As a starting proposal, I suggest we extend the _array_data category with
>>> > > the following three datanames:
>>> > >
>>> > > (1) _array_data.external_format    A value drawn from an enumerated list of
>>> > > formats (e.g. "SMV","HDF5","Bruker"). The definition for each enumerated
>>> > > value would explain how to interpret _array_data.internal_path
>>> > > (2) _array_data.location_uri       A URI for the file containing the
>>> > > image.
>>> > > A relative URL is relative to the location of the imgCIF file
>>> > > (3) _array_data.internal_path      A format-specific string describing
>>> > > the location of the frame within the file identified by
>>> > > _array_data.location_uri, interpreted according to the value given in
>>> > > _array_data.external_format
>>> > >
>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the location
>>> > > referenced with a DOI, with appropriate definitions of the path notation:
>>> > >
>>> > > loop_
>>> > > _array_data.array_id
>>> > > _array_data.binary_id
>>> > > _array_data.external_format
>>> > > _array_data.location_uri
>>> > > _array_data.internal_path
>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>>> > > ...
>>> > >
>>> > > Or for a bunch of single-frame files generated by an ADSC detector in the
>>> > > same directory as the imgCIF file
>>> > >
>>> > > loop_
>>> > > _array_data.array_id
>>> > > _array_data.binary_id
>>> > > _array_data.external_format
>>> > > _array_data.location_uri
>>> > > 1 1 ADSC ./tartaric.001
>>> > > 1 2 ADSC ./tartaric.002
>>> > > 1 3 ADSC ./tartaric.003
>>> > > ...
>>> > >
>>> > > The imgCIF data items describing the structure of the data array would
>>> > > refer to the data after it has been provided by the format. The form in
>>> > > which it is provided should be specified in the definition of each value of
>>> > > "_array_data.external_format".  So, for example, the various compression
>>> > > methods in HDF5 would be invisible if the data as returned are specified to
>>> > > be an array of Reals.
> >
> > From the point of view of initial data validation, it would be sufficient
> > to check that all referenced files are accessible, and that the provided
> > locations exist.
> >
> > Thoughts?
> > James.
>
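
The initial validation step described in the quoted proposal (check that every referenced file is accessible) could be sketched roughly as below. This is only an illustration under stated assumptions, not part of the proposal or of any imgCIF library: the function names resolve_location and check_references are hypothetical, the loop rows are assumed to have already been read into dicts keyed by the dataname suffix, and only relative and file: URIs are checked. Remote schemes such as doi: are skipped, and verifying _array_data.internal_path is left out because it depends on the external format.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_location(location_uri, imgcif_path):
    """Map a location_uri to a local Path, or None if it is not local.

    A relative reference is taken relative to the directory holding the
    imgCIF file, as the proposal states. (Hypothetical helper.)
    """
    parsed = urlparse(location_uri)
    if parsed.scheme == "":
        # Relative reference: resolve against the imgCIF file's directory.
        return Path(imgcif_path).parent / unquote(parsed.path)
    if parsed.scheme == "file":
        return Path(unquote(parsed.path))
    # Anything else (doi:, https:, ...) is not checked in this sketch.
    return None


def check_references(rows, imgcif_path):
    """Return (binary_id, path) pairs for local files that are missing.

    `rows` is assumed to be an iterable of dicts with at least the keys
    "binary_id" and "location_uri". (Hypothetical helper.)
    """
    missing = []
    for row in rows:
        local = resolve_location(row["location_uri"], imgcif_path)
        if local is None:
            continue  # remote location, skipped here
        if not local.is_file():
            missing.append((row["binary_id"], str(local)))
    return missing


if __name__ == "__main__":
    # Rows mirroring the ADSC example from the proposal.
    rows = [
        {"binary_id": "1", "location_uri": "./tartaric.001"},
        {"binary_id": "2", "location_uri": "./tartaric.002"},
    ]
    for binary_id, path in check_references(rows, "summary.cif"):
        print(f"binary_id {binary_id}: cannot find {path}")
```

A fuller check would dispatch on _array_data.external_format to confirm that the internal path exists inside each file, which is where the per-format definitions mentioned in the proposal would come in.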

