Re: [Imgcif-l] Adding references to external files to imgCIF
- To: "Herbert J. Bernstein" <yayahjb@gmail.com>
- Subject: Re: [Imgcif-l] Adding references to external files to imgCIF
- From: James H via imgcif-l <imgcif-l@iucr.org>
- Date: Mon, 12 Apr 2021 16:38:45 +1000
- Cc: James H <jamesrhester@gmail.com>, The Crystallographic Binary File and its imgCIF application to image data<imgcif-l@iucr.org>, "Graeme.Winter@Diamond.ac.uk" <Graeme.Winter@diamond.ac.uk>
- In-Reply-To: <CAM+dB2fbEnrZy46W-8AM+g-Rnp+s11diPztbm6tNOJ6FTBC5vA@mail.gmail.com>
- References: <CAM+dB2dGcbLy3NuMy1g=QvWP3Mhj09F1WksKRXJ1BHeZ9_fXyw@mail.gmail.com><FDBF95B6-0C0A-48A1-92B4-9B567AD5C9E5@diamond.ac.uk><CAM+dB2c9qOZg8D151WwoJYkM_YtR-+kKcFvNaLNk4cM=3vEoQQ@mail.gmail.com><CABcsX27+RTDq9HKsVBKRqn7Xs9_sV=o7xDkd0K_YAuxPRWTLPw@mail.gmail.com><CAM+dB2d=bmnEP9f5d99xG+U-FQ26+mezOhprH=J+5AdzLgSGbQ@mail.gmail.com><CAM+dB2fbEnrZy46W-8AM+g-Rnp+s11diPztbm6tNOJ6FTBC5vA@mail.gmail.com>
Dear All,

Over a year later I have now written up definitions in DDL2 for inclusion in imgCIF. The full definitions are at the GitHub issue (https://github.com/COMCIFS/imgCIF/issues/7). Please have a look and provide feedback here or there.

Note that I have added datanames for specifying that the images are contained within compressed archives. I've checked a few known sources of images (proteindiffraction.org, Zenodo, a university repository) and this scheme seems to cover those bases. If you have time, please have a look at your favourite open archive of raw data to see if this scheme is sufficient for you to specify a particular image in that archive.

I've reproduced the examples from the definitions below. Of course, in a perfect world we would just give a DOI, but those days are not yet upon us because DOIs typically resolve to landing pages rather than directly to the files. Happy to be corrected on that.

best wishes,
James.

Examples
========

# The frames are contained in a single HDF5-format file accessible
# at https://zenodo.org/record/12345/files/tartaric.h5. An array of 2D
# images is found at HDF5 location /entry1/detector1/data

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
_array_data.external_path
_array_data.external_frame
1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
...

# Frames are contained in individual Smart6000 Bruker-format files
# accessible using https://uni_repo.edu/5341 in subdirectory run1.

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.external_version
_array_data.location_uri
1 1 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.001
1 2 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.002
...

# Frames with SMV format are contained at data.proteindiffraction.org in a tarred
# archive compressed with bzip2.

loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
_array_data.external_archive_format
_array_data.external_archive_path
1 1 SMV https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2 TBZ MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0001.img
1 2 SMV https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2 TBZ MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0002.img

On Tue, 5 Mar 2019 at 16:37, James Hester <jamesrhester@gmail.com> wrote:

> OK, I've drafted up some definitions (just the human-readable part for now) for you all to peruse. Please look at https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or there.
>
> all the best,
> James.
>
> On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
>
>> Thanks for the support Herbert. Does anybody have any concerns or improvements to the data names that I sent originally? If not, I guess I will write up some formal dictionary definitions for your consideration.
>>
>> James.
>>
>> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>>
>>> Dear Colleagues,
>>>
>>> Since 2012 NIAC and COMCIFS have worked cooperatively to make imgCIF/CBF and NeXus/HDF5 fully interoperable. This is very far along, e.g. with NXtransformations having been added to NeXus/HDF5 to carry the same information as imgCIF/CBF AXIS. What James has suggested will allow imgCIF/CBF to carry the same dataset structure information as is conveyed in the external links of an Eiger dataset, which divides the collected data into a master file with the metadata and a set of data files.
>>> This structural division may not be important for some smaller datasets with only a few hundred to a few thousand frames, but can be very important in handling the larger datasets encountered in serial crystallography. Even for the smaller datasets this approach can help to solve a problem for archives and facilities that need to store metadata in a relational database while the data itself has been parked in raw file systems, non-relational databases, Zenodo, etc. As with almost all of CIF, imgCIF/CBF metadata maps very easily and directly into relational tables, while putting NeXus/HDF5 metadata into a relational database first requires exactly the same sort of transformations as we have already designed to map NeXus/HDF5 metadata into imgCIF/CBF.
>>>
>>> To me it seems that James' suggestion is not a reinvention of this particular wheel, but may be an important step in avoiding reinvention of the wheel. This may avoid a lot of unnecessary transformation of huge quantities of raw data in serial crystallography while making the metadata more accessible.
>>>
>>> I would suggest giving James' suggestion serious consideration.
>>>
>>> Regards,
>>> Herbert
>>>
>>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com> wrote:
>>>
>>>> Dear Graeme,
>>>>
>>>> The context of this is the idea that a single imgCIF file could be generated from a collection of raw image files (in whatever format, whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain the metadata pertaining to that collection. In such a situation, some way of referring to the raw frames from within the imgCIF file is required.
>>>>
>>>> I agree that a perfectly reasonable approach is not to generate any new file at all, and simply to access the metadata directly in whatever format happens to be there.
>>>> This was my initial impulse as well and it took me a while to understand that the actual proposal was to create an imgCIF file, rather than just use imgCIF datanames for specification purposes. From a semantic point of view both amount to the same thing so my only real motivation here is to add an image linking facility to imgCIF so that the "generate a summary metadata file" approach is possible.
>>>>
>>>> Could we just copy the HDF5 way of referring to objects in other HDF5 files as a quick solution?
>>>>
>>>> all the best,
>>>> James.
>>>>
>>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <Graeme.Winter@diamond.ac.uk> wrote:
>>>>
>>>> > Dear James,
>>>> >
>>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 - perhaps with specific semantics - and there is already a (complete?) mapping from imgCIF to HDF5 / NeXus
>>>> >
>>>> > Have I missed something? No offence meant, trying to understand the shape of the problem you are trying to solve
>>>> >
>>>> > Thanks & best wishes Graeme
>>>> >
>>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>>>> > >
>>>> > > Dear All,
>>>> > >
>>>> > > Recent Commdat discussion revealed a desire to reference external images from within an imgCIF file. This would allow the metadata for a dataset to be held within a single imgCIF file, while the frames themselves remain separate. This avoids the impracticality of navigating through an enormous multi-frame imgCIF file in order to extract a relatively compact amount of information.
>>>> > >
>>>> > > As a starting proposal, I suggest we extend the _array_data category with the following three datanames:
>>>> > >
>>>> > > (1) _array_data.external_format  A value drawn from an enumerated list of formats (e.g. "SMV", "HDF5", "Bruker"). The definition for each enumerated value would explain how to interpret _array_data.internal_path
>>>> > > (2) _array_data.location_uri  A URI for the file containing the image. A relative URI is relative to the location of the imgCIF file
>>>> > > (3) _array_data.internal_path  A format-specific string describing the location of the frame within the file identified by _array_data.location_uri, interpreted according to the value given in _array_data.external_format
>>>> > >
>>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the location referenced with a DOI, with appropriate definitions of the path notation:
>>>> > >
>>>> > > loop_
>>>> > > _array_data.array_id
>>>> > > _array_data.binary_id
>>>> > > _array_data.external_format
>>>> > > _array_data.location_uri
>>>> > > _array_data.internal_path
>>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>>>> > > ...
>>>> > >
>>>> > > Or for a bunch of single-frame files generated by an ADSC detector in the same directory as the imgCIF file
>>>> > >
>>>> > > loop_
>>>> > > _array_data.array_id
>>>> > > _array_data.binary_id
>>>> > > _array_data.external_format
>>>> > > _array_data.location_uri
>>>> > > 1 1 ADSC ./tartaric.001
>>>> > > 1 2 ADSC ./tartaric.002
>>>> > > 1 3 ADSC ./tartaric.003
>>>> > > ...
>>>> > >
>>>> > > The imgCIF data items describing the structure of the data array would refer to the data after it has been provided by the format. The form in which it is provided should be specified in the definition of each value of "_array_data.external_format".
>>>> > > So, for example, the various compression methods in HDF5 would be invisible if the data as returned are specified to be an array of Reals.
>>>> > >
>>>> > > From the point of view of initial data validation, it would be sufficient to check that all referenced files are accessible, and that the provided locations exist.
>>>> > >
>>>> > > Thoughts?
>>>> > > James.
>>>> > >
>>>> > > --
>>>> > > T +61 (02) 9717 9907
>>>> > > F +61 (02) 9717 3145
>>>> > > M +61 (04) 0249 4148
>>>> > > _______________________________________________
>>>> > > imgcif-l mailing list
>>>> > > imgcif-l@iucr.org
>>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
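
As a rough illustration of the scheme discussed in this thread, the following Python sketch turns rows of an _array_data loop into per-frame "fetch plans". It is not part of the imgCIF definitions: the function name resolve_frames, the dictionary layout and the example.org URL are hypothetical, and the rule that a relative location is resolved against the location of the imgCIF file is taken from the original 2019 proposal and only assumed to carry over.

```python
from urllib.parse import urljoin


def resolve_frames(imgcif_url, rows):
    """Build one fetch plan per (array_id, binary_id) row.

    `rows` is a list of dicts keyed by the dataname suffix
    (e.g. "location_uri"); this layout is an assumption for illustration.
    """
    plans = []
    for row in rows:
        plans.append({
            "frame": (row["array_id"], row["binary_id"]),
            "format": row["external_format"],
            # Assumed rule: a relative URI is relative to the imgCIF file.
            "url": urljoin(imgcif_url, row["location_uri"]),
            # Optional items; None means the item was not given in the loop.
            "archive_format": row.get("external_archive_format"),
            "archive_path": row.get("external_archive_path"),
            "path": row.get("external_path"),
            "frame_index": row.get("external_frame"),
        })
    return plans


rows = [
    {"array_id": "1", "binary_id": "1", "external_format": "ADSC",
     "location_uri": "./tartaric.001"},
    {"array_id": "1", "binary_id": "2", "external_format": "HDF5",
     "location_uri": "https://zenodo.org/record/12345/files/tartaric.h5",
     "external_path": "/entry1/detector1/data", "external_frame": "2"},
]

# Hypothetical location of the imgCIF summary file itself.
plans = resolve_frames("https://example.org/data/run1/summary.cif", rows)
for plan in plans:
    print(plan["frame"], plan["format"], plan["url"])
```

A validator along the lines suggested in the original proposal could then walk these plans and check that each URL is reachable and that any archive path or internal path exists.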