Re: [Imgcif-l] Adding references to external files to imgCIF
- To: "Herbert J. Bernstein" <yayahjb@gmail.com>
- Subject: Re: [Imgcif-l] Adding references to external files to imgCIF
- From: James H via imgcif-l <imgcif-l@iucr.org>
- Date: Thu, 7 Apr 2022 14:02:41 +1000
- Cc: James H <jamesrhester@gmail.com>, The Crystallographic Binary File and its imgCIF application to image data<imgcif-l@iucr.org>, "Graeme.Winter@Diamond.ac.uk" <Graeme.Winter@diamond.ac.uk>, Aaron Brewster <asbrewster@lbl.gov>, Billy Poon <BKPoon@lbl.gov>
- In-Reply-To: <CABcsX24kz0-VeXA7P_kYT=135QrO1k0pLwHco8dBo=-djL-8=w@mail.gmail.com>
- References: <CAM+dB2dGcbLy3NuMy1g=QvWP3Mhj09F1WksKRXJ1BHeZ9_fXyw@mail.gmail.com><FDBF95B6-0C0A-48A1-92B4-9B567AD5C9E5@diamond.ac.uk><CAM+dB2c9qOZg8D151WwoJYkM_YtR-+kKcFvNaLNk4cM=3vEoQQ@mail.gmail.com><CABcsX27+RTDq9HKsVBKRqn7Xs9_sV=o7xDkd0K_YAuxPRWTLPw@mail.gmail.com><CAM+dB2d=bmnEP9f5d99xG+U-FQ26+mezOhprH=J+5AdzLgSGbQ@mail.gmail.com><CAM+dB2fbEnrZy46W-8AM+g-Rnp+s11diPztbm6tNOJ6FTBC5vA@mail.gmail.com><CAM+dB2eXcLOP2z_9mitnXACH0+KZf7tcFZ4jOLL0SNub1DKcqA@mail.gmail.com><CAM+dB2cZeMENx5hrRXYFXdJJk+AjtkOB2nqOgGW1zF0=mzYd3g@mail.gmail.com><CABcsX24kz0-VeXA7P_kYT=135QrO1k0pLwHco8dBo=-djL-8=w@mail.gmail.com>
Sounds good to me. I will update the DDLm equivalents once the dust settles.

On Thu, 7 Apr 2022 at 01:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> Dear Colleagues,
>
> I propose the following plan of action to get James' changes into the
> cif_img dictionary:
>
> 0. In both active yayahjb cbflib branches, main and CBFlib-0.9.7-devel,
>    bring the currently posted cif_img_1.8.5 dictionary up to an agreed
>    level (which will be called 1.8.6 if there are any changes) and make
>    one last CBFlib 0.9.6 release with that as the default dictionary.
> 1. Merge the current CBFlib_0.9.7-devel branch into main.
> 2. Make that the default release in yayahjb.
>
> If nobody objects, I plan to post the necessary pull requests and releases
> this weekend.
>
> On Wed, Apr 6, 2022 at 5:10 AM James H <jamesrhester@gmail.com> wrote:
>>
>> Dear All,
>>
>> Just a quick note: a further year has passed, and the external data
>> pointers work has not yet been merged, nor has a further proposed data
>> name [1]. On the bright side, an implementation using these pointers has
>> been published as a test of practicality [2]. It would of course be most
>> welcome if the imgCIF deliberative processes could reach the point where
>> these new data names are merged into the official version of the main
>> dictionary, given that no issues have been identified.
>>
>> Meanwhile, in order to facilitate use of automated DDLm checking tools on
>> data files using imgCIF data names, I have now generated (1) a direct
>> translation of the current version 1.8.4 into DDLm, and (2) a direct DDLm
>> translation that adds the external data pointers, in a separate
>> "journals-extension" branch. Both of these currently exist as pull
>> requests on the https://github.com/COMCIFS/imgCIF repository, which is
>> intended to hold the DDLm version of the imgCIF dictionary. Anyone is most
>> welcome to comment on these pull requests, of course, but I emphasise that
>> they simply use a different dictionary language to define the same data
>> names, and therefore should have no implications for current imgCIF/CBF
>> usage.
>>
>> best wishes,
>> James.
>>
>> [1] pull request at https://github.com/yayahjb/cbflib/pull/39
>> [2] https://github.com/jamesrhester/ImgCIFHandler.jl
>>
>> On Mon, 12 Apr 2021 at 16:38, James H <jamesrhester@gmail.com> wrote:
>> >
>> > Dear All,
>> >
>> > Over a year later, I have now written up definitions in DDL2 for
>> > inclusion in imgCIF. The full definitions are at the GitHub issue
>> > (https://github.com/COMCIFS/imgCIF/issues/7). Please have a look and
>> > provide feedback here or there. Note that I have added data names for
>> > specifying that the images are contained within compressed archives.
>> > I've checked a few known sources of images (proteindiffraction.org,
>> > Zenodo, a university repository) and this scheme seems to cover those
>> > bases. If you have time, please look at your favourite open archive of
>> > raw data to see whether this scheme is sufficient for you to specify a
>> > particular image in that archive. I've reproduced the examples from the
>> > definitions below.
>> >
>> > Of course, in a perfect world we would just give a DOI, but those days
>> > are not yet upon us, because a DOI typically resolves to a landing page
>> > rather than to the data files themselves. Happy to be corrected on that.
>> >
>> > best wishes,
>> > James.
>> >
>> > Examples
>> > ========
>> > # The frames are contained in a single HDF5-format file accessible
>> > # at https://zenodo.org/record/12345/files/tartaric.h5. An array of 2D
>> > # images is found at HDF5 location /entry1/detector1/data
>> >
>> > loop_
>> > _array_data.array_id
>> > _array_data.binary_id
>> > _array_data.external_format
>> > _array_data.location_uri
>> > _array_data.external_path
>> > _array_data.external_frame
>> > 1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
>> > 1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
>> > ...
>> >
>> > # Frames are contained in individual Smart6000 Bruker-format files
>> > # accessible using https://uni_repo.edu/5341 in subdirectory run1.
>> >
>> > loop_
>> > _array_data.array_id
>> > _array_data.binary_id
>> > _array_data.external_format
>> > _array_data.external_version
>> > _array_data.location_uri
>> > 1 1 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.001
>> > 1 2 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.002
>> > ...
>> >
>> > # Frames with SMV format are contained at data.proteindiffraction.org in a tarred
>> > # archive compressed with bzip2.
>> >
>> > loop_
>> > _array_data.array_id
>> > _array_data.binary_id
>> > _array_data.external_format
>> > _array_data.location_uri
>> > _array_data.external_archive_format
>> > _array_data.external_archive_path
>> > 1 1 SMV
>> > https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> > TBZ
>> > MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0001.img
>> > 1 2 SMV
>> > https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> > TBZ
>> > MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0002.img
>> >
>> >
>> > On Tue, 5 Mar 2019 at 16:37, James Hester <jamesrhester@gmail.com> wrote:
>> >>
>> >> OK, I've drafted some definitions (just the human-readable part for
>> >> now) for you all to peruse. Please look at
>> >> https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or
>> >> there.
>> >>
>> >> all the best,
>> >> James.
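As an illustration of how the third April 2021 example (an SMV frame addressed by `_array_data.external_archive_format` TBZ plus `_array_data.external_archive_path`) might be dereferenced, here is a minimal Python sketch using the standard `tarfile` module. It assumes the archive named by `_array_data.location_uri` has already been downloaded to a local path. The function name `read_archived_frame` and the `TGZ`/`TAR` codes are hypothetical; only `TBZ` comes from the examples.

```python
import tarfile

# Map _array_data.external_archive_format codes to tarfile read modes.
# Only "TBZ" (tar + bzip2) appears in the examples; "TGZ" and "TAR" are
# hypothetical codes added for illustration.
ARCHIVE_MODES = {
    "TBZ": "r:bz2",
    "TGZ": "r:gz",
    "TAR": "r:",
}


def read_archived_frame(archive_path: str, archive_format: str,
                        member_path: str) -> bytes:
    """Return the raw bytes of one frame stored inside a local tar archive.

    archive_path:   local copy of the file named by _array_data.location_uri
    archive_format: value of _array_data.external_archive_format
    member_path:    value of _array_data.external_archive_path
    """
    mode = ARCHIVE_MODES[archive_format]  # KeyError for an unknown code
    with tarfile.open(archive_path, mode) as archive:
        # extractfile raises KeyError if member_path is not in the archive
        # and returns None if it names a directory or other non-file entry.
        member = archive.extractfile(member_path)
        if member is None:
            raise ValueError(f"{member_path!r} is not a regular file")
        with member:
            return member.read()
```

Reading the member into memory, rather than extracting the archive to disk, avoids writing untrusted archive paths to the local file system. Decoding the returned SMV bytes would still be the job of a format-specific reader.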
>> >>
>> >> On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
>> >>>
>> >>> Thanks for the support, Herbert. Does anybody have any concerns about,
>> >>> or improvements to, the data names that I sent originally? If not, I
>> >>> guess I will write up some formal dictionary definitions for your
>> >>> consideration.
>> >>>
>> >>> James.
>> >>>
>> >>> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >>>>
>> >>>> Dear Colleagues,
>> >>>>
>> >>>> Since 2012, NIAC and COMCIFS have worked cooperatively to make imgCIF/CBF
>> >>>> and NeXus/HDF5 fully interoperable. This work is very far along, e.g.
>> >>>> NXtransformations have been added to NeXus/HDF5 to carry the same
>> >>>> information as imgCIF/CBF AXIS.
>> >>>> What James has suggested will allow imgCIF/CBF to carry the same dataset
>> >>>> structure information as is conveyed in the external links of an Eiger
>> >>>> dataset, which divides the collected data into a master file with the
>> >>>> metadata and a set of data files. This structural division may not be
>> >>>> important for some smaller datasets with only a few hundred to a few
>> >>>> thousand frames, but can be very important in handling datasets with
>> >>>> many more frames, such as those encountered in serial crystallography.
>> >>>> Even for the smaller datasets, this approach can help to solve a problem
>> >>>> for archives and facilities that need to store metadata in a relational
>> >>>> database while the data itself has been parked in raw file systems,
>> >>>> non-relational databases, Zenodo, etc. As with almost all of CIF,
>> >>>> imgCIF/CBF metadata maps very easily and directly into relational
>> >>>> tables, while putting NeXus/HDF5 metadata into a relational database
>> >>>> first requires exactly the same sort of transformations as we have
>> >>>> already designed to map NeXus/HDF5 metadata into imgCIF/CBF. To me it
>> >>>> seems that James' suggestion is not a reinvention of this particular
>> >>>> wheel, but may be an important step in avoiding reinvention of the
>> >>>> wheel. It may avoid a lot of unnecessary transformation of huge
>> >>>> quantities of raw data in serial crystallography while making the
>> >>>> metadata more accessible.
>> >>>>
>> >>>> I would suggest giving James' suggestion serious consideration.
>> >>>>
>> >>>> Regards,
>> >>>> Herbert
>> >>>>
>> >>>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com> wrote:
>> >>>>>
>> >>>>> Dear Graeme,
>> >>>>>
>> >>>>> The context of this is the idea that a single imgCIF file could be
>> >>>>> generated from a collection of raw image files (in whatever format,
>> >>>>> whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain
>> >>>>> the metadata pertaining to that collection. In such a situation, some
>> >>>>> way of referring to the raw frames from within the imgCIF file is
>> >>>>> required.
>> >>>>>
>> >>>>> I agree that a perfectly reasonable approach is not to generate any new
>> >>>>> file at all, and simply to access the metadata directly in whatever
>> >>>>> format happens to be there. This was my initial impulse as well, and it
>> >>>>> took me a while to understand that the actual proposal was to create an
>> >>>>> imgCIF file, rather than just use imgCIF data names for specification
>> >>>>> purposes. From a semantic point of view both amount to the same thing,
>> >>>>> so my only real motivation here is to add an image linking facility to
>> >>>>> imgCIF so that the "generate a summary metadata file" approach is
>> >>>>> possible.
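For a set of single-frame files, the "generate a summary metadata file" approach could amount to little more than writing one `_array_data` row per frame. A minimal Python sketch, assuming the data names proposed in this thread and that every value can be written unquoted; `array_data_loop` is a hypothetical helper, not part of CBFlib or any existing imgCIF tool.

```python
from __future__ import annotations

# Leading characters this sketch treats as unsafe for an unquoted CIF value.
_SPECIAL_START = "_#$'\"[];"


def _bare(value: str) -> str:
    """Return value unchanged if it can be written unquoted, else raise."""
    if not value or value[0] in _SPECIAL_START or any(ch.isspace() for ch in value):
        raise ValueError(f"{value!r} would need CIF quoting, which this sketch omits")
    return value


def array_data_loop(frame_uris: list[str], external_format: str,
                    array_id: str = "1") -> str:
    """Return a CIF loop_ with one _array_data row per external frame file.

    binary_id is numbered from 1 in the order the frames are given.
    """
    lines = [
        "loop_",
        "_array_data.array_id",
        "_array_data.binary_id",
        "_array_data.external_format",
        "_array_data.location_uri",
    ]
    for binary_id, uri in enumerate(frame_uris, start=1):
        lines.append(f"{_bare(array_id)} {binary_id} {_bare(external_format)} {_bare(uri)}")
    return "\n".join(lines) + "\n"
```

Called as `array_data_loop(["./tartaric.001", "./tartaric.002"], "ADSC")`, it produces a loop in the style of the ADSC example in the original 2019 proposal. Values that would need quoting are rejected rather than silently written out as invalid CIF.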
>> >>>>>
>> >>>>> Could we just copy the HDF5 way of referring to objects in other HDF5
>> >>>>> files as a quick solution?
>> >>>>>
>> >>>>> all the best,
>> >>>>> James.
>> >>>>>
>> >>>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <Graeme.Winter@diamond.ac.uk> wrote:
>> >>>>>
>> >>>>> > Dear James,
>> >>>>> >
>> >>>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>> >>>>> > perhaps with specific semantics - and there is already a (complete?)
>> >>>>> > mapping from imgCIF to HDF5 / NeXus.
>> >>>>> >
>> >>>>> > Have I missed something? No offence meant, I am trying to understand
>> >>>>> > the shape of the problem you are trying to solve.
>> >>>>> >
>> >>>>> > Thanks & best wishes Graeme
>> >>>>> >
>> >>>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>> >>>>> > >
>> >>>>> > > Dear All,
>> >>>>> > >
>> >>>>> > > Recent Commdat discussion revealed a desire to reference external
>> >>>>> > > images from within an imgCIF file. This would allow the metadata for
>> >>>>> > > a dataset to be held within a single imgCIF file, while the frames
>> >>>>> > > themselves remain separate. This avoids the impracticality of
>> >>>>> > > navigating through an enormous multi-frame imgCIF file in order to
>> >>>>> > > extract a relatively compact amount of information.
>> >>>>> > >
>> >>>>> > > As a starting proposal, I suggest we extend the _array_data category
>> >>>>> > > with the following three data names:
>> >>>>> > >
>> >>>>> > > (1) _array_data.external_format: a value drawn from an enumerated
>> >>>>> > > list of formats (e.g. "SMV", "HDF5", "Bruker"). The definition for
>> >>>>> > > each enumerated value would explain how to interpret
>> >>>>> > > _array_data.internal_path
>> >>>>> > > (2) _array_data.location_uri: a URI for the file containing the
>> >>>>> > > image. A relative URI is relative to the location of the imgCIF file
>> >>>>> > > (3) _array_data.internal_path: a format-specific string describing
>> >>>>> > > the location of the frame within the file identified by
>> >>>>> > > _array_data.location_uri, interpreted according to the value given
>> >>>>> > > in _array_data.external_format
>> >>>>> > >
>> >>>>> > > So, for a multi-frame HDF5 file buried in a subdirectory of the
>> >>>>> > > location referenced with a DOI, with appropriate definitions of the
>> >>>>> > > path notation:
>> >>>>> > >
>> >>>>> > > loop_
>> >>>>> > > _array_data.array_id
>> >>>>> > > _array_data.binary_id
>> >>>>> > > _array_data.external_format
>> >>>>> > > _array_data.location_uri
>> >>>>> > > _array_data.internal_path
>> >>>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>> >>>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>> >>>>> > > ...
>> >>>>> > >
>> >>>>> > > Or, for a bunch of single-frame files generated by an ADSC detector
>> >>>>> > > in the same directory as the imgCIF file:
>> >>>>> > >
>> >>>>> > > loop_
>> >>>>> > > _array_data.array_id
>> >>>>> > > _array_data.binary_id
>> >>>>> > > _array_data.external_format
>> >>>>> > > _array_data.location_uri
>> >>>>> > > 1 1 ADSC ./tartaric.001
>> >>>>> > > 1 2 ADSC ./tartaric.002
>> >>>>> > > 1 3 ADSC ./tartaric.003
>> >>>>> > > ...
>> >>>>> > >
>> >>>>> > > The imgCIF data items describing the structure of the data array
>> >>>>> > > would refer to the data after it has been provided by the format.
>> >>>>> > > The form in which it is provided should be specified in the
>> >>>>> > > definition of each value of "_array_data.external_format". So, for
>> >>>>> > > example, the various compression methods in HDF5 would be invisible
>> >>>>> > > if the data as returned are specified to be an array of Reals.
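All of the examples in this thread are plain CIF loops, and a single row may continue onto the next line (as in the April 2021 TBZ example). A deliberately minimal Python reader for such loops is sketched below. It is not a CIF parser: it assumes a single loop, no quoted strings, no semicolon-delimited text fields, and no "..." elision lines, and it treats any token starting with "#" as the start of a comment. `read_simple_loop` is a hypothetical name.

```python
from __future__ import annotations


def read_simple_loop(text: str) -> list[dict[str, str]]:
    """Read a single CIF loop_ whose values are all unquoted tokens.

    Returns one dict per loop row, keyed by data name. Line breaks are
    treated like any other whitespace, so a row may span several lines.
    """
    tokens: list[str] = []
    for line in text.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # a token starting with '#' begins a comment
            tokens.append(token)

    if not tokens or tokens[0].lower() != "loop_":
        raise ValueError("text must start with loop_")

    # The data names follow loop_ and all start with an underscore.
    names: list[str] = []
    index = 1
    while index < len(tokens) and tokens[index].startswith("_"):
        names.append(tokens[index])
        index += 1
    values = tokens[index:]

    width = len(names)
    if width == 0 or len(values) % width != 0:
        raise ValueError("value count is not a multiple of the number of data names")
    return [dict(zip(names, values[start:start + width]))
            for start in range(0, len(values), width)]
```

Because values are consumed in groups of as many tokens as there are data names, the layout of rows across lines does not matter, which is also how a full CIF parser treats loop values.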
>> >>>>> > >
>> >>>>> > > From the point of view of initial data validation, it would be
>> >>>>> > > sufficient to check that all referenced files are accessible, and
>> >>>>> > > that the provided locations exist.
>> >>>>> > >
>> >>>>> > > Thoughts?
>> >>>>> > > James.
>> >>>>> > >
>> >>>>> > > --
>> >>>>> > > T +61 (02) 9717 9907
>> >>>>> > > F +61 (02) 9717 3145
>> >>>>> > > M +61 (04) 0249 4148
>> >>>>> > > _______________________________________________
>> >>>>> > > imgcif-l mailing list
>> >>>>> > > imgcif-l@iucr.org
>> >>>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
>> >>>>> >
>> >>>>> >
>> >>>>> > --
>> >>>>> > This e-mail and any attachments may contain confidential, copyright
>> >>>>> > and/or privileged material, and are for the use of the intended
>> >>>>> > addressee only. If you are not the intended addressee or an authorised
>> >>>>> > recipient of the addressee please notify us of receipt by returning the
>> >>>>> > e-mail and do not use, copy, retain, distribute or disclose the
>> >>>>> > information in or attached to the e-mail.
>> >>>>> > Any opinions expressed within this e-mail are those of the individual
>> >>>>> > and not necessarily of Diamond Light Source Ltd.
>> >>>>> > Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>> >>>>> > attachments are free from viruses and we cannot accept liability for
>> >>>>> > any damage which you may sustain as a result of software viruses which
>> >>>>> > may be transmitted in or with the message.
>> >>>>> > Diamond Light Source Limited (company no. 4375679). Registered in
>> >>>>> > England and Wales with its registered office at Diamond House, Harwell
>> >>>>> > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United
>> >>>>> > Kingdom
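The validation step suggested in the original 2019 proposal (check that all referenced files are accessible and that the provided locations exist), combined with the rule that a relative reference is resolved against the location of the imgCIF file, can be sketched in Python with `urllib.parse.urljoin`. This is a minimal sketch under stated assumptions: only `file:` URIs are checked, while `https:`, `doi:` and other schemes are skipped because they would need a network lookup. The function names `resolve_location` and `missing_local_files` are hypothetical.

```python
from __future__ import annotations

import os
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname


def resolve_location(cif_uri: str, location_uri: str) -> str:
    """Resolve an _array_data.location_uri value against the imgCIF file's URI.

    Relative references such as "./tartaric.001" are taken relative to the
    imgCIF file; absolute URIs (https:, doi:, ...) are returned unchanged.
    """
    return urljoin(cif_uri, location_uri)


def missing_local_files(cif_uri: str, location_uris: list[str]) -> list[str]:
    """Return the resolved file: URIs whose target file does not exist.

    Non-file schemes are skipped by this sketch rather than checked.
    """
    missing = []
    for location in location_uris:
        resolved = resolve_location(cif_uri, location)
        parsed = urlparse(resolved)
        if parsed.scheme != "file":
            continue
        if not os.path.isfile(url2pathname(parsed.path)):
            missing.append(resolved)
    return missing
```

A fuller check would also open each archive or HDF5 file and confirm that the referenced path or frame exists inside it, which is the second half of the validation step described in the proposal.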