Discussion List Archives


Re: [Imgcif-l] Adding references to external files to imgCIF

Sounds good to me. I will update the DDLm equivalents once the dust settles.

On Thu, 7 Apr 2022 at 01:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
> 
> Dear Colleagues,
> 
>   I propose the following plan of action to get James' changes into the
> cif_img dictionary:
>
>   0.  In both the yayahjb cbflib active branches, main and CBFlib-0.9.7-devel,
>       bring the currently posted cif_img_1.8.5 dictionary up to an agreed level
>       (which will be called 1.8.6 if there are any changes) and make one last
>       CBFlib 0.9.6 release with that as the default dictionary
>   1.  Merge the current CBFlib_0.9.7-devel branch into main
>   2.  Make that the default release in yayahjb
>
> If nobody objects, I plan to post the necessary pull requests and releases
> this weekend.
> 
> 
>  
> On Wed, Apr 6, 2022 at 5:10 AM James H <jamesrhester@gmail.com> wrote:
>>
>> Dear All,
>>
>> Just a quick note: a further year later and the external data pointers
>> work has not yet been merged, and neither has a further proposed data
>> name [1]. On the bright side an implementation using these pointers
>> has been published as a test of practicality [2]. It would of course
>> be most welcome if imgCIF deliberative processes could get themselves
>> to the point that these new data names are merged into the official
>> version of the main dictionary, given that no issues have been
>> identified.
>>
>> Meanwhile, in order to facilitate use of automated DDLm checking tools
>> on data files using imgCIF data names, I have now generated (1) a
>> direct translation of current version 1.8.4 into DDLm (2) a direct
>> translation with added external data pointers to DDLm in a separate
>> "journals-extension" branch. Both of these currently exist as pull
>> requests on the https://github.com/COMCIFS/imgCIF repository, which is
>> intended to hold the DDLm version of the imgCIF dictionary. Anyone is
>> most welcome to comment on these pull requests of course, but I
>> emphasise that they simply use a different dictionary language for
>> defining the same data names, and therefore should have no
>> implications for current imgCIF/CBF usage.
>>
>> best wishes,
>> James.
>>
>> [1] pull request at https://github.com/yayahjb/cbflib/pull/39
>> [2] https://github.com/jamesrhester/ImgCIFHandler.jl
>>
>> On Mon, 12 Apr 2021 at 16:38, James H <jamesrhester@gmail.com> wrote:
>> >
>> > Dear All,
>> >
>> > Over a year later I have now written up definitions in DDL2 for inclusion
>> > in imgCIF. The full definitions are at the GitHub issue
>> > (https://github.com/COMCIFS/imgCIF/issues/7). Please have a look and
>> > provide feedback here or there. Note that I have added datanames for
>> > specifying that the images are contained within compressed archives. I've
>> > checked a few known sources of images (proteindiffraction.org, zenodo, a
>> > uni repository) and this scheme seems to cover those bases. If you have
>> > time, please have a look at your favourite open archive of raw data to see
>> > if this scheme is sufficient for you to specify a particular image in that
>> > archive.  I've reproduced the examples from the definitions below.
>> >
>> > Of course, in a perfect world we would just give a DOI but those days are
>> > not yet upon us due to landing pages. Happy to be corrected on that.
>> >
>> > best wishes,
>> > James.
>> >
>> > Examples
>> > ========
>> > #  The frames are contained in a single HDF5-format file accessible
>> > #   at https://zenodo.org/record/12345/files/tartaric.h5. An array of 2D
>> > #   images is found at HDF5 location /entry1/detector1/data
>> >
>> >      loop_
>> >     _array_data.array_id
>> >     _array_data.binary_id
>> >     _array_data.external_format
>> >     _array_data.location_uri
>> >     _array_data.external_path
>> >     _array_data.external_frame
>> >     1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
>> >     1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
>> >     ...
>> >
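A minimal sketch of how a consumer might read a loop like the one above into one record per frame. It is not a full CIF parser: it assumes whitespace-delimited values with no quoted strings or semicolon-delimited text fields, and the function name `parse_simple_loop` is hypothetical.

```python
# Sketch only: a tiny reader for one whitespace-delimited CIF loop_ block.
# Assumption: no quoted values, no semicolon text fields, no value starting with "_".

EXAMPLE = """
loop_
_array_data.array_id
_array_data.binary_id
_array_data.external_format
_array_data.location_uri
_array_data.external_path
_array_data.external_frame
1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
"""


def parse_simple_loop(text):
    """Return the rows of a single loop_ block as a list of dicts."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        tokens.extend(line.split())
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected a loop_ block")

    # Data names come first; everything after them is a flat stream of values.
    names = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("_"):
        names.append(tokens[i])
        i += 1
    values = tokens[i:]
    if not names or len(values) % len(names) != 0:
        raise ValueError("value count is not a multiple of the data name count")

    width = len(names)
    return [dict(zip(names, values[j:j + width]))
            for j in range(0, len(values), width)]


rows = parse_simple_loop(EXAMPLE)
for row in rows:
    print(row["_array_data.binary_id"], row["_array_data.location_uri"])
```

Because values are read as a flat token stream, the same function also accepts rows that are split over several lines, as in the archive example.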
>> > #  Frames are contained in individual Smart6000 Bruker-format files
>> > #   accessible using https://uni_repo.edu/5341 in subdirectory run1.
>> >
>> >     loop_
>> >     _array_data.array_id
>> >     _array_data.binary_id
>> >     _array_data.external_format
>> >     _array_data.external_version
>> >     _array_data.location_uri
>> >     1 1 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.001
>> >     1 2 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.002
>> >     ...
>> >
>> > #  Frames with SMV format are contained at data.proteindiffraction.org in a
>> > #    tarred archive compressed with bzip2.
>> >
>> >     loop_
>> >     _array_data.array_id
>> >     _array_data.binary_id
>> >     _array_data.external_format
>> >     _array_data.location_uri
>> >     _array_data.external_archive_format
>> >     _array_data.external_archive_path
>> >     1 1 SMV
>> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >         TBZ
>> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0001.img
>> >     1 2 SMV
>> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >         TBZ
>> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0002.img
>> >
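A hedged sketch of how the archive datanames could be consumed once the archive has already been downloaded. It uses Python's standard `tarfile` module. The mapping from the `TBZ` value to the `tarfile` mode `r:bz2` is an assumption made for illustration, as are the function name and the in-memory demo archive with its member path.

```python
import io
import tarfile

# Assumption: "TBZ" denotes a bzip2-compressed tar archive, as in the example.
ARCHIVE_MODES = {"TBZ": "r:bz2"}


def read_archived_frame(archive_file, archive_format, archive_path):
    """Return the raw bytes of one frame stored inside an archive.

    archive_file   -- a seekable binary file object holding the archive
    archive_format -- value of _array_data.external_archive_format
    archive_path   -- value of _array_data.external_archive_path
    """
    mode = ARCHIVE_MODES[archive_format]
    with tarfile.open(fileobj=archive_file, mode=mode) as tar:
        member = tar.extractfile(archive_path)  # KeyError if the path is absent
        if member is None:
            raise ValueError(f"{archive_path} is not a regular file")
        return member.read()


def make_demo_archive():
    """Build a small in-memory .tar.bz2 so the sketch runs without a download."""
    buf = io.BytesIO()
    payload = b"fake SMV frame"
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        info = tarfile.TarInfo("run/data/frame_0001.img")  # hypothetical path
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


frame = read_archived_frame(make_demo_archive(), "TBZ", "run/data/frame_0001.img")
print(len(frame), "bytes read")
```

Decoding the returned bytes according to `_array_data.external_format` (SMV here) is a separate step not covered by this sketch.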
>> >
>> > On Tue, 5 Mar 2019 at 16:37, James Hester <jamesrhester@gmail.com> wrote:
>> >>
>> >> OK, I've drafted up some definitions (just the human-readable part for
>> >> now) for you all to peruse.  Please look at
>> >> https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or
>> >> there.
>> >>
>> >> all the best,
>> >> James.
>> >>
>> >> On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
>> >>>
>> >>> Thanks for the support Herbert. Does anybody have any concerns or
>> >>> improvements to the data names that I sent originally? If not, I guess I
>> >>> will write up some formal dictionary definitions for your consideration.
>> >>>
>> >>> James.
>> >>>
>> >>> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >>>>
>> >>>> Dear Colleagues,
>> >>>>
>> >>>>   Since 2012 NIAC and COMCIFS have worked cooperatively to make
>> >>>> imgCIF/CBF and NeXus/HDF5 fully interoperable.  This is very far along,
>> >>>> e.g. with NXtransformations having been added to NeXus/HDF5 to carry
>> >>>> the same information as imgCIF/CBF AXIS.  What James has suggested will
>> >>>> allow imgCIF/CBF to carry the same dataset structure information as is
>> >>>> conveyed in the external links of an Eiger dataset, which divides the
>> >>>> collected data into a master file with the metadata and a set of data
>> >>>> files.  This structural division may not be important for some smaller
>> >>>> datasets with only a few hundred to a few thousand frames, but can be
>> >>>> very important in handling the larger datasets that are encountered in
>> >>>> serial crystallography.  Even for the smaller datasets this approach
>> >>>> can help to solve a problem for archives and facilities that need to
>> >>>> store metadata in a relational database while the data itself has been
>> >>>> parked in raw file systems, non-relational databases, zenodo, etc.  As
>> >>>> with almost all of CIF, imgCIF/CBF metadata maps very easily and
>> >>>> directly into relational tables, while putting NeXus/HDF5 metadata into
>> >>>> a relational database first requires exactly the same sort of
>> >>>> transformations as we have already designed to map NeXus/HDF5 metadata
>> >>>> into imgCIF/CBF.  To me it seems that James' suggestion is not a
>> >>>> reinvention of this particular wheel, but may be an important step in
>> >>>> avoiding reinvention of the wheel.  This may avoid a lot of unnecessary
>> >>>> transformation of huge quantities of raw data in serial crystallography
>> >>>> while making the metadata more accessible.
>> >>>>
>> >>>>   I would suggest giving James' suggestion serious consideration.
>> >>>>
>> >>>>   Regards,
>> >>>>     Herbert
>> >>>>
>> >>>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com> wrote:
>> >>>>>
>> >>>>> Dear Graeme,
>> >>>>>
>> >>>>> The context of this is the idea that a single imgCIF file could be
>> >>>>> generated from a collection of raw image files (in whatever format,
>> >>>>> whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain
>> >>>>> the metadata pertaining to that collection. In such a situation, some
>> >>>>> way of referring to the raw frames from within the imgCIF file is
>> >>>>> required.
>> >>>>>
>> >>>>> I agree that a perfectly reasonable approach is not to generate any new
>> >>>>> file at all, and simply to access the metadata directly in whatever
>> >>>>> format happens to be there. This was my initial impulse as well and it
>> >>>>> took me a while to understand that the actual proposal was to create an
>> >>>>> imgCIF file, rather than just use imgCIF datanames for specification
>> >>>>> purposes.  From a semantic point of view both amount to the same thing
>> >>>>> so my only real motivation here is to add an image linking facility to
>> >>>>> imgCIF so that the "generate a summary metadata file" approach is
>> >>>>> possible.
>> >>>>>
>> >>>>> Could we just copy the HDF5 way of referring to objects in other HDF5
>> >>>>> files as a quick solution?
>> >>>>>
>> >>>>> all the best,
>> >>>>> James.
>> >>>>>
>> >>>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <Graeme.Winter@diamond.ac.uk> wrote:
>> >>>>>
>> >>>>> > Dear James,
>> >>>>> >
>> >>>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>> >>>>> > perhaps with specific semantics - and there is already a (complete?)
>> >>>>> > mapping from imgCIF to HDF5 / NeXus
>> >>>>> >
>> >>>>> > Have I missed something? No offence meant, trying to understand the
>> >>>>> > shape of the problem you are trying to solve
>> >>>>> >
>> >>>>> > Thanks & best wishes Graeme
>> >>>>> >
>> >>>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>> >>>>> > >
>> >>>>> > > Dear All,
>> >>>>> > >
>> >>>>> > > Recent Commdat discussion revealed a desire to reference external
>> >>>>> > > images from within an imgCIF file. This would allow the metadata for a
>> >>>>> > > dataset to be held within a single imgCIF file, while the frames
>> >>>>> > > themselves remain separate. This avoids the impracticality of
>> >>>>> > > navigating through an enormous multi-frame imgCIF file in order to
>> >>>>> > > extract a relatively compact amount of information.
>> >>>>> > >
>> >>>>> > > As a starting proposal, I suggest we extend the _array_data category
>> >>>>> > > with the following three datanames:
>> >>>>> > >
>> >>>>> > > (1) _array_data.external_format    A value drawn from an enumerated
>> >>>>> > > list of formats (e.g. "SMV","HDF5","Bruker"). The definition for each
>> >>>>> > > enumerated value would explain how to interpret
>> >>>>> > > _array_data.internal_path
>> >>>>> > > (2) _array_data.location_uri       A URI for the file containing the
>> >>>>> > > image. A relative URI is relative to the location of the imgCIF file
>> >>>>> > > (3) _array_data.internal_path      A format-specific string describing
>> >>>>> > > the location of the frame within the file identified by
>> >>>>> > > _array_data.location_uri, interpreted according to the value given in
>> >>>>> > > _array_data.external_format
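The rule in item (2), that a relative location is interpreted relative to the imgCIF file, can be sketched with Python's standard `urllib.parse.urljoin`. This is one possible reading of the rule and not a normative algorithm; the function name and the `example.org` URL of the imgCIF file are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_location(imgcif_url, location_uri):
    """Resolve a location value against the URL of the imgCIF file.

    Absolute URIs (including non-hierarchical ones such as doi:...) are
    returned unchanged; relative references are joined onto the directory
    of the imgCIF file.
    """
    return urljoin(imgcif_url, location_uri)


# Hypothetical location of the imgCIF summary file.
IMGCIF_URL = "https://example.org/data/run1/summary.cif"

print(resolve_location(IMGCIF_URL, "./tartaric.001"))
print(resolve_location(IMGCIF_URL, "https://zenodo.org/record/12345/files/tartaric.h5"))
print(resolve_location(IMGCIF_URL, "doi:x.y.z"))
```

Resolving a `doi:` value to an actual file is a separate problem that this sketch does not attempt.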
>> >>>>> > >
>> >>>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the
>> >>>>> > > location referenced with a DOI, with appropriate definitions of the
>> >>>>> > > path notation:
>> >>>>> > >
>> >>>>> > > loop_
>> >>>>> > > _array_data.array_id
>> >>>>> > > _array_data.binary_id
>> >>>>> > > _array_data.external_format
>> >>>>> > > _array_data.location_uri
>> >>>>> > > _array_data.internal_path
>> >>>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>> >>>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>> >>>>> > > ...
>> >>>>> > > 
>> >>>>> > > Or for a bunch of single-frame files generated by an ADSC detector in
>> >>>>> > > the same directory as the imgCIF file:
>> >>>>> > >
>> >>>>> > > loop_
>> >>>>> > > _array_data.array_id
>> >>>>> > > _array_data.binary_id
>> >>>>> > > _array_data.external_format
>> >>>>> > > _array_data.location_uri
>> >>>>> > > 1 1 ADSC ./tartaric.001
>> >>>>> > > 1 2 ADSC ./tartaric.002
>> >>>>> > > 1 3 ADSC ./tartaric.003
>> >>>>> > > ...
>> >>>>> > >
>> >>>>> > > The imgCIF data items describing the structure of the data array would
>> >>>>> > > refer to the data after it has been provided by the format. The form in
>> >>>>> > > which it is provided should be specified in the definition of each
>> >>>>> > > value of "_array_data.external_format".  So, for example, the various
>> >>>>> > > compression methods in HDF5 would be invisible if the data as returned
>> >>>>> > > are specified to be an array of Reals.
>> >>>>> > >
>> >>>>> > > From the point of view of initial data validation, it would be
>> >>>>> > > sufficient to check that all referenced files are accessible, and that
>> >>>>> > > the provided locations exist.
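A rough sketch of the first half of that validation step, limited to relative references that point at local files next to the imgCIF file. Checking remote URIs and checking that a path inside a file exists are deliberately left out. The function name is hypothetical, and treating any value with a URI scheme as remote is a simplifying assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def missing_local_files(imgcif_path, location_uris):
    """Return the relative locations that do not resolve to an existing file.

    imgcif_path   -- path of the imgCIF file on the local file system
    location_uris -- iterable of _array_data.location_uri values
    """
    base = Path(imgcif_path).parent
    missing = []
    for uri in location_uris:
        if urlparse(uri).scheme:
            continue  # assumption: anything with a scheme is remote, so skip it
        if not (base / uri).is_file():
            missing.append(uri)
    return missing
```

For the ADSC example above, a call such as `missing_local_files("run1/summary.cif", ["./tartaric.001", "./tartaric.002"])` would return whichever of the two frames is absent from `run1/`.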
>> >>>>> > > 
>> >>>>> > > Thoughts?
>> >>>>> > > James.
>> >>>>> > >
>> >>>>> > > --
>> >>>>> > > T +61 (02) 9717 9907
>> >>>>> > > F +61 (02) 9717 3145
>> >>>>> > > M +61 (04) 0249 4148
>> >>>>> > > _______________________________________________
>> >>>>> > > imgcif-l mailing list
>> >>>>> > > imgcif-l@iucr.org
>> >>>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
>> >>>>> > 
>> >>>>> > 
>> >>>>> >
>> >>>>> >
>> >>>>>

