Re: [Imgcif-l] Adding references to external files to imgCIF

Here you go: https://github.com/yayahjb/cbflib/pull/47

There will be a further separate request after I've updated
https://github.com/yayahjb/cbflib/pull/39 unless you would prefer that
I fold it into #47.

thanks,
James.

On Mon, 9 May 2022 at 19:25, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> Dear James,
>   Yes, a specific pull request would be very helpful, since I am not sure I
> really understand what is needed here.  The .html is the primary change
> needed, but if you could provide the .dic as you did before, I can work from
> that.
>   Regards,
>     Herbert
>
> On Sun, May 8, 2022, 8:43 PM James H <jamesrhester@gmail.com> wrote:
>>
>> Thanks Herbert for the clarification. Regarding array sections I think
>> we might be talking about different things but I'll park that for the
>> moment as it is not urgent.
>>
>> What is urgent is that two of the new external data tags have been
>> left out of the update. Please see the issue at
>> https://github.com/yayahjb/cbflib/issues/46 which I'm drawing to your
>> attention here as I'm not sure if anybody looks at issues posted on
>> Github. I'd be happy to create a pull request if that makes life
>> easier.
>>
>> all the best,
>> James.
>>
>> On Sat, 7 May 2022 at 00:18, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >
>> > Dear James,
>> >   The normalization was discussed ages ago and is consistent with DDL2
>> > conventions, which are more normalized than core CIF.  The array sections
>> > were introduced when we had to start dealing with the Eiger.  It is routine
>> > in an Eiger 16M data collection to revert to a 4M ROI (built into the
>> > hardware) when more speed is required.  Such descriptions have to be
>> > somewhere.  As speeds increase further, we will soon need to make more use
>> > of module-by-module ROIs, and we definitely will have to pull them in both
>> > individually and in groups instead of trying to only move full images.
>> > What approach do you suggest for such cases?
>> >   Regards,
>> >     Herbert
>> >
>> > On Thu, May 5, 2022 at 11:15 PM James H <jamesrhester@gmail.com> wrote:
>> >>
>> >> Thanks Herbert for making these updates. My apologies for taking so
>> >> long to come back to them.
>> >>
>> >> I notice that the new 1.8.5 has moved the _array_data.external_data_*
>> >> tags into a separate array_data_external_data loop. While I appreciate
>> >> that a separate loop is as good a place as any, I would also have
>> >> appreciated some discussion of this - perhaps I missed it? Anyway, I
>> >> do not propose to dispute it now.
>> >>
>> >> More importantly, _array_data_external_data.frame seems to have
>> >> acquired a format ARRAYID(start1:end1:stride1,start2:end2:stride2,
>> >> ...) which I don't recall discussing, and there are now references in
>> >> the definition to ARRAY_STRUCTURE_LIST which I believe miss the point
>> >> that the ARRAY_STRUCTURE_LIST items are used to characterise the array
>> >> after it has been obtained from the external data source, and are
>> >> definitely *not* supposed to describe the layout of the data within
>> >> the external data source. Likewise, ARRAY_ID refers to the layout of
>> >> the data after they have been delivered, and so has no direct
>> >> relevance to how the data are stored. I appreciate that C and Fortran
>> >> layout should be considered by the author of the imgCIF file when
>> >> describing what will be returned from the external source, but I'm not
>> >> sure that this warning is particularly necessary here as the author
>> >> will in any case be forced to consider the details of the
>> >> format-specific behaviour when constructing the external data pointer.
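
To make the ARRAYID(start1:end1:stride1,start2:end2:stride2,...) notation discussed above concrete, here is a minimal Python sketch that splits such a string into an array identifier and one (start, end, stride) triple per dimension. It is only an illustration: the names `ArraySection` and `parse_array_section` are hypothetical, the sketch assumes all three fields are always present and are integers, and it takes no position on index base or on whether the end value is inclusive, since the thread does not say.

```python
import re
from typing import List, NamedTuple, Tuple


class ArraySection(NamedTuple):
    """Hypothetical container for a parsed array-section string."""
    array_id: str
    ranges: List[Tuple[int, int, int]]  # one (start, end, stride) per dimension


# ARRAYID, then a parenthesised, comma-separated list of start:end:stride
_SECTION_RE = re.compile(r"^\s*([^\s()]+)\s*\((.*)\)\s*$")


def parse_array_section(text: str) -> ArraySection:
    """Split 'ARRAYID(s1:e1:st1,s2:e2:st2,...)' into its components.

    Assumption: every dimension gives all three integer fields.
    """
    match = _SECTION_RE.match(text)
    if match is None:
        raise ValueError(f"not an array section: {text!r}")
    array_id, body = match.group(1), match.group(2)
    ranges = []
    for part in body.split(","):
        fields = part.strip().split(":")
        if len(fields) != 3:
            raise ValueError(f"expected start:end:stride, got {part!r}")
        start, end, stride = (int(field) for field in fields)
        ranges.append((start, end, stride))
    return ArraySection(array_id, ranges)
```

For example, `parse_array_section("1(1:1024:1,1:512:2)")` yields the identifier `"1"` with two triples. Whether those triples describe the delivered array or the layout inside the external source is exactly the point under discussion; the parser is neutral on that.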
>> >>
>> >> thanks,
>> >> James.
>> >>
>> >> On Wed, 6 Apr 2022 at 23:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >> >
>> >> > Dear Colleagues,
>> >> >
>> >> >   I propose the following plan of action to get James' changes into the
>> >> > cif_img dictionary
>> >> >
>> >> >   0.  In both the yayahjb cbflib active branches:  main and
>> >> > CBFlib-0.9.7-devel, bring the currently posted cif_img_1.8.5 dictionary
>> >> > up to an agreed level (which will be called 1.8.6 if there are any
>> >> > changes) and make one last CBFlib 0.9.6 release with that as the
>> >> > default dictionary
>> >> >   1.  Merge the current CBFlib_0.9.7-devel branch into main
>> >> >   2.  Make that the default release in yayahjb
>> >> >
>> >> > If nobody objects, I plan to post the necessary pull requests and
>> >> > releases this weekend.
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Apr 6, 2022 at 5:10 AM James H <jamesrhester@gmail.com> wrote:
>> >> >>
>> >> >> Dear All,
>> >> >>
>> >> >> Just a quick note: a further year later and the external data pointers
>> >> >> work has not yet been merged, and neither has a further proposed data
>> >> >> name [1]. On the bright side an implementation using these pointers
>> >> >> has been published as a test of practicality [2]. It would of course
>> >> >> be most welcome if imgCIF deliberative processes could get themselves
>> >> >> to the point that these new data names are merged into the official
>> >> >> version of the main dictionary, given that no issues have been
>> >> >> identified.
>> >> >>
>> >> >> Meanwhile, in order to facilitate use of automated DDLm checking tools
>> >> >> on data files using imgCIF data names, I have now generated (1) a
>> >> >> direct translation of current version 1.8.4 into DDLm (2) a direct
>> >> >> translation with added external data pointers to DDLm in a separate
>> >> >> "journals-extension" branch. Both of these currently exist as pull
>> >> >> requests on the https://github.com/COMCIFS/imgCIF repository, which is
>> >> >> intended to hold the DDLm version of the imgCIF dictionary. Anyone is
>> >> >> most welcome to comment on these pull requests of course, but I
>> >> >> emphasise that they simply use a different dictionary language for
>> >> >> defining the same data names, and therefore should have no
>> >> >> implications for current imgCIF/CBF usage.
>> >> >>
>> >> >> best wishes,
>> >> >> James.
>> >> >>
>> >> >> [1] pull request at https://github.com/yayahjb/cbflib/pull/39
>> >> >> [2] https://github.com/jamesrhester/ImgCIFHandler.jl
>> >> >>
>> >> >> On Mon, 12 Apr 2021 at 16:38, James H <jamesrhester@gmail.com> wrote:
>> >> >> >
>> >> >> > Dear All,
>> >> >> >
>> >> >> > Over a year later I have now written up definitions in DDL2 for
>> >> >> > inclusion in imgCIF. The full definitions are at the Github issue
>> >> >> > (https://github.com/COMCIFS/imgCIF/issues/7). Please have a look and
>> >> >> > provide feedback here or there. Note that I have added datanames for
>> >> >> > specifying that the images are contained within compressed archives.
>> >> >> > I've checked a few known sources of images (proteindiffraction.org,
>> >> >> > zenodo, a uni repository) and this scheme seems to cover those bases.
>> >> >> > If you have time, please have a look at your favourite open archive
>> >> >> > of raw data to see if this scheme is sufficient for you to specify a
>> >> >> > particular image in that archive.  I've reproduced the examples from
>> >> >> > the definitions below.
>> >> >> >
>> >> >> > Of course, in a perfect world we would just give a DOI but those days
>> >> >> > are not yet upon us due to landing pages. Happy to be corrected on
>> >> >> > that.
>> >> >> >
>> >> >> > best wishes,
>> >> >> > James.
>> >> >> >
>> >> >> > Examples
>> >> >> > ========
>> >> >> > #  The frames are contained in a single HDF5-format file accessible
>> >> >> > #   at https://zenodo.org/record/12345/files/tartaric.h5. An array of 2D
>> >> >> > #   images is found at HDF5 location /entry1/detector1/data
>> >> >> >
>> >> >> >      loop_
>> >> >> >     _array_data.array_id
>> >> >> >     _array_data.binary_id
>> >> >> >     _array_data.external_format
>> >> >> >     _array_data.location_uri
>> >> >> >     _array_data.external_path
>> >> >> >     _array_data.external_frame
>> >> >> >     1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
>> >> >> >     1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
>> >> >> >     ...
>> >> >> >
>> >> >> >  #  Frames are contained in individual Smart6000 Bruker-format files
>> >> >> >  #   accessible using https://uni_repo.edu/5341 in subdirectory run1.
>> >> >> >
>> >> >> >   loop_
>> >> >> >     _array_data.array_id
>> >> >> >     _array_data.binary_id
>> >> >> >     _array_data.external_format
>> >> >> >     _array_data.external_version
>> >> >> >     _array_data.location_uri
>> >> >> >     1 1 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.001
>> >> >> >     1 2 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.002
>> >> >> >     ...
>> >> >> >
>> >> >> > #  Frames with SMV format are contained at data.proteindiffraction.org in a tarred
>> >> >> > #    archive compressed with bzip2.
>> >> >> >
>> >> >> >     loop_
>> >> >> >     _array_data.array_id
>> >> >> >     _array_data.binary_id
>> >> >> >     _array_data.external_format
>> >> >> >     _array_data.location_uri
>> >> >> >     _array_data.external_archive_format
>> >> >> >     _array_data.external_archive_path
>> >> >> >     1 1 SMV
>> >> >> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >> >> >         TBZ
>> >> >> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0001.img
>> >> >> >     1 2 SMV
>> >> >> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >> >> >         TBZ
>> >> >> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0002.img
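
The third example above points at a frame inside a bzip2-compressed tar archive. As a rough illustration of what a reader of such a pointer might do once the archive has been downloaded, here is a minimal Python sketch using only the standard library. The function name `read_archive_member` is hypothetical, the archive is assumed to be already in memory as bytes, and only the "TBZ" code that appears in the example is handled; no other archive codes are guessed at.

```python
import io
import tarfile

# Only "TBZ" appears in the examples above; other codes are deliberately
# left out rather than invented.
_TAR_MODES = {"TBZ": "r:bz2"}


def read_archive_member(archive_bytes: bytes,
                        archive_format: str,
                        archive_path: str) -> bytes:
    """Return the raw bytes of one frame file stored inside an archive.

    archive_format corresponds to _array_data.external_archive_format and
    archive_path to _array_data.external_archive_path in the example.
    """
    try:
        mode = _TAR_MODES[archive_format]
    except KeyError:
        raise ValueError(
            f"unsupported archive format: {archive_format!r}") from None
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode=mode) as archive:
        # extractfile raises KeyError if the path is not in the archive
        member = archive.extractfile(archive_path)
        if member is None:  # e.g. the path names a directory
            raise ValueError(f"{archive_path!r} is not a regular file")
        return member.read()
```

The returned bytes would still need decoding according to `_array_data.external_format` (SMV in the example), which is outside the scope of this sketch.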
>> >> >> > 
>> >> >> >
>> >> >> > On Tue, 5 Mar 2019 at 16:37, James Hester <jamesrhester@gmail.com> wrote:
>> >> >> >>
>> >> >> >> OK, I've drafted up some definitions (just the human-readable part
>> >> >> >> for now) for you all to peruse.  Please look at
>> >> >> >> https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here
>> >> >> >> or there.
>> >> >> >>
>> >> >> >> all the best,
>> >> >> >> James.
>> >> >> >>
>> >> >> >> On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
>> >> >> >>>
>> >> >> >>> Thanks for the support Herbert. Does anybody have any concerns or
>> >> >> >>> improvements to the data names that I sent originally? If not, I
>> >> >> >>> guess I will write up some formal dictionary definitions for your
>> >> >> >>> consideration.
>> >> >> >>>
>> >> >> >>> James.
>> >> >> >>>
>> >> >> >>> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >> >> >>>>
>> >> >> >>>> Dear Colleagues,
>> >> >> >>>>
>> >> >> >>>>   Since 2012 NIAC and COMCIFS have worked cooperatively to make
>> >> >> >>>> imgCIF/CBF and NeXus/HDF5 fully interoperable.  This is very far
>> >> >> >>>> along, e.g. with NeXus/HDF5 NXtransformations having been added to
>> >> >> >>>> NeXus/HDF5 to carry the same information as imgCIF/CBF AXIS.  What
>> >> >> >>>> James has suggested will allow imgCIF/CBF to carry the same dataset
>> >> >> >>>> structure information as is conveyed in the external links of an
>> >> >> >>>> Eiger dataset, which divides the collected data into a master file
>> >> >> >>>> with the metadata and a set of datafiles.  This structural division
>> >> >> >>>> may not be important for some smaller datasets with only a few
>> >> >> >>>> hundred to a few thousand frames, but can be very important in
>> >> >> >>>> handling datasets with more frames than that, such as those
>> >> >> >>>> encountered in serial crystallography.  Even for the smaller
>> >> >> >>>> datasets this approach can help to solve a problem for archives and
>> >> >> >>>> facilities that need to store metadata in a relational database
>> >> >> >>>> while the data itself has been parked in raw file systems,
>> >> >> >>>> non-relational databases, zenodo, etc.  As with almost all of CIF,
>> >> >> >>>> imgCIF/CBF metadata maps very easily and directly into relational
>> >> >> >>>> tables, while putting NeXus/HDF5 metadata into a relational database
>> >> >> >>>> first requires exactly the same sort of transformations as we have
>> >> >> >>>> already designed to map NeXus/HDF5 metadata into imgCIF/CBF.  To me
>> >> >> >>>> it seems that James' suggestion is not a reinvention of this
>> >> >> >>>> particular wheel, but may be an important step in avoiding
>> >> >> >>>> reinvention of the wheel.  This may avoid a lot of unnecessary
>> >> >> >>>> transformation of huge quantities of raw data in serial
>> >> >> >>>> crystallography while making the metadata more accessible.
>> >> >> >>>>
>> >> >> >>>>   I would suggest giving James' suggestion serious consideration.
>> >> >> >>>>
>> >> >> >>>>   Regards,
>> >> >> >>>>     Herbert
>> >> >> >>>>
>> >> >> >>>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com> wrote:
>> >> >> >>>>>
>> >> >> >>>>> Dear Graeme,
>> >> >> >>>>>
>> >> >> >>>>> The context of this is the idea that a single imgCIF file could be
>> >> >> >>>>> generated from a collection of raw image files (in whatever format, whether
>> >> >> >>>>> HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain the metadata
>> >> >> >>>>> pertaining to that collection. In such a situation, some way of referring
>> >> >> >>>>> to the raw frames from within the imgCIF file is required.
>> >> >> >>>>>
>> >> >> >>>>> I agree that a perfectly reasonable approach is not to generate any new
>> >> >> >>>>> file at all, and simply to access the metadata directly in whatever format
>> >> >> >>>>> happens to be there. This was my initial impulse as well and it took me a
>> >> >> >>>>> while to understand that the actual proposal was to create an imgCIF file,
>> >> >> >>>>> rather than just use imgCIF datanames for specification purposes.  From a
>> >> >> >>>>> semantic point of view both amount to the same thing so my only real
>> >> >> >>>>> motivation here is to add an image linking facility to imgCIF so that the
>> >> >> >>>>> "generate a summary metadata file" approach is possible.
>> >> >> >>>>>
>> >> >> >>>>> Could we just copy the HDF5 way of referring to objects in other HDF5 files
>> >> >> >>>>> as a quick solution?
>> >> >> >>>>>
>> >> >> >>>>> all the best,
>> >> >> >>>>> James.
>> >> >> >>>>>
>> >> >> >>>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <
>> >> >> >>>>> Graeme.Winter@diamond.ac.uk> wrote:
>> >> >> >>>>>
>> >> >> >>>>> > Dear James,
>> >> >> >>>>> >
>> >> >> >>>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>> >> >> >>>>> > perhaps with specific semantics - and there is already a (complete?)
>> >> >> >>>>> > mapping from imgCIF to HDF5 / NeXus
>> >> >> >>>>> >
>> >> >> >>>>> > Have I missed something? No offence meant, trying to understand the shape
>> >> >> >>>>> > of the problem you are trying to solve
>> >> >> >>>>> >
>> >> >> >>>>> > Thanks & best wishes Graeme
>> >> >> >>>>> >
>> >> >> >>>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>> >> >> >>>>> > >
>> >> >> >>>>> > > Dear All,
>> >> >> >>>>> > >
>> >> >> >>>>> > > Recent Commdat discussion revealed a desire to reference external images
>> >> >> >>>>> > > from within an imgCIF file. This would allow the metadata for a dataset to
>> >> >> >>>>> > > be held within a single imgCIF file, while the frames themselves remain
>> >> >> >>>>> > > separate. This avoids the impracticality of navigating through an enormous
>> >> >> >>>>> > > multi-frame imgCIF file in order to extract a relatively compact amount of
>> >> >> >>>>> > > information.
>> >> >> >>>>> > >
>> >> >> >>>>> > > As a starting proposal, I suggest we extend the _array_data category with
>> >> >> >>>>> > > the following three datanames:
>> >> >> >>>>> > >
>> >> >> >>>>> > > (1) _array_data.external_format    A value drawn from an enumerated list of
>> >> >> >>>>> > > formats (e.g. "SMV","HDF5","Bruker"). The definition for each enumerated
>> >> >> >>>>> > > value would explain how to interpret _array_data.internal_path
>> >> >> >>>>> > > (2) _array_data.location_uri           A URI for the file containing the
>> >> >> >>>>> > > image. A relative URL is relative to the location of the imgCIF file
>> >> >> >>>>> > > (3) _array_data.internal_path        A format-specific string describing
>> >> >> >>>>> > > the location of the frame within the file identified by
>> >> >> >>>>> > > _array_data.location_uri, interpreted according to the value given in
>> >> >> >>>>> > > _array_data.external_format
>> >> >> >>>>> > >
>> >> >> >>>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the location
>> >> >> >>>>> > > referenced with a DOI, with appropriate definitions of the path notation:
>> >> >> >>>>> > > loop_
>> >> >> >>>>> > > _array_data.array_id
>> >> >> >>>>> > > _array_data.binary_id
>> >> >> >>>>> > > _array_data.external_format
>> >> >> >>>>> > > _array_data.location_uri
>> >> >> >>>>> > > _array_data.internal_path
>> >> >> >>>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>> >> >> >>>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>> >> >> >>>>> > > ...
>> >> >> >>>>> > >
>> >> >> >>>>> > > Or for a bunch of single-frame files generated by an ADSC detector in the
>> >> >> >>>>> > > same directory as the imgCIF file
>> >> >> >>>>> > >
>> >> >> >>>>> > > loop_
>> >> >> >>>>> > > _array_data.array_id
>> >> >> >>>>> > > _array_data.binary_id
>> >> >> >>>>> > > _array_data.external_format
>> >> >> >>>>> > > _array_data.location_uri
>> >> >> >>>>> > > 1 1 ADSC ./tartaric.001
>> >> >> >>>>> > > 1 2 ADSC ./tartaric.002
>> >> >> >>>>> > > 1 3 ADSC ./tartaric.003
>> >> >> >>>>> > > ...
>> >> >> >>>>> > >
>> >> >> >>>>> > > The imgCIF data items describing the structure of the data array would
>> >> >> >>>>> > > refer to the data after it has been provided by the format. The form in
>> >> >> >>>>> > > which it is provided should be specified in the definition of each value of
>> >> >> >>>>> > > "_array_data.external_format".  So, for example, the various compression
>> >> >> >>>>> > > methods in HDF5 would be invisible if the data as returned are specified to
>> >> >> >>>>> > > be an array of Reals.
>> >> >> >>>>> > >
>> >> >> >>>>> > > From the point of view of initial data validation, it would be sufficient
>> >> >> >>>>> > > to check that all referenced files are accessible, and that the provided
>> >> >> >>>>> > > locations exist.
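
The validation step described above, together with the rule that a relative URL is taken relative to the location of the imgCIF file, could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, only local `file` locations are checked, and anything with another scheme (https, doi, ...) is skipped because checking it would need network access.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname


def resolve_location(imgcif_uri: str, location_uri: str) -> str:
    """Resolve a location_uri value against the URI of the imgCIF file."""
    return urljoin(imgcif_uri, location_uri)


def missing_local_files(imgcif_path, location_uris):
    """Return the location_uri values that resolve to absent local files.

    Non-file schemes are ignored here; they are neither confirmed nor
    reported as missing.
    """
    base = Path(imgcif_path).resolve().as_uri()  # file:///... URI of the imgCIF
    missing = []
    for uri in location_uris:
        parsed = urlparse(resolve_location(base, uri))
        if parsed.scheme != "file":
            continue  # remote location: would need a network check
        if not Path(url2pathname(parsed.path)).exists():
            missing.append(uri)
    return missing
```

With the ADSC example, calling `missing_local_files("run1/summary.cif", ["./tartaric.001", "./tartaric.002"])` (the imgCIF file name is made up) would list whichever frame files are not present alongside the imgCIF file. Checking that a location inside a file exists, such as an HDF5 path, would need format-specific code and is not attempted here.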
>> >> >> >>>>> > >
>> >> >> >>>>> > > Thoughts?
>> >> >> >>>>> > > James.
>> >> >> >>>>> > >
>> >> >> >>>>> > > --
>> >> >> >>>>> > > T +61 (02) 9717 9907
>> >> >> >>>>> > > F +61 (02) 9717 3145
>> >> >> >>>>> > > M +61 (04) 0249 4148
>> >> >> >>>>> > > _______________________________________________
>> >> >> >>>>> > > imgcif-l mailing list
>> >> >> >>>>> > > imgcif-l@iucr.org
>> >> >> >>>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l