Discussion List Archives

Re: [Imgcif-l] Adding references to external files to imgCIF

Thanks Herbert for the clarification. Regarding array sections I think
we might be talking about different things but I'll park that for the
moment as it is not urgent.

What is urgent is that two of the new external data tags have been
left out of the update. Please see the issue at
https://github.com/yayahjb/cbflib/issues/46 which I'm drawing to your
attention here as I'm not sure if anybody looks at issues posted on
Github. I'd be happy to create a pull request if that makes life
easier.

all the best,
James.

On Sat, 7 May 2022 at 00:18, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>
> Dear James,
>   The normalization was discussed ages ago and is consistent with DDL2 conventions, which
> are more normalized than core cif.  The array sections were introduced when we had to
> start dealing with the Eiger.  It is routine in an eiger 16M data collection to revert to a 4M
> ROI (built into the hardware) when more speed is required.  Such descriptions have to be
> somewhere.  As speeds increase further, we will soon need to make more use of
> module-by-module ROIs, and we definitely will have to pull them in both individually
> and in groups instead of trying to only move full images.  What approach do you suggest
> for such cases?
>   Regards,
>     Herbert
>
> On Thu, May 5, 2022 at 11:15 PM James H <jamesrhester@gmail.com> wrote:
>>
>> Thanks Herbert for making these updates. My apologies for taking so
>> long to come back to them.
>>
>> I notice that the new 1.8.5 has moved the _array_data.external_data_*
>> tags into a separate array_data_external_data loop. While I appreciate
>> that a separate loop is as good a place as any, I would also have
>> appreciated some discussion of this - perhaps I missed it? Anyway, I
>> do not propose to dispute it now.
>>
>> More importantly, _array_data_external_data.frame seems to have
>> acquired a format ARRAYID(start1:end1:stride1,start2:end2:stride2,
>> ...) which I don't recall discussing, and there are now references in
>> the definition to ARRAY_STRUCTURE_LIST which I believe miss the point
>> that the ARRAY_STRUCTURE_LIST items are used to characterise the array
>> after it has been obtained from the external data source, and are
>> definitely *not* supposed to describe the layout of the data within
>> the external data source. Likewise, ARRAY_ID refers to the layout of
>> the data after they have been delivered, and so have no direct
>> relevance to how the data are stored. I appreciate that C and Fortran
>> layout should be considered by the author of the imgCIF file when
>> describing what will be returned from the external source, but I'm not
>> sure that this warning is particularly necessary here as the author
>> will in any case be forced to consider the details of the
>> format-specific behaviour when constructing the external data pointer.
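
The section notation quoted above, ARRAYID(start1:end1:stride1,start2:end2:stride2,...), can be split mechanically into an array identifier and one (start, end, stride) triple per dimension. The following is a minimal Python sketch of such a parser, not the dictionary's or CBFlib's own implementation. The function name parse_array_section is hypothetical, and requiring all three integer fields for every dimension is an assumption; whether indices count from 0 or 1, and whether the end index is inclusive, is left to the dictionary definition.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: an identifier without spaces or parentheses,
# followed by a parenthesised, comma-separated list of dimensions.
_SECTION_RE = re.compile(r"^\s*([^\s()]+)\s*\((.*)\)\s*$")


def parse_array_section(text: str) -> Tuple[str, List[Tuple[int, int, int]]]:
    """Split 'ID(s1:e1:st1,s2:e2:st2,...)' into the id and per-dimension triples."""
    match = _SECTION_RE.match(text)
    if match is None:
        raise ValueError(f"not an array section: {text!r}")
    array_id, body = match.groups()
    triples = []
    for part in body.split(","):
        fields = [field.strip() for field in part.split(":")]
        if len(fields) != 3:
            # Assumption: start, end and stride are all mandatory.
            raise ValueError(f"expected start:end:stride, got {part!r}")
        start, end, stride = (int(field) for field in fields)
        triples.append((start, end, stride))
    return array_id, triples
```

The triples are returned as plain integers rather than Python slice objects so that no indexing convention is implied before the reader has consulted the definition.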
>>
>> thanks,
>> James.
>>
>> On Wed, 6 Apr 2022 at 23:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >
>> > Dear Colleagues,
>> >
>> >   I propose the following plan of action to get James' changes into the cif_img dictionary
>> >
>> >   0.  In both the yayahjb cbflib active branches:  main and CBFlib-0.9.7-devel, bring
>> > the currently posted cif_img_1.8.5 dictionary up to an agreed level (which will be
>> > called 1.8.6 if there are any changes) and make one last CBFlib 0.9.6 release with
>> > that as the default dictionary
>> >   1.  Merge the current CBFlib_0.9.7-devel branch into main
>> >   2.  Make that the default release in yayahjb
>> >
>> > If nobody objects, I plan to post the necessary pull requests and releases this weekend.
>> >
>> >
>> >
>> > On Wed, Apr 6, 2022 at 5:10 AM James H <jamesrhester@gmail.com> wrote:
>> >>
>> >> Dear All,
>> >>
>> >> Just a quick note: a further year later and the external data pointers
>> >> work has not yet been merged, and neither has a further proposed data
>> >> name [1]. On the bright side an implementation using these pointers
>> >> has been published as a test of practicality [2]. It would of course
>> >> be most welcome if imgCIF deliberative processes could get themselves
>> >> to the point that these new data names are merged into the official
>> >> version of the main dictionary, given that no issues have been
>> >> identified.
>> >>
>> >> Meanwhile, in order to facilitate use of automated DDLm checking tools
>> >> on data files using imgCIF data names, I have now generated (1) a
>> >> direct translation of current version 1.8.4 into DDLm (2) a direct
>> >> translation with added external data pointers to DDLm in a separate
>> >> "journals-extension" branch. Both of these currently exist as pull
>> >> requests on the https://github.com/COMCIFS/imgCIF repository, which is
>> >> intended to hold the DDLm version of the imgCIF dictionary. Anyone is
>> >> most welcome to comment on these pull requests of course, but I
>> >> emphasise that they simply use a different dictionary language for
>> >> defining the same data names, and therefore should have no
>> >> implications for current imgCIF/CBF usage.
>> >>
>> >> best wishes,
>> >> James.
>> >>
>> >> [1] pull request at https://github.com/yayahjb/cbflib/pull/39
>> >> [2] https://github.com/jamesrhester/ImgCIFHandler.jl
>> >>
>> >> On Mon, 12 Apr 2021 at 16:38, James H <jamesrhester@gmail.com> wrote:
>> >> >
>> >> > Dear All,
>> >> >
>> >> > Over a year later I have now written up definitions in DDL2 for inclusion in imgCIF. The full definitions are at the Github issue (https://github.com/COMCIFS/imgCIF/issues/7). Please have a look and provide feedback here or there. Note that I have added datanames for specifying that the images are contained within compressed archives. I've checked a few known sources of images (proteindiffraction.org, zenodo, a uni repository) and this scheme seems to cover those bases. If you have time, please have a look at your favourite open archive of raw data to see if this scheme is sufficient for you to specify a particular image in that archive.  I've reproduced the examples from the definitions below.
>> >> >
>> >> > Of course, in a perfect world we would just give a DOI but those days are not yet upon us due to landing pages. Happy to be corrected on that.
>> >> >
>> >> > best wishes,
>> >> > James.
>> >> >
>> >> > Examples
>> >> > ========
>> >> > #  The frames are contained in a single HDF5-format file accessible
>> >> > #   at https://zenodo.org/record/12345/files/tartaric.h5. An array of 2D
>> >> > #   images is found at HDF5 location /entry1/detector1/data
>> >> >
>> >> >      loop_
>> >> >     _array_data.array_id
>> >> >     _array_data.binary_id
>> >> >     _array_data.external_format
>> >> >     _array_data.location_uri
>> >> >     _array_data.external_path
>> >> >     _array_data.external_frame
>> >> >     1 1 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 1
>> >> >     1 2 HDF5 https://zenodo.org/record/12345/files/tartaric.h5 /entry1/detector1/data 2
>> >> >     ...
>> >> >
>> >> >  #  Frames are contained in individual Smart6000 Bruker-format files
>> >> >  #   accessible using https://uni_repo.edu/5341 in subdirectory run1.
>> >> >
>> >> >   loop_
>> >> >     _array_data.array_id
>> >> >     _array_data.binary_id
>> >> >     _array_data.external_format
>> >> >     _array_data.external_version
>> >> >     _array_data.location_uri
>> >> >     1 1 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.001
>> >> >     1 2 Bruker Smart6000 https://uni_repo.edu/5341/run1/tartaric.002
>> >> >     ...
>> >> >
>> >> > #  Frames with SMV format are contained at data.proteindiffraction.org in a tarred
>> >> > #    archive compressed with bzip2.
>> >> >
>> >> >     loop_
>> >> >     _array_data.array_id
>> >> >     _array_data.binary_id
>> >> >     _array_data.external_format
>> >> >     _array_data.location_uri
>> >> >     _array_data.external_archive_format
>> >> >     _array_data.external_archive_path
>> >> >     1 1 SMV
>> >> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >> >         TBZ
>> >> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0001.img
>> >> >     1 2 SMV
>> >> >         https://data.proteindiffraction.org/ssgcid/MyulA_01062_a_B12-sddc0001574_7k69.tar.bz2
>> >> >         TBZ
>> >> >         MyulA_01062_a_B12-sddc0001574_7k69/data/317895h4_y_0002.img
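
For the archive case above, a reader has to open the bzip2-compressed tar archive and pull out the single member named by _array_data.external_archive_path. The following Python sketch shows one way this step might look using only the standard library. It is an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, the archive is assumed to have been downloaded into memory already, and the demo archive contains made-up bytes rather than a real SMV frame.

```python
import io
import tarfile


def read_archive_member(archive_bytes: bytes, member_path: str) -> bytes:
    """Return the raw bytes of one regular file inside a tar.bz2 archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:bz2") as archive:
        # extractfile raises KeyError for an unknown name and returns None
        # for members that are not regular files (e.g. directories).
        member = archive.extractfile(member_path)
        if member is None:
            raise ValueError(f"{member_path!r} is not a regular file")
        return member.read()


def make_demo_archive(member_path: str, payload: bytes) -> bytes:
    """Build a small in-memory tar.bz2 archive for trying out the reader."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        info = tarfile.TarInfo(name=member_path)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

The bytes returned would then be handed to whatever reader understands the format named in _array_data.external_format.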
>> >> >
>> >> >
>> >> > On Tue, 5 Mar 2019 at 16:37, James Hester <jamesrhester@gmail.com> wrote:
>> >> >>
>> >> >> OK, I've drafted up some definitions (just the human-readable part for now) for you all to peruse.  Please look at https://github.com/COMCIFS/imgCIF/issues/7 and provide feedback here or there.
>> >> >>
>> >> >> all the best,
>> >> >> James.
>> >> >>
>> >> >> On Thu, 14 Feb 2019 at 14:39, James Hester <jamesrhester@gmail.com> wrote:
>> >> >>>
>> >> >>> Thanks for the support Herbert. Does anybody have any concerns or improvements to the data names that I sent originally? If not, I guess I will write up some formal dictionary definitions for your consideration.
>> >> >>>
>> >> >>> James.
>> >> >>>
>> >> >>> On Wed, 13 Feb 2019 at 21:39, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
>> >> >>>>
>> >> >>>> Dear Colleagues,
>> >> >>>>
>> >> >>>>   Since 2012 NIAC and COMCIFS have worked cooperatively to make imgCIF/CBF and NeXus/HDF5 fully interoperable.  This is very
>> >> >>>> far along, e.g. with NeXus/HDF5 NXtransformations having been added to NeXus/HDF5 to carry the same information as imgCIF/CBF AXIS.
>> >> >>>> What James has suggested will allow imgCIF/CBF to carry the same dataset structure information as is conveyed in the external links of
>> >> >>>> an Eiger dataset, which divides the collected data into a master file with the metadata and a set of datafiles.  This structural division
>> >> >>>> may not be important for some smaller datasets with only a few hundred to a few thousand frames, but can be very important in
>> >> >>>> handling datasets with more frames than that, such as those encountered in serial crystallography.  Even for the smaller datasets this approach can
>> >> >>>> help to solve a problem for archives and facilities that need to store metadata in a relational database while the data itself has been parked in
>> >> >>>> raw file systems, non-relational databases, zenodo, etc.  As with almost all of CIF, imgCIF/CBF metadata maps very easily and directly
>> >> >>>> into relational tables, while putting NeXus/HDF5 metadata into a relational database first requires exactly the same sort of transformations
>> >> >>>> as we have already designed to map NeXus/HDF5 metadata into imgCIF/CBF.  To me it seems that James' suggestion is not a reinvention
>> >> >>>> of this particular wheel, but may be an important step in avoiding reinvention of the wheel.  This may avoid a lot of unnecessary transformation
>> >> >>>> of huge quantities of raw data in serial crystallography while making the metadata more accessible.
>> >> >>>>
>> >> >>>>   I would suggest giving James' suggestion serious consideration.
>> >> >>>>
>> >> >>>>   Regards,
>> >> >>>>     Herbert
>> >> >>>>
>> >> >>>> On Wed, Feb 13, 2019 at 4:02 AM James Hester <jamesrhester@gmail.com> wrote:
>> >> >>>>>
>> >> >>>>> Dear Graeme,
>> >> >>>>>
>> >> >>>>> The context of this is the idea that a single imgCIF file could be
>> >> >>>>> generated from a collection of raw image files (in whatever format, whether
>> >> >>>>> HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain the metadata
>> >> >>>>> pertaining to that collection. In such a situation, some way of referring
>> >> >>>>> to the raw frames from within the imgCIF file is required.
>> >> >>>>>
>> >> >>>>> I agree that a perfectly reasonable approach is not to generate any new
>> >> >>>>> file at all, and simply to access the metadata directly in whatever format
>> >> >>>>> happens to be there. This was my initial impulse as well and it took me a
>> >> >>>>> while to understand that the actual proposal was to create an imgCIF file,
>> >> >>>>> rather than just use imgCIF datanames for specification purposes.  From a
>> >> >>>>> semantic point of view both amount to the same thing so my only real
>> >> >>>>> motivation here is to add an image linking facility to imgCIF so that the
>> >> >>>>> "generate a summary metadata file" approach is possible.
>> >> >>>>>
>> >> >>>>> Could we just copy the HDF5 way of referring to objects in other HDF5 files
>> >> >>>>> as a quick solution?
>> >> >>>>>
>> >> >>>>> all the best,
>> >> >>>>> James.
>> >> >>>>>
>> >> >>>>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <
>> >> >>>>> Graeme.Winter@diamond.ac.uk> wrote:
>> >> >>>>>
>> >> >>>>> > Dear James,
>> >> >>>>> >
>> >> >>>>> > On the face of it, this looks a lot to me like a reinvention of HDF5 -
>> >> >>>>> > perhaps with specific semantics - and there is already a (complete?)
>> >> >>>>> > mapping from imgCIF to HDF5 / NeXus
>> >> >>>>> >
>> >> >>>>> > Have I missed something? No offence meant, trying to understand the shape
>> >> >>>>> > of the problem you are trying to solve
>> >> >>>>> >
>> >> >>>>> > Thanks & best wishes Graeme
>> >> >>>>> >
>> >> >>>>> > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
>> >> >>>>> > >
>> >> >>>>> > > Dear All,
>> >> >>>>> > >
>> >> >>>>> > > Recent Commdat discussion revealed a desire to reference external images
>> >> >>>>> > > from within an imgCIF file. This would allow the metadata for a dataset to
>> >> >>>>> > > be held within a single imgCIF file, while the frames themselves remain
>> >> >>>>> > > separate. This avoids the impracticality of navigating through an enormous
>> >> >>>>> > > multi-frame imgCIF file in order to extract a relatively compact amount of
>> >> >>>>> > > information.
>> >> >>>>> > >
>> >> >>>>> > > As a starting proposal, I suggest we extend the _array_data category with
>> >> >>>>> > > the following three datanames:
>> >> >>>>> > >
>> >> >>>>> > > (1) _array_data.external_format    A value drawn from an enumerated list of
>> >> >>>>> > > formats (e.g. "SMV","HDF5","Bruker"). The definition for each enumerated
>> >> >>>>> > > value would explain how to interpret _array_data.internal_path
>> >> >>>>> > > (2) _array_data.location_uri       A URI for the file containing the
>> >> >>>>> > > image. A relative URI is resolved relative to the location of the imgCIF file
>> >> >>>>> > > (3) _array_data.internal_path      A format-specific string describing
>> >> >>>>> > > the location of the frame within the file identified by
>> >> >>>>> > > _array_data.location_uri, interpreted according to the value given in
>> >> >>>>> > > _array_data.external_format
>> >> >>>>> > >
>> >> >>>>> > > So for a multi-frame HDF5 file buried in a subdirectory of the location
>> >> >>>>> > > referenced with a DOI, with appropriate definitions of the path notation:
>> >> >>>>> > >
>> >> >>>>> > > loop_
>> >> >>>>> > > _array_data.array_id
>> >> >>>>> > > _array_data.binary_id
>> >> >>>>> > > _array_data.external_format
>> >> >>>>> > > _array_data.location_uri
>> >> >>>>> > > _array_data.internal_path
>> >> >>>>> > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
>> >> >>>>> > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
>> >> >>>>> > > ...
>> >> >>>>> > >
>> >> >>>>> > > Or for a bunch of single-frame files generated by an ADSC detector in the
>> >> >>>>> > > same directory as the imgCIF file
>> >> >>>>> > >
>> >> >>>>> > > loop_
>> >> >>>>> > > _array_data.array_id
>> >> >>>>> > > _array_data.binary_id
>> >> >>>>> > > _array_data.external_format
>> >> >>>>> > > _array_data.location_uri
>> >> >>>>> > > 1 1 ADSC ./tartaric.001
>> >> >>>>> > > 1 2 ADSC ./tartaric.002
>> >> >>>>> > > 1 3 ADSC ./tartaric.003
>> >> >>>>> > > ...
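
Because a relative location is taken relative to the imgCIF file itself, a reader needs the address of the imgCIF file in order to turn values such as ./tartaric.001 into something it can fetch. A minimal Python sketch of that resolution step, using the standard urllib.parse.urljoin, is given here. The function name resolve_location and the example.org address are hypothetical and serve only as illustration.

```python
from urllib.parse import urljoin


def resolve_location(imgcif_url: str, location_uri: str) -> str:
    """Resolve a location value against the URL of the imgCIF file.

    Absolute URIs are returned unchanged; relative ones are interpreted
    relative to the directory that holds the imgCIF file.
    """
    return urljoin(imgcif_url, location_uri)


if __name__ == "__main__":
    # Hypothetical address of the summary imgCIF file.
    base = "https://example.org/datasets/run1/summary.cif"
    print(resolve_location(base, "./tartaric.001"))
    print(resolve_location(base, "https://zenodo.org/record/12345/files/tartaric.h5"))
```

Using the generic URL-joining rules means the same code path covers both the single-directory ADSC case and fully qualified repository addresses.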
>> >> >>>>> > >
>> >> >>>>> > > The imgCIF data items describing the structure of the data array would
>> >> >>>>> > > refer to the data after it has been provided by the format. The form in
>> >> >>>>> > > which it is provided should be specified in the definition of each value of
>> >> >>>>> > > "_array_data.external_format".  So, for example, the various compression
>> >> >>>>> > > methods in HDF5 would be invisible if the data as returned are specified to
>> >> >>>>> > > be an array of Reals.
>> >> >>>>> > >
>> >> >>>>> > > From the point of view of initial data validation, it would be sufficient
>> >> >>>>> > > to check that all referenced files are accessible, and that the provided
>> >> >>>>> > > locations exist.
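
For the simplest situation, where the frames sit on the same file system as the imgCIF file, the accessibility check described above could be sketched in Python as follows. This is only an illustration under assumptions: the function name missing_local_files is hypothetical, the location values are assumed to be plain relative file paths, and remote URIs or locations inside a container file would need format-specific checks that are not shown.

```python
from pathlib import Path
from typing import Iterable, List


def missing_local_files(imgcif_path: Path, locations: Iterable[str]) -> List[str]:
    """Return the location values that do not name an existing file.

    Each location is interpreted relative to the directory containing
    the imgCIF file, as in the single-directory example.
    """
    base = imgcif_path.parent
    return [loc for loc in locations if not (base / loc).is_file()]
```

An empty result means every referenced frame file was found; anything else lists the entries a validator would report.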
>> >> >>>>> > >
>> >> >>>>> > > Thoughts?
>> >> >>>>> > > James.
>> >> >>>>> > >
>> >> >>>>> > > --
>> >> >>>>> > > T +61 (02) 9717 9907
>> >> >>>>> > > F +61 (02) 9717 3145
>> >> >>>>> > > M +61 (04) 0249 4148
>> >> >>>>> > > _______________________________________________
>> >> >>>>> > > imgcif-l mailing list
>> >> >>>>> > > imgcif-l@iucr.org
>> >> >>>>> > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
