Discussion List Archives


Re: [Imgcif-l] Adding references to external files to imgCIF

Adding at least one check item seems like a fine idea; I will add it as an
issue on imgCIF so we don't lose sight of it.

On Thu, 14 Feb 2019 at 20:01, Jonathan WRIGHT <wright@esrf.fr> wrote:

> Dear James
>
> This sounds like a good and pragmatic way to do it. A remaining thought
> is whether to offer an optional way to validate the data found at the
> other end of a link. Was it overwritten or corrupted, or does the reading
> library have a bug?
>
> In practice we often record 1D counters from images, like mean, stddev,
> min, max, etc., either for the whole image or for a few selected ROIs.
> Writing an optional column with something similar might be useful for
> quickly locating interesting frames as well as acting as checksum data.
> Is there already something like this?
>
>    array_data.data_sum
>    array_data.data_mean
>    array_data.data_min
>    array_data.data_max
>    array_data.data_stddev
>
> (Any ROI would be in a different array_id.)
>
> Perhaps there is already a mechanism and I am just not aware of it? The
> same thing would be useful for NeXus too, but I did not manage to locate
> it.
>
> Of course, there is no need to add this now if it adds too many
> complications for what you actually need.
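As a rough illustration of the checksum idea, the sketch below computes the five proposed statistics for one frame and re-checks them after the frame is read back. It assumes the frame has already been decoded into a flat sequence of numbers. The function names and the relative tolerance are hypothetical, and using the population (rather than sample) standard deviation is an assumption, since the proposal does not say which is meant.

```python
import math
import statistics
from typing import Dict, Mapping, Sequence


def frame_summary(values: Sequence[float]) -> Dict[str, float]:
    """Statistics keyed by the proposed _array_data.data_* names (hypothetical)."""
    return {
        "data_sum": math.fsum(values),
        "data_mean": math.fsum(values) / len(values),
        "data_min": min(values),
        "data_max": max(values),
        # Population standard deviation; the proposal does not say which form.
        "data_stddev": statistics.pstdev(values),
    }


def frame_matches(values: Sequence[float], recorded: Mapping[str, float],
                  rel_tol: float = 1e-9) -> bool:
    """True if a freshly read frame reproduces every recorded statistic."""
    current = frame_summary(values)
    return all(math.isclose(current[key], recorded[key], rel_tol=rel_tol)
               for key in recorded)


# Usage: record the statistics when the frame is written, re-check on reading.
frame = [10, 12, 9, 250, 11, 10]
recorded = frame_summary(frame)
frame[3] = 0  # simulate a corrupted pixel
print(frame_matches(frame, recorded))  # prints False
```

Recording the same statistics per ROI, each under its own array_id as suggested, would let a reader both spot interesting frames and flag mismatches; a cryptographic hash of the raw bytes would be a stricter check but would not double as a quick-look index.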
>
> Best,
>
> Jon
> 
> 
> 
> On 14/02/2019 04:36, James Hester wrote:
> > Dear Jon,
> >
> > The answer to your questions, as I see it, is that the imgCIF
> > specifications work at a semantic level. So, if the NeXus specifications
> > say that a multidimensional array of numbers is found at a particular
> > location, that is all that imgCIF cares about. The way in which those
> > numbers are stored is not relevant. Tools that actually want to use the
> > linking information within imgCIF would have to know, for each
> > external_format, how to access items at a given location. This is not an
> > obstacle in practice, given the many libraries available that do just
> > this.
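The requirement that tools know, for each external_format, how to access a given location can be pictured as a registry of reader functions keyed by format name. This is a minimal sketch under assumptions: the registry, the reader signature (location URI and internal path in, flat list of numbers out) and the "DEMO-TEXT" format are all hypothetical. Real readers would wrap an HDF5 library, an SMV header parser, and so on.

```python
from typing import Callable, Dict, List

# Hypothetical reader signature: (location_uri, internal_path) -> flat list of numbers.
Reader = Callable[[str, str], List[float]]
READERS: Dict[str, Reader] = {}


def register(external_format: str) -> Callable[[Reader], Reader]:
    """Decorator that records a reader for one _array_data.external_format value."""
    def decorator(reader: Reader) -> Reader:
        READERS[external_format] = reader
        return reader
    return decorator


def load_external(external_format: str, location_uri: str,
                  internal_path: str = "") -> List[float]:
    """Dispatch to the reader registered for this format."""
    try:
        reader = READERS[external_format]
    except KeyError:
        raise ValueError(
            f"no reader registered for external_format {external_format!r}") from None
    return reader(location_uri, internal_path)


@register("DEMO-TEXT")  # made-up format: whitespace-separated numbers in a local file
def read_demo_text(location_uri: str, internal_path: str) -> List[float]:
    with open(location_uri, encoding="ascii") as handle:
        return [float(token) for token in handle.read().split()]
```

Supporting another format then means registering one more reader, with nothing changing on the imgCIF side, which fits the semantic-level view described above.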
> >
> > There is another missing dataname in _array_data: a dataname to hold the
> > actual data as an array of Real numbers. This array is implicitly
> > available as the result of processing the image data, but when we
> > eventually write dREL methods to manipulate images it would be most
> > convenient for it to be defined as a separate dataname. So, if it helps
> > to see what I'm saying, imagine that the task is to calculate values of
> > this dataname either from data frames within the imgCIF file or from an
> > external source. If an external source is chosen, then it delivers an
> > array of numbers directly into the dataname, as long as the location of
> > the external array is unambiguously specified.
> >
> > Alternatively, we could specify that the information delivered from the
> > external format is processed according to the array_structure and
> > array_structure_list loops. If both loops are empty, an array of Reals
> > is returned from the external format; if array_structure has some
> > information, that is used to process a byte string returned from the
> > tool. So in the case of ADSC, an imgCIF file could specify both
> > array_structure and array_structure_list if the specification of the
> > "ADSC" format states that a string of bytes is returned. Or an
> > alternative "ADSC-processed" format could be specified that returns an
> > array of integers.
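For the second option, where array_structure describes a byte string returned by the external format, the decoding step might look like the following. This sketch assumes the element type and byte order are given as strings in the style of imgCIF's _array_structure.encoding_type and _array_structure.byte_order values. It handles only uncompressed data and maps only a subset of element types, and the function name is hypothetical.

```python
import struct
from typing import List

# Subset of element types, mapped to struct format codes.
_STRUCT_CODES = {
    "unsigned 8-bit integer": "B",
    "signed 8-bit integer": "b",
    "unsigned 16-bit integer": "H",
    "signed 16-bit integer": "h",
    "unsigned 32-bit integer": "I",
    "signed 32-bit integer": "i",
    "signed 32-bit real IEEE": "f",
    "signed 64-bit real IEEE": "d",
}
_BYTE_ORDERS = {"little_endian": "<", "big_endian": ">"}


def decode_elements(raw: bytes, encoding_type: str, byte_order: str) -> List[float]:
    """Turn an uncompressed byte string into a flat list of Reals."""
    fmt = _BYTE_ORDERS[byte_order] + _STRUCT_CODES[encoding_type]
    size = struct.calcsize(fmt)
    if len(raw) % size:
        raise ValueError("byte string is not a whole number of elements")
    return [float(value) for (value,) in struct.iter_unpack(fmt, raw)]
```

Under the first option (both loops empty) this step would be skipped, since the format itself would already deliver Reals.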
> >
> > I've inserted some answers below.
> >
> > On Wed, 13 Feb 2019 at 21:37, Jonathan WRIGHT <wright@esrf.fr> wrote:
> >
> >> Dear James,
> >>
> >> An index could be very useful but there seem to be some practical
> >> problems to overcome:
> >>
> >> - How should this handle different frame formats?
> >>
> >
> > If you mean things like compression and array layout, then that is dealt
> > with outside of imgCIF by tools that wish to make use of the linking
> > information.
> >
> >> - Does it pull in detailed external_format binary descriptions?
> >>
> > 
> > No
> > 
> > 
> >> - What if frames are using proprietary compression?
> >>
> >
> > See above; that is not imgCIF's concern.
> >
> >>
> >> With HDF, it seems the library accepts binary data described by a set
> >> of "External Storage Properties" and takes care of reading this data
> >> too. As it only arrived in h5py in the latest release (2.9, Dec 2018),
> >> it is something I look forward to trying out soon. So far I do not know
> >> whether you can have compression and things like ASCII overflow tables.
> >> If anyone can share examples, it would help me learn to use it.
> >>
> >> Not sure this message helps... If you leave out the need to read the
> >> data, then everything would be simplified, but then what is the index
> >> going to be used for?
> >>
> > 
> > Well, data use is indeed optional. imgCIF is only providing sufficient
> > information to allow unambiguous access to the data for those who wish
> > to do this. I expect that in many cases users will stick to their usual
> > tools. imgCIF descriptions would only become useful in situations where
> > you are processing data from an unfamiliar archive.
> >
> > 
> >>
> >> All the best,
> >>
> >> Jon
> >>
> >>
> >>
> >> PS: And many thanks for pycifrw!
> >>
> >
> > Thanks for the feedback; I don't often hear about where it is being used
> > unless there's a problem.
> >
> >>
> >>
> >> On 13/02/2019 10:01, James Hester wrote:
> >>> Dear Graeme,
> >>>
> >>> The context of this is the idea that a single imgCIF file containing
> >>> the metadata pertaining to a collection of raw image files (in whatever
> >>> format: HDF5, ADSC, Bruker, Rigaku, etc.) could be generated from that
> >>> collection. In such a situation, some way of referring to the raw frames
> >>> from within the imgCIF file is required.
> >>>
> >>> I agree that a perfectly reasonable approach is not to generate any new
> >>> file at all, and simply to access the metadata directly in whatever
> >>> format happens to be there. This was my initial impulse as well, and it
> >>> took me a while to understand that the actual proposal was to create an
> >>> imgCIF file, rather than just use imgCIF datanames for specification
> >>> purposes. From a semantic point of view both amount to the same thing,
> >>> so my only real motivation here is to add an image linking facility to
> >>> imgCIF so that the "generate a summary metadata file" approach is
> >>> possible.
> >>>
> >>> Could we just copy the HDF5 way of referring to objects in other HDF5
> >>> files as a quick solution?
> >>>
> >>> all the best,
> >>> James.
> >>>
> >>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <
> >>> Graeme.Winter@diamond.ac.uk> wrote:
> >>>
> >>>> Dear James,
> >>>>
> >>>> On the face of it, this looks a lot to me like a reinvention of HDF5,
> >>>> perhaps with specific semantics, and there is already a (complete?)
> >>>> mapping from imgCIF to HDF5 / NeXus.
> >>>>
> >>>> Have I missed something? No offence meant; I am trying to understand
> >>>> the shape of the problem you are trying to solve.
> >>>>
> >>>> Thanks & best wishes Graeme
> >>>>
> >>>>> On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
> >>>>>
> >>>>> Dear All,
> >>>>>
> >>>>> Recent Commdat discussion revealed a desire to reference external
> >>>>> images from within an imgCIF file. This would allow the metadata for
> >>>>> a dataset to be held within a single imgCIF file, while the frames
> >>>>> themselves remain separate. This avoids the impracticality of
> >>>>> navigating through an enormous multi-frame imgCIF file in order to
> >>>>> extract a relatively compact amount of information.
> >>>>>
> >>>>> As a starting proposal, I suggest we extend the _array_data category
> >>>>> with the following three datanames:
> >>>>>
> >>>>> (1) _array_data.external_format   A value drawn from an enumerated
> >>>>>     list of formats (e.g. "SMV", "HDF5", "Bruker"). The definition for
> >>>>>     each enumerated value would explain how to interpret
> >>>>>     _array_data.internal_path.
> >>>>> (2) _array_data.location_uri      A URI for the file containing the
> >>>>>     image. A relative URI is resolved relative to the location of the
> >>>>>     imgCIF file.
> >>>>> (3) _array_data.internal_path     A format-specific string describing
> >>>>>     the location of the frame within the file identified by
> >>>>>     _array_data.location_uri, interpreted according to the value given
> >>>>>     in _array_data.external_format.
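The three proposed datanames map naturally onto one record per frame. The sketch below uses a hypothetical ExternalFrame class to show how a relative location_uri could be resolved against the imgCIF file's own URI, using the RFC 3986 reference-resolution rules implemented by Python's urllib.parse.urljoin. A URI with a different scheme, such as doi:x.y.z, is returned unchanged; turning a DOI into an actual file location is outside this sketch.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class ExternalFrame:
    """One row of the proposed _array_data loop (hypothetical class)."""
    array_id: str
    binary_id: str
    external_format: str
    location_uri: str
    internal_path: Optional[str] = None  # not needed for single-frame files

    def resolved_uri(self, imgcif_uri: str) -> str:
        """Resolve a relative location_uri against the imgCIF file's URI."""
        return urljoin(imgcif_uri, self.location_uri)


frame = ExternalFrame("1", "1", "ADSC", "./tartaric.001")
print(frame.resolved_uri("https://example.org/run1/summary.cif"))
# prints https://example.org/run1/tartaric.001
```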
> >>>>>
> >>>>> So for a multi-frame HDF5 file buried in a subdirectory of the
> >>>>> location referenced with a DOI, with appropriate definitions of the
> >>>>> path notation:
> >>>>>
> >>>>> loop_
> >>>>> _array_data.array_id
> >>>>> _array_data.binary_id
> >>>>> _array_data.external_format
> >>>>> _array_data.location_uri
> >>>>> _array_data.internal_path
> >>>>> 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
> >>>>> 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
> >>>>> ...
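The example leaves the path notation open ("with appropriate definitions"). Purely as an illustration, assume the convention is a file path, a colon, an absolute HDF5 dataset path, and an optional zero-based frame index in square brackets. An internal_path could then be split as follows; the regular expression, class and function names are hypothetical, and file names containing a colon would need a different delimiter.

```python
import re
from typing import NamedTuple, Optional


class HDF5Location(NamedTuple):
    file_path: str
    dataset_path: str
    frame_index: Optional[int]


# file path, ':', absolute dataset path, optional [index]
_PATH_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<dataset>/[^\[\]]*)(?:\[(?P<index>\d+)\])?$")


def parse_internal_path(internal_path: str) -> HDF5Location:
    """Split an internal_path written in the assumed notation."""
    match = _PATH_RE.match(internal_path)
    if match is None:
        raise ValueError(f"unrecognised internal_path: {internal_path!r}")
    index = match.group("index")
    return HDF5Location(match.group("file"), match.group("dataset"),
                        int(index) if index is not None else None)
```

With h5py, for instance, the pieces could then be used as `h5py.File(file_path, "r")[dataset_path][frame_index]`, once file_path has been resolved against the location given by location_uri.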
> >>>>>
> >>>>> Or for a bunch of single-frame files generated by an ADSC detector in
> >>>>> the same directory as the imgCIF file:
> >>>>>
> >>>>> loop_
> >>>>> _array_data.array_id
> >>>>> _array_data.binary_id
> >>>>> _array_data.external_format
> >>>>> _array_data.location_uri
> >>>>> 1 1 ADSC ./tartaric.001
> >>>>> 1 2 ADSC ./tartaric.002
> >>>>> 1 3 ADSC ./tartaric.003
> >>>>> ...
> >>>>> 
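Because single-frame files usually follow a numbered naming scheme, a loop like the ADSC one could be generated rather than typed. This is a small sketch, assuming the files sit next to the imgCIF file and that binary_id simply counts from 1 in sorted file-name order. The function name is hypothetical, and names containing whitespace are rejected rather than CIF-quoted, to keep the example short.

```python
from typing import Iterable


def adsc_loop(filenames: Iterable[str], array_id: int = 1) -> str:
    """Build the _array_data loop for single-frame ADSC files (hypothetical helper)."""
    lines = [
        "loop_",
        "_array_data.array_id",
        "_array_data.binary_id",
        "_array_data.external_format",
        "_array_data.location_uri",
    ]
    for binary_id, name in enumerate(sorted(filenames), start=1):
        if any(ch.isspace() for ch in name):
            # CIF would need the value quoted; not handled in this sketch.
            raise ValueError(f"file name needs CIF quoting: {name!r}")
        lines.append(f"{array_id} {binary_id} ADSC ./{name}")
    return "\n".join(lines)


print(adsc_loop(["tartaric.002", "tartaric.001", "tartaric.003"]))
```

Sorting the names means the rows come out as tartaric.001, .002, .003 with binary_id 1, 2, 3 regardless of the order in which the files were listed.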
> >>>>> The imgCIF data items describing the structure of the data array
> >>>>> would refer to the data after it has been provided by the format. The
> >>>>> form in which it is provided should be specified in the definition of
> >>>>> each value of "_array_data.external_format". So, for example, the
> >>>>> various compression methods in HDF5 would be invisible if the data as
> >>>>> returned are specified to be an array of Reals.
> >>>>> 
> >>>>> From the point of view of initial data validation, it would be
> >>>>> sufficient to check that all referenced files are accessible and that
> >>>>> the provided locations exist.
> >>>>> 
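The first half of that validation step, checking that referenced files are accessible, could be sketched as below for files on a local disk. Assumptions: location_uri values without a scheme are file paths relative to the imgCIF file, and URIs with a scheme (doi:, https:, ...) would need a network lookup, so they are skipped here. Checking that the internal location exists depends on the external_format and is left to per-format readers. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse


def missing_local_files(location_uris: Iterable[str], imgcif_path: str) -> List[str]:
    """Return the scheme-less location_uri values that do not name an existing file."""
    base = Path(imgcif_path).resolve().parent
    missing = []
    for uri in location_uris:
        if urlparse(uri).scheme:
            # doi:, https:, ... would need a network check; skipped in this sketch.
            continue
        if not (base / uri).is_file():
            missing.append(uri)
    return missing
```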
> >>>>> Thoughts?
> >>>>> James.
> >>>>>
> >>>>> --
> >>>>> T +61 (02) 9717 9907
> >>>>> F +61 (02) 9717 3145
> >>>>> M +61 (04) 0249 4148
> >>>>> _______________________________________________
> >>>>> imgcif-l mailing list
> >>>>> imgcif-l@iucr.org
> >>>>> http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
> >>>>
> >>>>
> >>>> --
> >>>> This e-mail and any attachments may contain confidential, copyright and
> >>>> or privileged material, and are for the use of the intended addressee
> >>>> only. If you are not the intended addressee or an authorised recipient
> >>>> of the addressee please notify us of receipt by returning the e-mail
> >>>> and do not use, copy, retain, distribute or disclose the information in
> >>>> or attached to the e-mail.
> >>>> Any opinions expressed within this e-mail are those of the individual
> >>>> and not necessarily of Diamond Light Source Ltd.
> >>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
> >>>> attachments are free from viruses and we cannot accept liability for
> >>>> any damage which you may sustain as a result of software viruses which
> >>>> may be transmitted in or with the message.
> >>>> Diamond Light Source Limited (company no. 4375679). Registered in
> >>>> England and Wales with its registered office at Diamond House, Harwell
> >>>> Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United
> >>>> Kingdom
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
> 
> 


