Discussion List Archives


Re: [Imgcif-l] Adding references to external files to imgCIF

Dear James

This sounds like a good and pragmatic way to do it. A remaining thought is
whether to offer an optional way to validate the data found at the other end
of a link. Was it overwritten or corrupted, does the reading library have a
bug, etc.?

In practice we often record 1D counters from images, like mean, stddev, min,
max etc., either for the whole image or for a few selected ROIs. Writing an
optional column with something similar might be useful for quickly locating
interesting frames as well as acting as checksum data. Is there already
something like these?

  array_data.data_sum
  array_data.data_mean
  array_data.data_min
  array_data.data_max
  array_data.data_stddev

( Any ROI would be in a different array_id )
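
As a rough illustration of the idea (a sketch only, not an existing imgCIF or
NeXus facility), the counters could be computed when the link is written and
compared again when the frame is read back. The key names below simply mirror
the proposed data names above; the function names, the choice of population
standard deviation and the tolerances are assumptions made for the example.
In practice one would more likely use numpy on the decoded array.

```python
import math
import statistics
from itertools import chain


def frame_statistics(frame):
    """Compute simple counters for one frame given as rows of numbers."""
    pixels = [float(v) for v in chain.from_iterable(frame)]
    return {
        "data_sum": math.fsum(pixels),
        "data_mean": statistics.fmean(pixels),
        "data_min": min(pixels),
        "data_max": max(pixels),
        # Population standard deviation; which convention a real
        # definition would choose is an assumption here.
        "data_stddev": statistics.pstdev(pixels),
    }


def matches_recorded(frame, recorded, rel_tol=1e-9):
    """Return True if the frame reproduces every recorded counter."""
    computed = frame_statistics(frame)
    return all(
        math.isclose(computed[name], value, rel_tol=rel_tol, abs_tol=1e-12)
        for name, value in recorded.items()
    )


if __name__ == "__main__":
    frame = [[1, 2], [3, 4]]             # stand-in for a decoded image
    recorded = frame_statistics(frame)   # values that would be stored
    print(matches_recorded(frame, recorded))             # unchanged frame
    print(matches_recorded([[1, 2], [3, 5]], recorded))  # altered frame
```

A check like this would catch a frame that was replaced or decoded wrongly,
but it is much weaker than a real checksum: different images can share the
same sum, mean, min, max and stddev.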

Perhaps there is already a mechanism and I am just not aware of it? The same
thing would be useful for NeXus too, but I did not manage to locate it.

Of course, there is no need to add this now if it adds too many complications
for what you actually need.

Best,

Jon



On 14/02/2019 04:36, James Hester wrote:
> Dear Jon,
> 
> The answer to your questions as I see it is that the imgCIF specifications
> work at a semantic level. So, if the NeXus specifications say that a
> multidimensional array of numbers is found at a particular location, that
> is all that imgCIF cares about. The way in which those numbers are stored
> is not relevant.  Tools that actually want to use the linking information
> within imgCIF would have to know, for each external_format, how to access
> items at a given location.  This is not an obstacle in practice due to the
> many libraries available that do just this.
> 
> There is another missing data name in _array_data: a dataname to hold the
> actual data as an array of Real numbers.  This array is implicitly
> available as the result of processing the image data, but it would be most
> convenient for eventually writing dREL methods to manipulate images for
> this to be defined as a separate dataname.  So if it helps to see what
> I'm saying, imagine that the task is to calculate values of this dataname
> either from data frames within the imgCIF, or from an external source.  If
> an external source is chosen, then it delivers an array of numbers directly
> into the dataname as long as the location of the external array is
> unambiguously specified.
>
> Alternatively, we could specify that the information delivered from the
> external format is processed according to the array_structure and
> array_structure_list loops.  If both loops are empty, an array of Reals is
> returned from the external format; if array_structure has some information,
> that is used to process a byte string returned from the tool.  So in the
> case of ADSC, an imgCIF file could specify both of array_structure and
> array_structure_list if the specification of "ADSC" format states that a
> string of bytes is returned. Or an alternative "ADSC-processed" format
> could be specified that returns an array of integers.
>
> I've inserted some answers below.
>
> On Wed, 13 Feb 2019 at 21:37, Jonathan WRIGHT <wright@esrf.fr> wrote:
>
> > Dear James,
> >
> > An index could be very useful but there seem to be some practical
> > problems to overcome:
> >
> > - How should this handle different frame formats?
> >
>
> If you mean things like compression and array layout, then that is dealt
> with outside of imgCIF by tools that wish to make use of the linking
> information.
>
> > - Does it pull in detailed external_format binary descriptions?
> >
>
> No
>
> 
> > - What if frames are using proprietary compression?
> >
> 
> See above - that is not imgCIF's concern
> 
> >
> > With HDF it seems the library accepts binary data described by a set
> > of "External Storage Properties" and takes care of reading this data
> > too. As it only arrived in h5py in the last release (2.9, Dec 2018) it
> > is something I look forwards to trying out soon. So far I do not know
> > if you can have compression and things like ascii overflow tables. If
> > anyone can share examples it would help me to learn to use it.
> >
> > Not sure this message helps... if you leave out the need to read the
> > data then everything would be simplified, but then what is the index
> > going to be used for ?
> >
> 
> Well, data use is indeed optional. imgCIF is only providing sufficient
> information to allow unambiguous access to the data for those who wish to
> do this. I expect that in many cases users will stick to their usual tools.
> imgCIF descriptions would only become useful in situations where you are
> processing data from an unfamiliar archive.
> 
> 
> >
> > All the best,
> >
> > Jon
> >
> >
> >
> > PS: and many thanks for pycifrw !
> >
>
> Thanks for the feedback, I don't often hear about where it is being used
> unless there's a problem.
> 
> > 
> >
> > On 13/02/2019 10:01, James Hester wrote:
> > > Dear Graeme,
> > > 
> > > The context of this is the idea that a single imgCIF file could be
> > > generated from a collection of raw image files (in whatever format,
> > > whether HDF5, or ADSC, or Bruker, or Rigaku, etc.) which would contain
> > > the metadata pertaining to that collection. In such a situation, some
> > > way of referring to the raw frames from within the imgCIF file is
> > > required.
> > > 
> > > I agree that a perfectly reasonable approach is not to generate any new
> > > file at all, and simply to access the metadata directly in whatever
> > > format happens to be there. This was my initial impulse as well and it
> > > took me a while to understand that the actual proposal was to create an
> > > imgCIF file, rather than just use imgCIF datanames for specification
> > > purposes.  From a semantic point of view both amount to the same thing
> > > so my only real motivation here is to add an image linking facility to
> > > imgCIF so that the "generate a summary metadata file" approach is
> > > possible.
> > > 
> > > Could we just copy the HDF5 way of referring to objects in other HDF5
> > > files as a quick solution?
> > > 
> > > all the best,
> > > James.
> > > 
> > > On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <
> > > Graeme.Winter@diamond.ac.uk> wrote:
> > > 
> > > > Dear James,
> > > > 
> > > > On the face of it, this looks a lot to me like a reinvention of HDF5 -
> > > > perhaps with specific semantics - and there is already a (complete?)
> > > > mapping from imgCIF to HDF5 / NeXus
> > > > 
> > > > Have I missed something? No offence meant, trying to understand the
> > > > shape of the problem you are trying to solve
> > > > 
> > > > Thanks & best wishes Graeme
> > > > 
> > > > > On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
> > > > > 
> > > > > Dear All,
> > > > > 
> > > > > Recent Commdat discussion revealed a desire to reference external
> > > > > images from within an imgCIF file. This would allow the metadata for
> > > > > a dataset to be held within a single imgCIF file, while the frames
> > > > > themselves remain separate. This avoids the impracticality of
> > > > > navigating through an enormous multi-frame imgCIF file in order to
> > > > > extract a relatively compact amount of information.
> > > > > 
> > > > > As a starting proposal, I suggest we extend the _array_data category
> > > > > with the following three datanames:
> > > > > 
> > > > > (1) _array_data.external_format    A value drawn from an enumerated
> > > > > list of formats (e.g. "SMV","HDF5","Bruker"). The definition for each
> > > > > enumerated value would explain how to interpret
> > > > > _array_data.internal_path
> > > > > (2) _array_data.location_uri       A URI for the file containing the
> > > > > image. A relative URI is relative to the location of the imgCIF file
> > > > > (3) _array_data.internal_path      A format-specific string describing
> > > > > the location of the frame within the file identified by
> > > > > _array_data.location_uri, interpreted according to the value given in
> > > > > _array_data.external_format
> > > > > 
> > > > > So for a multi-frame HDF5 file buried in a subdirectory of the
> > > > > location referenced with a DOI, with appropriate definitions of the
> > > > > path notation:
> > > > > 
> > > > > loop_
> > > > > _array_data.array_id
> > > > > _array_data.binary_id
> > > > > _array_data.external_format
> > > > > _array_data.location_uri
> > > > > _array_data.internal_path
> > > > > 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
> > > > > 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
> > > > > ...
> > > > > 
> > > > > Or for a bunch of single-frame files generated by an ADSC detector in
> > > > > the same directory as the imgCIF file
> > > > > 
> > > > > loop_
> > > > > _array_data.array_id
> > > > > _array_data.binary_id
> > > > > _array_data.external_format
> > > > > _array_data.location_uri
> > > > > 1 1 ADSC ./tartaric.001
> > > > > 1 2 ADSC ./tartaric.002
> > > > > 1 3 ADSC ./tartaric.003
> > > > > ...
> > > > > 
> > > > > The imgCIF data items describing the structure of the data array would
> > > > > refer to the data after it has been provided by the format. The form in
> > > > > which it is provided should be specified in the definition of each
> > > > > value of "_array_data.external_format".  So, for example, the various
> > > > > compression methods in HDF5 would be invisible if the data as returned
> > > > > are specified to be an array of Reals.
> > > > > 
> > > > > From the point of view of initial data validation, it would be
> > > > > sufficient to check that all referenced files are accessible, and that
> > > > > the provided locations exist.
> > > > > 
> > > > > Thoughts?
> > > > > James.
> > > > > 
> > > > > --
> > > > > T +61 (02) 9717 9907
> > > > > F +61 (02) 9717 3145
> > > > > M +61 (04) 0249 4148
> > > > > _______________________________________________
> > > > > imgcif-l mailing list
> > > > > imgcif-l@iucr.org
> > > > > http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
> > > > 
> > > > 
> > > > 
> > > > 
> > > 
> >

