Re: [Imgcif-l] Adding references to external files to imgCIF
- To: The Crystallographic Binary File and its imgCIF application to image data<imgcif-l@iucr.org>
- Subject: Re: [Imgcif-l] Adding references to external files to imgCIF
- From: James Hester <jamesrhester@gmail.com>
- Date: Tue, 5 Mar 2019 16:38:17 +1100
- In-Reply-To: <b89f6eac-0ea2-6e8d-4dd8-2f0faede7155@esrf.fr>
- References: <CAM+dB2dGcbLy3NuMy1g=QvWP3Mhj09F1WksKRXJ1BHeZ9_fXyw@mail.gmail.com><FDBF95B6-0C0A-48A1-92B4-9B567AD5C9E5@diamond.ac.uk><CAM+dB2c9qOZg8D151WwoJYkM_YtR-+kKcFvNaLNk4cM=3vEoQQ@mail.gmail.com><45770c29-9aea-5d27-fdb9-a4b2f57f8218@esrf.fr><CAM+dB2c4DkCeh4g7M3nV5jwp96UM819-dj1eDnYusFHmODBS5w@mail.gmail.com><b89f6eac-0ea2-6e8d-4dd8-2f0faede7155@esrf.fr>
Adding at least one check item seems like a fine idea; I will add it as an issue on imgCIF so we don't lose sight of it.

On Thu, 14 Feb 2019 at 20:01, Jonathan WRIGHT <wright@esrf.fr> wrote:
> Dear James
>
> This sounds like a good and pragmatic way to do it. A remaining thought is whether to offer an optional way to validate the data found at the other end of a link. Was it overwritten or corrupted, does the reading library have a bug, etc.?
>
> In practice we often record 1D counters from images, like mean, stddev, min, max, etc., either for the whole image or for a few selected ROIs. Writing an optional column with something similar might be useful for quickly locating interesting frames, as well as acting as checksum data. Is there already something like these?
>
> array_data.data_sum
> array_data.data_mean
> array_data.data_min
> array_data.data_max
> array_data.data_stddev
>
> (Any ROI would be in a different array_id.)
>
> Perhaps there is already a mechanism and I am just not aware of it? The same thing would be useful for NeXus too, but I did not manage to locate it.
>
> Of course, there is no need to add this now if it adds too many complications for what you actually need.
>
> Best,
>
> Jon
>
> On 14/02/2019 04:36, James Hester wrote:
> > Dear Jon,
> >
> > The answer to your questions, as I see it, is that the imgCIF specifications work at a semantic level. So, if the NeXus specifications say that a multidimensional array of numbers is found at a particular location, that is all that imgCIF cares about. The way in which those numbers are stored is not relevant. Tools that actually want to use the linking information within imgCIF would have to know, for each external_format, how to access items at a given location. This is not an obstacle in practice, given the many libraries available that do just this.
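Jon's proposed summary columns could double as a lightweight integrity check on linked frames. The Python sketch below is illustrative only: it assumes a frame has already been read into a flat sequence of numbers, and its `data_sum`, `data_mean`, `data_min`, `data_max` and `data_stddev` keys mirror his suggested datanames, which are hypothetical and not part of the imgCIF dictionary. Using the population standard deviation is likewise an assumed choice.

```python
import math
import statistics
from typing import Dict, Sequence

# Keys mirror the summary columns suggested in the thread; they are
# hypothetical, not existing imgCIF datanames.
STAT_KEYS = ("data_sum", "data_mean", "data_min", "data_max", "data_stddev")


def frame_summary(pixels: Sequence[float]) -> Dict[str, float]:
    """Compute summary statistics for one frame, flattened to 1D."""
    if not pixels:
        raise ValueError("empty frame")
    return {
        "data_sum": math.fsum(pixels),
        "data_mean": statistics.fmean(pixels),
        "data_min": min(pixels),
        "data_max": max(pixels),
        # Population standard deviation over every pixel in the frame
        "data_stddev": statistics.pstdev(pixels),
    }


def matches_recorded(pixels: Sequence[float], recorded: Dict[str, float],
                     rel_tol: float = 1e-9) -> bool:
    """Return True if every recorded statistic agrees with the frame data."""
    actual = frame_summary(pixels)
    return all(
        math.isclose(actual[key], recorded[key], rel_tol=rel_tol, abs_tol=1e-12)
        for key in STAT_KEYS
        if key in recorded  # the columns would be optional, so skip absent ones
    )
```

A frame whose stored statistics no longer agree with the data delivered by the external format (for example after a truncated copy) would make `matches_recorded` return False, which is the kind of check item discussed above.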
> >
> > There is another missing data name in _array_data: a dataname to hold the actual data as an array of Real numbers. This array is implicitly available as the result of processing the image data, but it would be most convenient, for eventually writing dREL methods to manipulate images, for this to be defined as a separate dataname. So, if it helps to see what I'm saying, imagine that the task is to calculate values of this dataname either from data frames within the imgCIF file or from an external source. If an external source is chosen, then it delivers an array of numbers directly into the dataname, as long as the location of the external array is unambiguously specified.
> >
> > Alternatively, we could specify that the information delivered from the external format is processed according to the array_structure and array_structure_list loops. If both loops are empty, an array of Reals is returned from the external format; if array_structure has some information, that is used to process a byte string returned from the tool. So, in the case of ADSC, an imgCIF file could specify both array_structure and array_structure_list if the specification of the "ADSC" format states that a string of bytes is returned. Or an alternative "ADSC-processed" format could be specified that returns an array of integers.
> >
> > I've inserted some answers below.
> >
> > On Wed, 13 Feb 2019 at 21:37, Jonathan WRIGHT <wright@esrf.fr> wrote:
> >
> >> Dear James,
> >>
> >> An index could be very useful, but there seem to be some practical problems to overcome:
> >>
> >> - How should this handle different frame formats?
> >>
> >
> > If you mean things like compression and array layout, then that is dealt with outside of imgCIF by tools that wish to make use of the linking information.
> >
> >> - Does it pull in detailed external_format binary descriptions?
> >>
> >
> > No
> >
> >> - What if frames are using proprietary compression?
> >>
> >
> > See above; that is not imgCIF's concern.
> >
> >>
> >> With HDF it seems the library accepts binary data described by a set of "External Storage Properties" and takes care of reading this data too. As it only arrived in h5py in the last release (2.9, Dec 2018), it is something I look forward to trying out soon. So far I do not know if you can have compression and things like ASCII overflow tables. If anyone can share examples, it would help me to learn to use it.
> >>
> >> Not sure this message helps... if you leave out the need to read the data then everything would be simplified, but then what is the index going to be used for?
> >>
> >
> > Well, data use is indeed optional. imgCIF only provides sufficient information to allow unambiguous access to the data for those who wish to do this. I expect that in many cases users will stick to their usual tools. imgCIF descriptions would only become useful in situations where you are processing data from an unfamiliar archive.
> >
> >>
> >> All the best,
> >>
> >> Jon
> >>
> >> PS: and many thanks for pycifrw!
> >>
> >
> > Thanks for the feedback; I don't often hear about where it is being used unless there's a problem.
> >
> >>
> >> On 13/02/2019 10:01, James Hester wrote:
> >>> Dear Graeme,
> >>>
> >>> The context of this is the idea that a single imgCIF file, containing the metadata pertaining to a collection of raw image files (in whatever format, whether HDF5, or ADSC, or Bruker, or Rigaku, etc.), could be generated from that collection. In such a situation, some way of referring to the raw frames from within the imgCIF file is required.
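As a rough illustration of generating such a summary file, the Python sketch below writes an `_array_data` loop like the ADSC example in the original proposal for a collection of single-frame files. It assumes POSIX paths without whitespace (values containing spaces would need CIF quoting) and that lexical filename order matches frame order; it uses the proposed `_array_data.external_format` and `_array_data.location_uri` datanames, which are not part of the published dictionary, and `array_data_loop` and `relative_uri` are hypothetical helper names.

```python
import posixpath
from typing import Iterable, List

# Datanames follow the proposal in this thread; they are not (yet) part of
# the published imgCIF dictionary.
LOOP_HEADER = [
    "loop_",
    "_array_data.array_id",
    "_array_data.binary_id",
    "_array_data.external_format",
    "_array_data.location_uri",
]


def relative_uri(frame_path: str, cif_dir: str) -> str:
    """Express a frame's POSIX path relative to the imgCIF file's directory."""
    rel = posixpath.relpath(frame_path, cif_dir)
    # Mirror the "./name" style of the ADSC example; keep "../" paths as-is
    return rel if rel.startswith("../") else "./" + rel


def array_data_loop(frame_paths: Iterable[str], cif_dir: str,
                    external_format: str, array_id: int = 1) -> str:
    """Build a CIF loop linking each frame file to a sequential binary_id."""
    lines: List[str] = list(LOOP_HEADER)
    # Assumes zero-padded frame numbers, so lexical order is frame order
    for binary_id, path in enumerate(sorted(frame_paths), start=1):
        uri = relative_uri(path, cif_dir)
        lines.append(f"{array_id} {binary_id} {external_format} {uri}")
    return "\n".join(lines) + "\n"
```

Called with the files `/data/run1/tartaric.001` and `/data/run1/tartaric.002`, a directory of `/data/run1` and the format `ADSC`, this yields the `loop_` header followed by rows `1 1 ADSC ./tartaric.001` and `1 2 ADSC ./tartaric.002`.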
> >>>
> >>> I agree that a perfectly reasonable approach is not to generate any new file at all, and simply to access the metadata directly in whatever format happens to be there. This was my initial impulse as well, and it took me a while to understand that the actual proposal was to create an imgCIF file, rather than just use imgCIF datanames for specification purposes. From a semantic point of view both amount to the same thing, so my only real motivation here is to add an image linking facility to imgCIF so that the "generate a summary metadata file" approach is possible.
> >>>
> >>> Could we just copy the HDF5 way of referring to objects in other HDF5 files as a quick solution?
> >>>
> >>> all the best,
> >>> James.
> >>>
> >>> On Wed, 13 Feb 2019 at 19:03, Graeme.Winter@Diamond.ac.uk <Graeme.Winter@diamond.ac.uk> wrote:
> >>>
> >>>> Dear James,
> >>>>
> >>>> On the face of it, this looks a lot to me like a reinvention of HDF5 (perhaps with specific semantics), and there is already a (complete?) mapping from imgCIF to HDF5 / NeXus.
> >>>>
> >>>> Have I missed something? No offence meant; I am trying to understand the shape of the problem you are trying to solve.
> >>>>
> >>>> Thanks & best wishes Graeme
> >>>>
> >>>>> On 13 Feb 2019, at 05:15, James Hester <jamesrhester@gmail.com> wrote:
> >>>>>
> >>>>> Dear All,
> >>>>>
> >>>>> Recent Commdat discussion revealed a desire to reference external images from within an imgCIF file. This would allow the metadata for a dataset to be held within a single imgCIF file, while the frames themselves remain separate. This avoids the impracticality of navigating through an enormous multi-frame imgCIF file in order to extract a relatively compact amount of information.
> >>>>>
> >>>>> As a starting proposal, I suggest we extend the _array_data category with the following three datanames:
> >>>>>
> >>>>> (1) _array_data.external_format: a value drawn from an enumerated list of formats (e.g. "SMV", "HDF5", "Bruker"). The definition for each enumerated value would explain how to interpret _array_data.internal_path.
> >>>>> (2) _array_data.location_uri: a URI for the file containing the image. A relative URI is interpreted relative to the location of the imgCIF file.
> >>>>> (3) _array_data.internal_path: a format-specific string describing the location of the frame within the file identified by _array_data.location_uri, interpreted according to the value given in _array_data.external_format.
> >>>>>
> >>>>> So for a multi-frame HDF5 file buried in a subdirectory of the location referenced with a DOI, with appropriate definitions of the path notation:
> >>>>>
> >>>>> loop_
> >>>>> _array_data.array_id
> >>>>> _array_data.binary_id
> >>>>> _array_data.external_format
> >>>>> _array_data.location_uri
> >>>>> _array_data.internal_path
> >>>>> 1 1 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[0]
> >>>>> 1 2 NXMX doi:x.y.z directory/run/masterfilename:/entry1/detector/data[1]
> >>>>> ...
> >>>>>
> >>>>> Or for a bunch of single-frame files generated by an ADSC detector in the same directory as the imgCIF file:
> >>>>>
> >>>>> loop_
> >>>>> _array_data.array_id
> >>>>> _array_data.binary_id
> >>>>> _array_data.external_format
> >>>>> _array_data.location_uri
> >>>>> 1 1 ADSC ./tartaric.001
> >>>>> 1 2 ADSC ./tartaric.002
> >>>>> 1 3 ADSC ./tartaric.003
> >>>>> ...
> >>>>>
> >>>>> The imgCIF data items describing the structure of the data array would refer to the data after it has been provided by the format.
> >>>>> The form in which it is provided should be specified in the definition of each value of "_array_data.external_format". So, for example, the various compression methods in HDF5 would be invisible if the data as returned are specified to be an array of Reals.
> >>>>>
> >>>>> From the point of view of initial data validation, it would be sufficient to check that all referenced files are accessible and that the provided locations exist.
> >>>>>
> >>>>> Thoughts?
> >>>>> James.
> >>>>>
> >>>>> --
> >>>>> T +61 (02) 9717 9907
> >>>>> F +61 (02) 9717 3145
> >>>>> M +61 (04) 0249 4148
> >>>>> _______________________________________________
> >>>>> imgcif-l mailing list
> >>>>> imgcif-l@iucr.org
> >>>>> http://mailman.iucr.org/cgi-bin/mailman/listinfo/imgcif-l
> >>>>
> >>>>
> >>>> --
> >>>> This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
> >>>> Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
> >>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
> >>>> Diamond Light Source Limited (company no. 4375679).
> >>>> Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
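The initial validation step proposed in the thread (check that each referenced file is accessible and each location exists) might look roughly like the Python sketch below. It assumes the imgCIF file is identified by a `file:` URL and resolves relative `location_uri` values against it with `urllib.parse.urljoin`. It only checks the syntax of `internal_path`, against a hypothetical `<file>:<object path>[<index>]` notation modelled on the NXMX example; confirming that the HDF5 object itself exists would need an HDF5 library such as h5py. Non-`file:` URIs such as DOIs are reported as not checkable locally rather than resolved, and `check_link` and `parse_internal_path` are hypothetical names.

```python
import os
import re
from typing import List, NamedTuple, Optional
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname

# Hypothetical internal_path notation modelled on the thread's NXMX example:
#   <file>:<HDF5 object path>[<frame index>]
INTERNAL_PATH = re.compile(
    r"^(?P<file>[^:]+):(?P<obj>/[^\[\]]*)(?:\[(?P<index>\d+)\])?$")


class InternalPath(NamedTuple):
    file: str
    obj: str
    index: Optional[int]


def parse_internal_path(text: str) -> InternalPath:
    """Split an internal_path string into file, object path and frame index."""
    match = INTERNAL_PATH.match(text)
    if match is None:
        raise ValueError(f"unrecognised internal_path: {text!r}")
    index = match.group("index")
    return InternalPath(match.group("file"), match.group("obj"),
                        int(index) if index is not None else None)


def check_link(cif_url: str, location_uri: str,
               internal_path: Optional[str] = None) -> List[str]:
    """Return the problems found for one _array_data row (empty if none)."""
    problems: List[str] = []
    # Relative URIs are resolved against the imgCIF file's own URL
    target = urljoin(cif_url, location_uri)
    if internal_path is not None:
        try:
            parse_internal_path(internal_path)  # syntax check only
        except ValueError as err:
            problems.append(str(err))
    parsed = urlparse(target)
    if parsed.scheme != "file":
        # e.g. doi: URIs would first need resolving over the network
        problems.append(f"cannot check {target} locally")
    elif not os.path.exists(url2pathname(parsed.path)):
        problems.append(f"missing file: {target}")
    return problems
```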
- References:
- [Imgcif-l] Adding references to external files to imgCIF (James Hester)
- Re: [Imgcif-l] Adding references to external files to imgCIF (Graeme.Winter@Diamond.ac.uk)
- Re: [Imgcif-l] Adding references to external files to imgCIF (James Hester)
- Re: [Imgcif-l] Adding references to external files to imgCIF (Jonathan WRIGHT)
- Re: [Imgcif-l] Adding references to external files to imgCIF (James Hester)
- Re: [Imgcif-l] Adding references to external files to imgCIF (Jonathan WRIGHT)