Re: CIF or NOT CIF, (or nothing)

Thanks to Andy for his response to my request for a discussion of the goals
of the image standardisation project.

> The aim of ("imageNCIF") is to "standardize" the passing of image (and other) 
> crystallographic experimental data (1) from: one institute to another;
> one make of computer system to another; and from one computer program 
> (acquisition or analysis) to another (2). If a sufficiently large number
> of institutes/ programmers/ and producers of detector equipment can
> agree on a common format then the present task of having to support 
> numerous new and different image (data) formats will be at least 
> lessened (3).

In this respect the objectives match those of the IUCr CIF project. The
reasons CIF chose an ASCII representation were to ensure interplatform
compatibility and also ease of transmission by email. Further, string
representations of numbers remove dependency on the data typing conventions
of any particular platform or implementation (is the data an unsigned long
integer? double-precision complex? - the number can be expressed to
arbitrary precision as a string). (It has to be said that not all our
colleagues consider this last property of CIF to be a good thing!)
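
A minimal sketch of the point about string representations, in Python. The
function name parse_number_field is hypothetical, and the sketch assumes a
plain decimal string with no appended standard uncertainty. It shows that the
string carries the full precision, while a platform float does not.

```python
from decimal import Decimal

def parse_number_field(text):
    """Parse a decimal string exactly, without committing to a machine type."""
    return Decimal(text)

# A value with more digits than a double-precision float can hold.
text = "0.1234567890123456789012345678901234567890"
value = parse_number_field(text)

# The exact value round-trips through its string form.
round_trip = str(value)

# Converting to a platform float silently discards the trailing digits.
as_float = float(value)
lost_precision = Decimal(as_float) != value
```

The reader of the file, not the writer, decides which machine type (if any)
the number ends up in.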

At least as important (probably more so, in the longer term) is the archival
nature of the data representation. Although we are working on making CIF
data files and dictionaries more machine-readable, the idea is that one
should still be able to retrieve most of the information in a CIF from a
simple listing of the file (even a hardcopy printout, if that is all that is
available) into the indefinite future. One can grow neurotic in the pursuit
of this ideal - can we even guarantee that the ASCII convention will
be understood in 10 years' time? However, as Peter Keller is always ready to
remind us, there are dangers in too ready an assumption that our current
systems represent unvarying perfection into the indefinite future.
 
It's largely for this archival reason that I would be happiest to have an
embedded ASCII-encoded data field in a single file. (The alternative, a CIF
header with an associated binary file, has its own attractions, but there's
every risk of losing one or other of the files on arbitrary systems in the
future.) Yes,
the storage overhead is a confounded nuisance; but the archival file can
always be translated into an optimal binary working format for local use.
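
As a rough illustration of the embedded-field idea, here is a Python sketch
that turns pixel values into an ASCII text block and back. The choice of
base64, of big-endian unsigned 16-bit pixels, and of 76-column line wrapping
are all assumptions made for the example, as are the function names; nothing
here is an agreed convention.

```python
import base64
import struct

def encode_pixels(pixels):
    """Pack 16-bit pixel values and return them as wrapped ASCII text."""
    raw = struct.pack(f">{len(pixels)}H", *pixels)  # big-endian unsigned 16-bit
    text = base64.b64encode(raw).decode("ascii")
    # Wrap to 76 columns so the field survives email and printing.
    return "\n".join(text[i:i + 76] for i in range(0, len(text), 76))

def decode_pixels(text):
    """Recover the pixel values from the ASCII text block."""
    raw = base64.b64decode("".join(text.split()))  # drop the line breaks
    return list(struct.unpack(f">{len(raw) // 2}H", raw))

pixels = [0, 1, 1234, 65535] * 75          # 300 pixels, 600 bytes of binary data
field = encode_pixels(pixels)
restored = decode_pixels(field)
```

The storage overhead of this particular encoding is about one third (four
ASCII characters for every three bytes), plus the line breaks.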

One other point is worth making here. Suppose you have a CIF containing a
base64-encoded ASCII stream of QEncoded miff data (Peter's example), perhaps
also gzip'd somewhere along the line for good measure. In what sense is this
"self-defining"? One of the techniques we're beginning to look at in CIF is
to include "methods" in the CIF dictionaries. This would allow the decoding
and decompression algorithms that apply to a particular field to be
stored in the dictionary entry for that field (for example, as Tcl or C++
procedures). Then you would be free to use arbitrary encoding schemes
(provided always that you had access to the dictionary).
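
To make the "methods" idea concrete, here is a hedged Python sketch (the
message itself suggests Tcl or C++; Python is used only for brevity). The
data name _image_data, the decode_chain key and the DICTIONARY layout are
invented for illustration and do not correspond to any existing CIF
dictionary.

```python
import base64
import gzip

# Decoding procedures that a dictionary entry may refer to by name.
DECODERS = {
    "base64": base64.b64decode,
    "gzip": gzip.decompress,
}

# Hypothetical dictionary: each data name lists the steps needed to decode it.
DICTIONARY = {
    "_image_data": {"decode_chain": ["base64", "gzip"]},
}

def decode_field(name, value, dictionary=DICTIONARY):
    """Apply, in order, the decoding steps the dictionary gives for this field."""
    data = value.encode("ascii")
    for step in dictionary[name]["decode_chain"]:
        data = DECODERS[step](data)
    return data

# Writer side: compress, then ASCII-encode, the reverse of the decode chain.
payload = b"raw detector bytes " * 20
stored = base64.b64encode(gzip.compress(payload)).decode("ascii")
recovered = decode_field("_image_data", stored)
```

A reader needs only the file and the dictionary; the file itself does not
have to say how its fields were encoded.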

Of course, the final encoding would still have to use the ASCII character
set :-)

> 3: CIF based "graphics" description of binary data
>>
>> 3) I do not really know what you mean - sorry.

Let me amplify somewhat the ideas behind this proposal. At present, it's a
very tenuous proposal, and may never get anywhere. But perhaps this is a
good place to give it some thought and see whether it's a fruitful idea to
develop. At the present time, it certainly won't replace the conventional
way of storing images; but might it grow to become a better way of
describing at least some types of data?

The IUCr Computing Commission is beginning to undertake a study into the
relative merits of different graphics formats for conveying information.
The initial concerns are with representing in a more 'object-oriented' way
the types of graphical information routinely published in Acta. For instance,
a raster image of an ORTEP plot conveys no machine-parsable information;
only the output picture has any meaning to a human. A PostScript description
of the same plot as a large collection of straight-line segments may be more
compact than a rastered representation (or not!), but is hardly more
informative. A PostScript description which defines ellipses of certain axial
ratios and orientations, together with a stacking order, is somewhat more
meaningful. A PostScript program which defined 3-D ellipsoids and calculated
their projections onto a given plane with a specified lighting angle would
give significantly more information still. And a 'metagraphics' representation
which described the object to be visualised in a way that could easily be
translated into any graphics format, while still providing handles for
other software to manipulate the objects, is the desirable extrapolation of
all these ideas.
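
A toy Python sketch of the 'metagraphics' idea at the 2-D ellipse level: one
object description, translated on demand into more than one graphics format.
The Ellipse class and its field names are hypothetical, and the sketch makes
no attempt to reconcile the different axis conventions of the two output
formats.

```python
from dataclasses import dataclass

@dataclass
class Ellipse:
    """An ellipse described as an object rather than as pixels or line segments."""
    cx: float
    cy: float
    rx: float
    ry: float
    angle: float  # rotation in degrees, interpreted by each target format

    def to_svg(self):
        return (f'<ellipse cx="{self.cx}" cy="{self.cy}" '
                f'rx="{self.rx}" ry="{self.ry}" '
                f'transform="rotate({self.angle} {self.cx} {self.cy})"/>')

    def to_postscript(self):
        # Save the matrix, draw a unit circle in a scaled frame, restore, stroke.
        return (f"newpath matrix currentmatrix "
                f"{self.cx} {self.cy} translate {self.angle} rotate "
                f"{self.rx} {self.ry} scale 0 0 1 0 360 arc setmatrix stroke")

atom = Ellipse(100.0, 50.0, 20.0, 10.0, 30.0)
svg_text = atom.to_svg()
ps_text = atom.to_postscript()
```

Other software can still read the axial lengths and orientation from the
object itself, which neither output format makes easy to recover.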

Now the question is whether this approach can be extended to images
representing data collection. Consider a Laue pattern; some description of
the centroid positions of each spot, together with an analytic description
of the intensity pattern around that centroid, could supply more information
than a simple pixel-by-pixel rasterisation. If one were able to work out
such representations for at least some classes of image data, would it not
make sense to do this in parallel with the more straightforward
object-oriented metagraphics of the preceding paragraph?

The obvious objection to this proposal in the Laue example is that such a
description is model dependent. Is that a killer objection? Are there instances
when you can compactly describe the intensity distribution about a point as a
superposition of spherical trigonometric functions to a precision better
than the pixel size? Would such a description still have an interpretative
bias? Is it actually any more informative than a rectangular pixelization?
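
To give these questions something concrete to push against, here is a small
Python sketch. It substitutes a single isotropic Gaussian for the
superposition of functions mentioned above, purely as an assumption for
illustration; the Spot class and its parameters are hypothetical. The spot is
described by four numbers, rasterised onto a pixel grid, and its centroid is
then recovered from the pixels alone.

```python
import math
from dataclasses import dataclass

@dataclass
class Spot:
    """Analytic description of one spot: centroid, peak height and width."""
    x0: float
    y0: float
    amplitude: float
    sigma: float

    def intensity(self, x, y):
        r2 = (x - self.x0) ** 2 + (y - self.y0) ** 2
        return self.amplitude * math.exp(-r2 / (2 * self.sigma ** 2))

def rasterise(spot, width, height):
    """Sample the model at integer pixel positions; rows are indexed by y."""
    return [[spot.intensity(x, y) for x in range(width)] for y in range(height)]

def centroid(image):
    """Intensity-weighted mean position of a raster image."""
    total = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    return cx, cy

spot = Spot(x0=10.3, y0=12.7, amplitude=1000.0, sigma=1.5)
image = rasterise(spot, 24, 24)      # 576 pixel values from 4 parameters
cx, cy = centroid(image)
```

For this idealised, noise-free case the centroid comes back to well within a
pixel. Whether a real spot is adequately described by any such model is
exactly the model-dependence objection raised above.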

How do astronomers archive star-field data from photographic plates? Or from
CCDs?

Obviously these considerations are largely off the point of this discussion
list, but perhaps they are worth having at the back of our minds. And if
anyone is interested in following them further, I can put them in touch with
the Computing Commission :-)

Seasonal good wishes to all.

Brian
_______________________________________________________________________________
Brian McMahon                                             tel: +44 1244 342878
Research and Development Officer                          fax: +44 1244 314888
International Union of Crystallography                  e-mail:  bm@iucr.ac.uk
5 Abbey Square, Chester CH1 2HU, England
