Discussion List Archives

Forwarded mail

  • To: Multiple recipients of list <imgcif-l@bnl.gov>
  • Subject: Forwarded mail
  • From: Andy Hammersley <hammersl@esrf.fr>
  • Date: Mon, 24 Jun 1996 06:03:09 -0400 (EDT)

I'm forwarding this from Brian McMahon.

------------------------------------------------------------------------------

>From bm@iucr.ac.uk Fri Jun 21 17:22 MET 1996
Received: from esrf.esrf.fr by expga.esrf.fr with SMTP
	(1.38.193.4/16.2) id AA25423; Fri, 21 Jun 1996 17:22:31 +0200
Return-Path: <bm@iucr.ac.uk>
Received: from out.esrf.fr (out.esrf.fr [192.168.100.99]) by esrf.esrf.fr (8.6.10/8.6.9) with ESMTP id RAA24054 for <hammersl@esrf.fr>; Fri, 21 Jun 1996 17:17:22 +0200
From: bm@iucr.ac.uk
Received: (from uucp@localhost) by out.esrf.fr (8.6.10/8.6.10) id RAA27702 for <hammersl@esrf.fr>; Fri, 21 Jun 1996 17:23:41 +0200
Received: from agate.iucr.ac.uk(192.70.242.60) by firewall via smap (V1.3)
	id tmp027607; Fri Jun 21 17:23:08 1996
Received: by agate.iucr.ac.uk (5.x/SMI-SVR4-BM)
	id AA10704; Fri, 21 Jun 1996 16:19:52 +0100
Date: Fri, 21 Jun 1996 16:19:52 +0100
Message-Id: <9606211519.AA10704@agate.iucr.ac.uk>
To: hammersl@esrf.fr
Subject: standards
Status: RO

Hello Andy

Strong feelings appear to be stirring in the breasts of some recent
contributors to the imgcif-l list! I append below a discussion of how I
see the question of adherence (at some level) to the IUCr standard for 
information interchange. While this shouldn't be read as a definitive
pronouncement, you can consider it a reasonable indication of the current
thinking of COMCIFS. You will see that I envisage the possibility of both an
image CIF and a CBF coexisting - an option that you already proffered in
your draft proposal. Formally, we would consider the CIF representation as
paramount - that is what COMCIFS would approve. In practice, your community
might use the CBF as a working file format, so long as at any time the
"official" CIF representation can be extracted. The CBF is to my mind a
useful "bundling" format for archiving a set of files of different
character, and it would be a useful development for areas outside of your
immediate interests. The problems we are tackling here are not unique - one
need only think of email messages with "attachments"; WWW documents with
html pages and links to graphics, audio or movie files; or SGML documents,
as I mention below. All these applications benefit from being amenable to
treatment either as a collection of separate files, or as a compound entity.

Given the effort you have put into building a header structure that is CIF
compliant, I hope that you can persuade your working group to adhere to the
IUCr standards. In any case, I hope no irrecoverable decisions are made
before we all have a chance to get together and work through some of the
issues over a beer at Seattle.

I hope the notes below are found to be constructive; they're certainly
intended to be. I'm happy for you to circulate them on the list, if you
think that's appropriate.

Cheers
Brian
_______________________________________________________________________________
Brian McMahon                                             tel: +44 1244 342878
Research and Development Officer                          fax: +44 1244 314888
International Union of Crystallography                  e-mail:  bm@iucr.ac.uk
5 Abbey Square, Chester CH1 2HU, England
_______________________________________________________________________________

Some thoughts on the direction of the proposed image file standard

1. The IUCr has a standard for crystallographic information interchange.

First, ask the question: why have a standard? Clearly, to facilitate
communication between different users. A "standard" might be a working
agreement between two individuals; it may be an agreed convention within
a small community; or it may be something which is understood and acted
upon across a wide spectrum of related communities. The IUCr has tried
to establish a standard that will accommodate the needs of
small-molecule crystallographers, protein structural scientists, powder
diffractionists, symmetry theorists - ideally, anyone who wishes to use
crystallographic data and information. Inevitably, some compromises need
to be made along the way: what's optimal for one subdiscipline may be an
inefficient way of handling the data generated for another.

But the approach has generated sufficient interest from
non-crystallographic groups (protein NMR spectroscopists, quantum
chemists, chemical modellers) that it suggests the possibility of
establishing an inter-disciplinary scientific exchange protocol, and
that's a worthwhile goal. (Consider human languages - English lacks the
poetic expressiveness of Gaelic, and cannot match the Inuit tongue's
ability to differentiate between the many different forms of ice; but
it's sufficiently adaptable to be the universal exchange language of
science.)

2. The IUCr standard is CIF/STAR.

Hence anything that seeks to be part of the IUCr standard must conform
to the STAR syntax (as published by Hall and Spadaccini) and the CIF
constraints (as ordained by COMCIFS). That is basically an ASCII text
format, chosen for good reasons as a robust, lowest-common-denominator
style of representation. Because it's based on a textual model, it's
good for description, poor (at least!) at numeric data storage. Its
strength is really in archiving the aspects of the crystallographic
structure that are conducive to direct human interpretation - the
derived information, rather than the raw data.

There is, however, no reason why raw data should not be referred to, and
pointers supplied for the appropriate use of the data, from within a CIF. 

It's interesting that the current standard for publishing, SGML, was
developed completely independently, yet has also arrived at an ASCII
representation with reference ("entity") pointers to allow links to
binary data files. The common elements underlying both SGML and CIF are
the requirements for portability, updatability, and for the structured
handling of arbitrarily tagged data structures. Both SGML and CIF use
external dictionary files to describe data structures and what to do
with them.

The entity reference (or external file pointer) techniques allow the
associated binary data files to be optimal for their intended purpose.
SGML says nothing about how to digitise sound; CIF does not address the
optimal runlength encoding strategy for image data. But SGML does permit
an area-detector data set to be included in a book on art, and CIF does,
in principle, permit audio annotations of an experimental data set.
Perhaps we see the need for neither of these applications; but perhaps
the addition of multimedia annotations to the crystallographic
experimental log is not without possible benefit.

How does CBF propose to include an audio annotation of an included
data set :-)  ?


3. The proposed CBF is not CIF/STAR compliant.

It's not an ASCII file representation. It does not follow the
<tag>-<value> paradigm throughout. 

Therefore the CBF cannot be part of the IUCr standard for
crystallographic information interchange.


4. It can be made so.

Make the CBF "header" section a pure text file. Include a dataname that
acts as a file pointer to the associated binary data file. Then you can
have a fully compliant addition to the CIF stable. The data structures
in the header section are already described in CIF terms; the dictionary
is fully CIF compliant.
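
As an illustration only, the arrangement described in point 4 might be
sketched as below: a pure-text header in <tag>-<value> form, one item of
which names the associated binary file. The data names
(_image_example.*), the file name and the 16-bit pixel layout are all
assumptions made for this sketch; none is taken from the draft image
dictionary, and the parser handles only simple one-line items, not the
full STAR syntax.

```python
from pathlib import Path

# Hypothetical header text. These data names are invented for the sketch
# and do not come from any approved CIF dictionary.
HEADER = """\
data_image_1
_image_example.binary_file   'image_0001.bin'
_image_example.byte_order    little_endian
_image_example.dim_fast      4
_image_example.dim_slow      2
"""


def parse_simple_header(text):
    """Collect one-line '_tag value' items (no loops, no multi-line text)."""
    items = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def read_image(header_text, directory):
    """Follow the file pointer in the header and decode the binary file.

    Assumes unsigned 16-bit pixels, purely to keep the sketch short.
    """
    items = parse_simple_header(header_text)
    fast = int(items["_image_example.dim_fast"])
    slow = int(items["_image_example.dim_slow"])
    order = "little" if items["_image_example.byte_order"] == "little_endian" else "big"
    raw = (Path(directory) / items["_image_example.binary_file"]).read_bytes()
    if len(raw) != 2 * fast * slow:
        raise ValueError("binary file size does not match the header")
    pixels = [int.from_bytes(raw[i:i + 2], order) for i in range(0, len(raw), 2)]
    # Return the image as a list of rows, slow axis first.
    return [pixels[r * fast:(r + 1) * fast] for r in range(slow)]
```

The header alone remains an ordinary text file that any CIF-aware tool
could read; only the final step needs the binary file to be present.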


5. Is there anything to be gained by doing so?

This is the crucial question. To my mind there is one thing to be lost -
the tight coupling of data and the information that describes it. An
isolated binary image file is useless - you won't know how to decode it;
even if you crack the byte-order, encoding algorithm, dynamic range and
array size problems and succeed in reproducing an image, you have no way
of knowing what it is an image of. The perils are non-trivial.

On the other hand, if there are real and valuable reasons for linking
the binary data to other crystallographic applications, or for linking
other crystallographic information to the image files, perhaps there is
a case to be made for making the leap to CIF. Andy suggests one reason
in his example header - the experimental details are extracted from the
core CIF dictionary. I could envisage a future application where an
mmCIF data file includes a pointer to a synchrotron image to illustrate
some weak features supporting a particular structural interpretation.

There is another potential benefit, though this is one for the longer
term. People working on CIF are considering the possibility of building
into the dictionaries machine-readable descriptions of the algorithmic
handling of the data described. Hence the textual description of the
different scalings listed in data_image_intensities_linearity would be
replaced by a set of machine-readable declarations that would allow a
general parser to read the raw data and automatically apply the required
transformations. In like manner, you could add a new compression
algorithm by simply throwing in the machine declaration of the algorithm
to the dictionary - your existing software would be clever enough to be
able to parse this and decode the new compression scheme. As I say, this
has elements of the visionary; but is more likely to be brought to
fruition if we have a larger group of programming specialists working
within a unified framework.
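
A toy sketch of the idea, under heavy assumptions: suppose a dictionary
entry could carry a scaling as an ordered list of primitive operations
rather than as prose. A general routine could then apply any such
declaration without knowing in advance which scaling it describes. The
operation names (scale, offset, power) and the list-of-tuples form are
invented for this illustration; nothing of the kind is defined in any
existing CIF dictionary.

```python
# Primitive operations a generic reader is assumed to understand.
PRIMITIVES = {
    "scale": lambda value, arg: value * arg,
    "offset": lambda value, arg: value + arg,
    "power": lambda value, arg: value ** arg,
}


def apply_declaration(raw_values, steps):
    """Apply a declared sequence of (operation, argument) steps to raw data."""
    result = []
    for value in raw_values:
        for name, arg in steps:
            if name not in PRIMITIVES:
                raise ValueError(f"unknown operation in declaration: {name}")
            value = PRIMITIVES[name](value, arg)
        result.append(value)
    return result


# A hypothetical "linear" scaling, declared as data rather than as code.
LINEAR = [("scale", 2.0), ("offset", 1.0)]
scaled = apply_declaration([0, 1, 2], LINEAR)
```

A new scaling would be introduced by writing a new list of steps, not by
changing apply_declaration; a declaration that uses an operation the
reader does not know is rejected rather than silently ignored.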


6. If so, how to insure against the decapitated file?

I should like to see the image description file become part of the CIF
standard, because I do see it as one more element in a complete
description of crystallographic information. But I do appreciate the
danger of losing the header information. Hence I have no problem with
the development of a file archive protocol that ensures the coherent
transfer of both elements of the data. In many respects, the Unix "tar"
or DOS "zip" commands do the trick well enough - wrap a number of files, text
or binary, into one, and retain the directory relationships between them.

But these utilities aren't portable; so now we come full circle, and suggest
that Andy's CBF proposal be seen as fulfilling this role. It will be a
"bundled" file that contains both header and data sections. Its internal
constitution is of no concern to COMCIFS. However, if point (5) has been
agreed, and the community is pursuing an image description file format
that is part of the IUCr standard, the CBF must be accompanied by
software that reliably disassembles it into a fully compatible CIF and
its associated binary data files. That's already been accepted by Andy
as part of his overall proposal.

Note that you are then free, say, to include multiple runs of binary
data defined by length.
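
To make the "bundle and disassemble" requirement concrete, here is a
minimal sketch of one possible wrapper: an ASCII header followed by any
number of binary runs, each preceded by its length. The layout, the
marker string and the function names are assumptions made for this
example and are not the CBF draft format; the point is only that the
bundle can be taken apart again into the text header and its binary
files without loss.

```python
import struct

MAGIC = b"BUNDLE-SKETCH\n"  # hypothetical marker, not part of any CBF draft


def bundle(header_text, blobs):
    """Wrap an ASCII header and a list of binary runs into one byte string."""
    header = header_text.encode("ascii")
    parts = [MAGIC, struct.pack(">I", len(header)), header,
             struct.pack(">I", len(blobs))]
    for blob in blobs:
        # Each binary run is defined by an explicit 8-byte length.
        parts.append(struct.pack(">Q", len(blob)))
        parts.append(blob)
    return b"".join(parts)


def unbundle(data):
    """Recover the text header and the binary runs from a bundle."""
    if not data.startswith(MAGIC):
        raise ValueError("not a bundle produced by this sketch")
    pos = len(MAGIC)
    (header_len,) = struct.unpack_from(">I", data, pos)
    pos += 4
    header = data[pos:pos + header_len].decode("ascii")
    pos += header_len
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    blobs = []
    for _ in range(count):
        (blob_len,) = struct.unpack_from(">Q", data, pos)
        pos += 8
        blob = data[pos:pos + blob_len]
        if len(blob) != blob_len:
            raise ValueError("bundle is truncated")
        blobs.append(blob)
        pos += blob_len
    return header, blobs
```

Because the header is stored as plain ASCII, unbundle() hands back
exactly the text file that was put in, which is the property the
"official" CIF representation depends on.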


7. Summary

In summary, I see (5) as the crucial question that the image description
group needs to address. Will there be a real need to interoperate with
other aspects of the crystallographic process - experimental logs,
reduced data sets and structure factors, refinement histories, structural
model descriptions, reports and deposited or published papers - in a way
that makes conformance to the CIF standard desirable?

If the answer is "no", then COMCIFS need play no further formal role in
this discussion, and you are free to develop your standard in the way
you consider most appropriate. But you may not refer to your format as
"CIF" or "STAR", nor should you imply that it is so. It might be
considered prudent to come to an understanding with COMCIFS about
modifying the syntax of the header section to reduce the similarities to
CIF/STAR.

If the answer is "yes", then you should formally develop the image
dictionary and description files as fully compliant CIFs, with pointers
to associated data files. If you wish to develop the archival "CBF"
format for practical use, you must ensure that it will "unwrap" to a
set of files fully compliant with all CIF rules.

_______________________________________________________________________________

Brian McMahon                           Coordinating Secretary, COMCIFS




