Re: [Imgcif-l] proposed change in first line of imgcif files

Dear Herbert and colleagues:

Let me approach the key issue of header information vs datablock contents
first. I don't think giving the CIF datablock information priority is that
big a deal.  I believe CIF reader programs could be categorised as follows:

(1) Ignore comments, read CIF datablock information only
(2) Read header comments, and read CIF datablock information
(3) Read header comments, and imgCIF frame data only

Clearly cases (1) and (2) are OK with giving the datablock information
priority, as they have access to that information anyway.  Case (3) is also
OK, as it has no way of determining that there is a mismatch, and so must
assume that the header is correct.  Such a program would include in the user
manual a line similar to the following:

"In the unlikely case that imgCIF header information does not correspond to
the imgCIF file contents, <program name> may fail to read images correctly.
A corrected header can be generated using utility <program A>, available
from <site B>."

My stance here is that we are providing a convenience header for use by
software which wants to deal with the details of imgCIF as little as
possible (e.g. frames only).  By suggesting that such a header will actually
override the CIF proper we are undermining our own standard, firstly because
the contents of a datablock can now be rendered incorrect depending on the
contents of a comment; secondly because the non-frame datablock contents can
effectively be ignored and/or not even output; thirdly because the implicit
message is that software which ignores imgCIF details as much as possible is
actually favoured by the standard.
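A reader of type (2) could implement this priority rule roughly as follows. This is a minimal Python sketch, assuming the header comments and the datablock have already been parsed into flat dictionaries keyed by tag name. That representation and the function name `resolve_metadata` are hypothetical, not part of imgCIF or CBFlib.

```python
from typing import Any, Dict, List, Tuple

def resolve_metadata(header: Dict[str, Any],
                     datablock: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Combine convenience-header values with CIF datablock values.

    Datablock values always take priority. Keys present in both sources
    with different values are returned as mismatches, so a caller (or a
    header-repair utility) can report them instead of trusting the header.
    """
    merged = dict(header)            # start from the convenience header
    mismatches = []
    for key, value in datablock.items():
        if key in header and header[key] != value:
            mismatches.append(key)   # header disagrees with the CIF proper
        merged[key] = value          # the datablock wins
    return merged, sorted(mismatches)
```

For example, if the header claims `_array_structure.byte_order` is `big_endian` but the datablock says `little_endian`, the merged result carries `little_endian` and the tag is listed as a mismatch. A type (3) reader, which never builds the `datablock` dictionary, simply never reaches this check.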

I've added some comments below dealing with side issues.

On Wed, Sep 24, 2008 at 9:11 PM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:

> Dear Colleagues,
>
>  We are already using a magic number to distinguish an 80-character
> CIF 1.0 CIF from a 2048-character CIF 1.1 CIF, and that is a syntactic
> difference, not "just" a semantic difference, but the real issue is
> one of semantics -- otherwise we would not be having this discussion
> at all -- we are talking about how applications can quickly and reliably
> extract the meaning from a CIF, and that is, for example, where mmCIF
> and pdbx CIF differ, e.g. on the handling of sheets or atom serial
> numbers.  Life would be easier for a graphics program dealing
> with both mmCIF and pdbx CIF if there was a magic number up front
> to distinguish them.  In Harry's imgCIF case, it is not just a matter
> of easier, it is a matter of making imgCIF acceptable to the
> community.


(For the sake of clarity: the 'CIF1.1' magic number comment is "recommended"
but may be omitted.)

In general, I agree that it is useful to be able to indicate file type to
the surrounding environment - Windows seems to like file extensions, Apple
used to (still does?) use 4-letter codes in one of the file 'forks', and
Unix likes magic numbers.  So I have no issue with a magic number to
indicate enough about the file contents to know how to approach it.
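As an illustration of how cheap such a check is, here is a hypothetical Python sketch that looks for the CIF 1.1 magic comment `#\#CIF_1.1` on the first line and maps the result to the line-length limits of CIF 1.0 (80 characters) and CIF 1.1 (2048 characters). The function names are made up for this example. Because the magic comment is only recommended, its absence does not prove a file is CIF 1.0; this sketch simply falls back to the stricter 1.0 limit in that case, and a more lenient reader might choose otherwise.

```python
from typing import Optional

# Recommended (but optional) CIF 1.1 magic comment on the first line.
CIF11_MAGIC = "#\\#CIF_1.1"

# Maximum line lengths: 80 characters for CIF 1.0, 2048 for CIF 1.1.
MAX_LINE_LENGTH = {"1.0": 80, "1.1": 2048}

def sniff_cif_version(first_line: str) -> Optional[str]:
    """Return "1.1" if the first line carries the CIF 1.1 magic comment.

    Returns None otherwise: the magic comment may be omitted, so its
    absence leaves the version undetermined.
    """
    if first_line.startswith(CIF11_MAGIC):
        return "1.1"
    return None

def line_length_limit(first_line: str) -> int:
    """Choose a line-length limit, falling back to CIF 1.0 rules."""
    version = sniff_cif_version(first_line)
    return MAX_LINE_LENGTH[version or "1.0"]
```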

>
>  It is precisely on the same grounds that it is impractical to say
> that a disagreement between a magic number and internal CIF tags should
> be resolved in favor of the internal CIF tags.  We can say that all
> we want, but in terms of practical application writing, that is not
> likely to happen.  If an application is deciding on its processing
> logic from the magic number to work efficiently and it is extracting
> otherwise useful and consistent data, it is not likely to just
> give up and go away because some internal CIF tag says that it should
> have processed that data differently.


Yes, that's exactly right, it will act on the header information alone and
be happy.  Such a program is ignoring the CIF tags, so will never detect an
inconsistency, and so will always be happy.  An imgCIF file with an
incorrect header is therefore a type of file which that program will not be
able to deal with, together with lots of other obscure formats.
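A header-only reader of this kind might look like the hypothetical Python sketch below. It assumes, purely for illustration, that the convenience header is a run of `# key: value` comment lines at the top of the file; the actual header layout under discussion is not specified here, and `read_header_only` is a made-up name. Because it stops at the first non-comment line, it never sees the datablock and so cannot notice a mismatch.

```python
from typing import Dict, Iterable

def read_header_only(lines: Iterable[str]) -> Dict[str, str]:
    """Collect hypothetical `# key: value` lines from the top of a file.

    Reading stops at the first non-blank, non-comment line, so no
    datablock content is ever inspected.
    """
    header = {}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue                   # tolerate blank lines in the header
        if not stripped.startswith("#"):
            break                      # datablock starts here; stop reading
        body = stripped.lstrip("#").strip()
        key, sep, value = body.partition(":")
        if sep:                        # skip comments without a key: value pair
            header[key.strip()] = value.strip()
    return header
```

Note that the magic comment `#\#CIF_1.1` contains no colon, so it is skipped rather than misread as a header entry.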


> It simply is not the case that a CIF with the comments included
> is semantically equivalent to the same CIF with the comments removed,
> any more than a shell script with its magic number is semantically
> equivalent to the same script with the magic number removed.  I
> expect that we could adopt a strong position on the removability
> of all comments after that, but I think we would serve the
> community best by accepting the common approach of using a
> magic number to steer processing and providing hooks to deal
> with it.


I accept that some meta-information about file type needs to be available
and have shifted my stance to something like that which you describe here:
apart from meta-information about file type in the header, no other comments
can change the semantic meaning of the file.  I am OK with having additional
header information to guide processing, as a *convenience*.
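Under that rule, stripping every comment except the file-type line should leave the meaning of the file unchanged. A simplified Python sketch follows, assuming the file-type line is a first-line comment beginning `#\#CIF_` (an assumption; the proposed imgCIF first line may differ) and using the hypothetical name `strip_comments_keep_magic`. It drops whole-line comments, keeps semicolon-delimited text fields verbatim because a `#` there is data, and leaves trailing comments after values untouched.

```python
from typing import List

def strip_comments_keep_magic(lines: List[str]) -> List[str]:
    """Drop whole-line comments, keeping a first-line file-type comment.

    Lines inside semicolon-delimited text fields are kept verbatim. Trailing
    comments after data values are not handled in this simplified sketch.
    """
    kept = []
    in_text_field = False
    for index, line in enumerate(lines):
        if line.startswith(";"):
            in_text_field = not in_text_field  # ';' in column 1 opens or closes a text field
            kept.append(line)
            continue
        if in_text_field:
            kept.append(line)                  # '#' inside a text field is data
            continue
        if index == 0 and line.startswith("#\\#CIF_"):
            kept.append(line)                  # file-type meta-information survives
            continue
        if line.lstrip().startswith("#"):
            continue                           # ordinary comment: no semantic content
        kept.append(line)
    return kept
```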

Best wishes,
James.

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://scripts.iucr.org/mailman/listinfo/imgcif-l