Discussion List Archives

Re: [Imgcif-l] proposed change in first line of imgcif files

Dear James,

   We are _not_ providing just a convenience header.  We are providing
critical information that major software packages will count on to
correctly parse certain CIF files.  No matter what we say, that magic
number will override any conflicting information deeper into the file
for those packages.  In an ideal world, the information deeper in
will not conflict, but when there is a conflict, it is highly unlikely
that processing will be stopped if the resulting data makes any sense
at all to the processing software.

   We have a better chance of reducing further growth of conflicting
dialects by being realistic about what people will do than by
prescribing rules they are not likely to follow.  We would be
doing a disservice to users if we tell them they can override a
magic number by changing a tag further into the CIF when it is not
likely to work.

   Once we agree on the relevant tags, I will add the necessary checks
to vcif2, so that people who care can check for such conflicts, and
give cif2cbf the ability to regenerate files with the conflicts
resolved, but I think we need to accept the reality that we have
no real control over what people will do in the field, especially if
what we specify conflicts with commonly accepted practice.  A standard
will only be accepted by this community if it grows organically from
within the community, and we have had a strong, consistent and reasonable
request for clear specification of the necessary parse in the first
line of imgCIF files.
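
   As an illustration only, the kind of conflict check described above
could be sketched in Python as follows.  This is not vcif2 code.  The
tag name `_example_file.dialect` is a made-up placeholder, since the
relevant tags have not yet been agreed; the function names are likewise
hypothetical.  The header pattern assumes a first-line comment of the
CIF 1.1 form `#\#CIF_1.1`, and the tag scan is deliberately naive: it
ignores loops, multi-line text fields and quoting rules.

```python
import re

# Placeholder tag name: the thread has not yet agreed on the real tags.
DIALECT_TAG = "_example_file.dialect"


def header_dialect(text):
    """Return the version named in a first-line '#\\#CIF_x.y' comment, or None."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    match = re.match(r"#\\#CIF_(\S+)", first)
    return match.group(1) if match else None


def tag_value(text, tag):
    """Return the value on the first 'tag value' line found, or None.

    Naive scan: does not understand loop_ constructs or text fields.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == tag.lower():
            return parts[1].strip("'\"")
    return None


def header_conflicts(text, tag=DIALECT_TAG):
    """True if header and tag disagree, False if they agree,
    None if either is missing and there is nothing to compare."""
    head = header_dialect(text)
    body = tag_value(text, tag)
    if head is None or body is None:
        return None
    return head != body


if __name__ == "__main__":
    sample = "#\\#CIF_1.1\ndata_example\n_example_file.dialect 1.0\n"
    print(header_conflicts(sample))
```

A checker along these lines only reports the mismatch; deciding which
side wins, and regenerating a consistent file, is a separate step.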

   I think it would be fine to move CIF in the direction of really being
a standard.  The ISO rules are given at

http://www.iso.org/iso/standards_development/processes_and_procedures/how_are_standards_developed.htm

ANSI has similar guidelines.  The standards process is time-consuming and
involves a great deal of consensus building.  I think it would be worth
trying to do.

   Regards,
     Herbert






At 10:21 AM +1000 10/1/08, James Hester wrote:
>Dear Herbert and colleagues:
>
>Let me approach the key issue of header information vs datablock 
>contents first. I don't think giving the CIF datablock information 
>priority is that big a deal.  I believe CIF reader programs could be 
>categorised as follows:
>
>(1) Ignore comments, read CIF datablock information only
>(2) Read header comments, and read CIF datablock information
>(3) Read header comments, and imgCIF frame data only
>
>Clearly cases (1) and (2) are OK with giving the datablock 
>information priority, as they have access to that information 
>anyway.  Case (3) is also OK, as it has no way of determining that 
>there is a mismatch, and so must assume that the header is correct. 
>Such a program would include in the user manual a line similar to 
>the following:
>
>"In the unlikely case that imgCIF header information does not 
>correspond to the imgCIF file contents, <program name> may fail to 
>read images correctly.  A corrected header can be generated using 
>utility <program A>, available from <site B>."
>
>My stance here is that we are providing a convenience header for use 
>by software which wants to deal with the details of imgCIF as little 
>as possible (e.g. frames only).  By suggesting that such a header 
>will actually override the CIF proper we are undermining our own 
>standard, firstly because the contents of a datablock can now be 
>rendered incorrect depending on the contents of a comment; secondly 
>because the non-frame datablock contents can effectively be ignored 
>and/or not even output; thirdly because the implicit message is that 
>software which ignores imgCIF details as much as possible is 
>actually favoured by the standard.
>
>I've added some comments below dealing with side issues.
>
>On Wed, Sep 24, 2008 at 9:11 PM, Herbert J. Bernstein 
><yaya@bernstein-plus-sons.com> wrote:
>
>Dear Colleagues,
>
>  We are already using a magic number to distinguish an 80 character
>CIF 1.0 CIF from a 2048 character CIF 1.1 CIF, and that is a syntactic
>difference, not "just" a semantic difference, but the real issue is
>one of semantics -- otherwise we would not be having this discussion
>at all -- we are talking about how applications can quickly and reliably
>extract the meaning from a CIF, and that is, for example, where mmCIF
>and pdbx CIF differ, e.g. on the handling of sheets or atom serial
>numbers.  Life would be easier for a graphics program dealing
>with both mmCIF and pdbx CIF if there was a magic number up front
>to distinguish them.  In Harry's imgCIF case, it is not just a matter
>of easier, it is a matter of making imgCIF acceptable to the
>community.
>
>
>(For the sake of clarity: the 'CIF1.1' magic number comment is 
>"recommended" but may be omitted.)  
>
>In general, I agree that it is useful to be able to indicate file 
>type to the surrounding environment - Windows seems to like file 
>extensions, Apple used to (still does?) use 4-letter codes in one of 
>the file 'forks', and Unix likes magic numbers.  So I have no issue 
>with a magic number to indicate enough about the file contents to 
>know how to approach it.
>
>
>  It is precisely on the same grounds that it is impractical to say
>that a disagreement between a magic number and internal CIF tags should
>be resolved in favor of the internal CIF tags.  We can say that all
>we want, but in terms of practical application writing, that is not
>likely to happen.  If an application is deciding on its processing
>logic from the magic number to work efficiently and it is extracting
>otherwise useful and consistent data, it is not likely to just
>give up and go away because some internal CIF tag says that it should
>have processed that data differently.
>
>
>Yes, that's exactly right, it will act on the header information 
>alone and be happy.  Such a program is ignoring the CIF tags, so 
>will never detect an inconsistency, and so will always be happy.  An 
>imgCIF file with an incorrect header is therefore a type of file 
>which that program will not be able to deal with, together with lots 
>of other obscure formats.
>
>
>It simply is not the case that a CIF with the comments included
>is semantically equivalent to the same CIF with the comments removed,
>any more than a shell script with its magic number is semantically
>equivalent to the same script with the magic number removed.  I
>expect that we could adopt a strong position on the removability
>of all comments after that, but I think we would serve the
>community best by accepting the common approach of using a
>magic number to steer processing and providing hooks to deal
>with it.
>
>
>I accept that some meta-information about file type needs to be 
>available and have shifted my stance to something like that which 
>you describe here: apart from meta-information about file type in 
>the header, no other comments can change the semantic meaning of the 
>file.  I am OK with having additional header information to guide 
>processing, as a *convenience*.
>
>
>Best wishes,
>James.
>
>
>--
>T +61 (02) 9717 9907
>F +61 (02) 9717 3145
>M +61 (04) 0249 4148


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================
_______________________________________________
imgcif-l mailing list
imgcif-l@iucr.org
http://scripts.iucr.org/mailman/listinfo/imgcif-l
