Discussion List Archives


RE: A formal specification for CIF version 1.1 (Draft)

  • Subject: RE: A formal specification for CIF version 1.1 (Draft)
  • From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
  • Date: Wed, 10 Jul 2002 22:10:01 +0100 (BST)

It is very helpful to receive such detailed comments.  I will provide
my take on what has been said, but please remember that I am speaking
only for myself.  Others involved in working on the draft may have
very different takes on these items.
  -- H. J. Bernstein
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 020
        Idle Hour Blvd, Oakdale, NY, 11769


On Wed, 10 Jul 2002, Bollinger, John Clayton wrote:

> Brian McMahon [mailto:bm@iucr.org] wrote:
> [...]
> > One point should be made carefully: this specification is for an extended
> > version of CIF, not yet formally adopted by COMCIFS. The only significant
> > extensions to the existing standard are: restriction of the line-length
> > constraint from 80 to 2048 characters, and the introduction of matching
> > square brackets as additional delimiters for string values containing
> > white space.
> I think there are quite a few other differences, and no small number
> of them incompatibilities.  Many of the incompatibilities are corner
> cases, but there are some more important ones.
> Here are the differences I detected on my read through the syntax
> description:
> How about that the formally reserved but unused stop_ and save_
> keywords are now used in CIF 1.1, albeit the latter only in
> dictionaries.  And speaking of dictionaries, that they are now
> written in CIF rather than in their own STAR dialect.  (Well,
> really they're still a slightly different dialect in that only
> they can have save frames, but the draft spec says they are CIFs.)

  I believe that this use of stop_ and save_ does not invalidate
any previously valid CIFs, and is a realistic approach to dealing
with these reserved words.  Any validating CIF parser needs to have
a module to read dictionaries, where it will encounter save frames.
Any properly written CIF parser has to recognize stop_ to distinguish
it from a data value.  By making these changes in the specification,
we are specifying a common practice (save frames), and saying
that a use of a reserved word (stop_) in a context in which it
clearly is not an error, should not be treated as an error.

> And what about data values beginning with a substring matching a
> reserved word?  (Paragraph 10)  In CIF 1.0 it was reasonably clear
> that something like this applied to data_ because such a construct
> had its own semantics defined, but it was not clear that this was
> a general restriction applied to all the reserved words.  Did I
> just miss it somewhere, or is this one of those points of 1.0 that
> is being clarified via the 1.1 spec?  If the latter, then let me
> throw in that I don't like it.  I think that's because it is a
> departure from the normal sense of the term "reserved word."  In any
> case, it makes a parser that incremental bit trickier to write.

  CIF has always been presented as an application of STAR, so the
reserved words have, in fact, always been reserved, and it has
always been the case that having a data value beginning data_ or
save_ was incorrect.  By applying exactly the same logic to the
full set of reserved words, I believe we should make the design of
most parsers cleaner and simpler.
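
  As a rough illustration of why a uniform rule is simple to implement,
here is a minimal Python sketch.  It is not taken from the draft.  The
function name is hypothetical, and the list of five prefixes is an
assumption: this message names only data_, save_ and stop_, and loop_ and
global_ are added here as the other reserved words.  Case-insensitive
matching is likewise assumed.

```python
# Assumed set of reserved prefixes; only data_, save_ and stop_ are
# named in this message, loop_ and global_ are added as an assumption.
RESERVED_PREFIXES = ("data_", "loop_", "global_", "save_", "stop_")


def starts_with_reserved_word(token: str) -> bool:
    """Return True if an unquoted token begins with a reserved word.

    Matching is case-insensitive (an assumption of this sketch).
    """
    return token.lower().startswith(RESERVED_PREFIXES)


# A single prefix test covers every reserved word in the same way.
for tok in ("stop_", "Save_frame", "database", "_stop_code"):
    print(tok, starts_with_reserved_word(tok))
```

  With one test for all reserved words, the parser needs no special
case for data_ alone.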

> In paragraph 17: "The end-of-line associated with the closing semicolon
> does not form part of the data value."  Is this another
> change/clarification, or another published detail that had previously
> escaped me?  I had thought that that last eol was part of the value.

  If you exclude the terminal <eol> from the text field, you then allow
the semicolon to quote arbitrary text fields, including those that
do not end with an <eol>.  If you do not exclude the terminal
<eol> from the text field, then the only text that can be quoted with
semicolons is text that ends with an <eol>.
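
  A minimal Python sketch of a reader that leaves the end-of-line before
the closing semicolon out of the value may make this concrete.  It is an
illustration only: the function name and the line-list input are
hypothetical, and anything following the closing semicolon on its line
is ignored for brevity.

```python
def read_text_field(lines, start):
    """Read a semicolon-delimited text field.

    `lines` is a list of lines with their terminators already removed,
    and lines[start] must begin with ';'.  Returns the value and the
    index of the line after the closing ';'.
    """
    if not lines[start].startswith(";"):
        raise ValueError("text field must open with ';' in column one")
    body = [lines[start][1:]]          # text after the opening ';'
    i = start + 1
    while i < len(lines) and not lines[i].startswith(";"):
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("unterminated text field")
    # Lines are joined with "\n", so the eol that precedes the closing
    # ';' is not appended to the value.
    return "\n".join(body), i + 1


# A value without a trailing end-of-line:
print(repr(read_text_field([";first", "second", ";"], 0)[0]))
# A value that does end with an end-of-line needs one extra empty line:
print(repr(read_text_field([";first", "", ";"], 0)[0]))
```

  Under this reading both kinds of value can be written, which is the
point of excluding the final end-of-line.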

> In paragraphs 22 and 41: Exclusion of ASCII characters 11 and 12
> decimal is a departure from and incompatibility with CIF 1.0.  Not
> that I particularly object -- handling these appropriately is a pain.

  The second sentence of the abstract of the Hall, Allen, Brown paper reads:

  "The CIF is a general, flexible and easily extensible free-format
  archive file; it is human and machine readable and can be edited by a
  simple text editor."

It is not always possible to edit texts containing ASCII control
characters other than HT with a "simple text editor".  VT and FF
serve no useful purpose in a CIF, and, as you note, they can
be a pain to handle.
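
A check of this kind is short to write.  The Python sketch below is an
illustration only.  It assumes the permitted set is printable ASCII
(32 to 126) plus HT, LF and CR; that set and the function name are
assumptions of this example, not text from the draft.

```python
# Control characters assumed to be permitted: HT, LF, CR.
ALLOWED_CONTROLS = {9, 10, 13}


def disallowed_chars(text):
    """Return (offset, code) pairs for characters outside the assumed set."""
    bad = []
    for offset, ch in enumerate(text):
        code = ord(ch)
        if not (32 <= code <= 126 or code in ALLOWED_CONTROLS):
            bad.append((offset, code))
    return bad


# VT (11) and FF (12) are reported; HT and LF are accepted.
print(disallowed_chars("a\x0bb\x0c"))
print(disallowed_chars("a\tb\n"))
```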

> In paragraph 29: the data name length restriction to 75 characters is
> another incompatibility with CIF 1.0 (as revised) where the data name
> length was restricted only indirectly by the line length restriction.
> Thus in CIF 1.0 data names could be 80 characters long.

Actually, to allow a data name to be defined in a dictionary you have
to allow it to appear with a prepended "data_" or "save_".  In DDL1
dictionaries, the leading underscore of the data name is dropped, which
has created a limit of 76 characters.  In DDL2 the underscore is
retained, which has created a limit of 75 characters.  Thus the 75
character limit is simply a recognition of the implicit line
length restrictions that had been in effect in the past, and helps
to ensure that old systems will be able to work with these new names.
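
The arithmetic can be spelled out in a few lines of Python.  This is
only a sketch of the counting argument above; the function name is
hypothetical, and the lengths are counted with the leading underscore
of the data name included.

```python
OLD_LINE_LIMIT = 80  # CIF 1.0 line-length limit


def max_name_length(prefix, keeps_underscore):
    """Longest data name (leading underscore included) that still fits
    on one line when written after `prefix` in a dictionary."""
    remaining = OLD_LINE_LIMIT - len(prefix)
    # If the dictionary drops the leading underscore, the full name can
    # be one character longer than the space that remains.
    return remaining if keeps_underscore else remaining + 1


print(max_name_length("data_", keeps_underscore=False))  # DDL1 style
print(max_name_length("save_", keeps_underscore=True))   # DDL2 style
```

The smaller of the two results is the one that is safe in both styles
of dictionary.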

> Paragraph 42 makes it optional to support line termination semantics
> different from the host OS'.  That would be another departure
> from CIF 1.0, I think, and, in my opinion, an all-around bad idea if
> CIFs are supposed to be portable.  As far as I can tell, the pseudo-
> production presented for <eol> is in fact the required implementation
> for a fully-conformant CIF 1.0 parser.

If you are on a Unix system, the pseudo-production is almost right
for a "liberal-reader" CIF parser.  It misses the case of a final
line in a file which has not been terminated by "\n".  If you are
on a VMS system, or an IBM mainframe, the pseudo-production may be
completely wrong for a CIF created locally as a text file.  If CIFs
are truly to be portable, it must be possible for someone on
a non-Unix system (and non-Windows, non-Mac system) to work with them.
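
For byte-stream files, a "liberal-reader" line splitter that also copes
with an unterminated final line might look like the Python sketch below.
It is an illustration only, with a hypothetical function name.  It
covers just the "\n", "\r\n" and "\r" conventions and says nothing about
record-oriented text files, which would have to be read through the
host system's own text I/O.

```python
import re

# Accept "\r\n", lone "\r" and lone "\n" as line terminators.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines, tolerating mixed terminators and a final
    line that has no terminator at all."""
    lines = _EOL.split(text)
    # A terminator at the very end leaves a trailing empty string; drop
    # it so that terminated and unterminated final lines look the same.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_lines("data_x\r\n_a 1\r_b 2\n_c 3"))
```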

> Paragraph 43: In combination with the formal grammar presented earlier,
> the definitions of the <eol> and <noteol> non-terminals in fact seems
> to _preclude_ CIF parsers from handling non-native line termination
> semantics.  Even if that's not a departure from CIF 1.0, it's still
> a bad idea.

We are not trying to preclude people from writing parsers which are
liberal and able to read a wider range of CIF formats than those
produced by the text editors of their own machines, but it would
be unreasonable and impractical to insist that every parser be able
to read every line format that ever has been or will be invented.  It
is not even reasonable to insist that every parser be able to
read some short list of non-native line formats.  That would,
for example, make Fortran-implemented parsers non-conformant on
certain systems.

> According to paragraph 60, a file containing only whitespace and
> comments but no data block is not a valid 1.1 CIF.  That is another
> departure from CIF 1.0 if it is really the intent.  One of the ciftest
> trip files actually tests this case, in fact.

  This sounds like a good topic for further discussion.  I for one
would favor allowing such a file to be a CIF, but I am not certain
what I would do with it.

> Paragraph 61: this is another departure from CIF 1.0, which did allow
> data blocks without data items.  Another of the ciftest trip files
> tests this case.  (vcif evidently produces a warning, which seems
> reasonable, but this is not an error.)

  Yet another good topic for discussion.

> Regards,
> John Bollinger
> jobollin@indiana.edu
