Discussion List Archives


RE: Revised draft of CIF 1.1 syntax document

  • Subject: RE: Revised draft of CIF 1.1 syntax document
  • From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
  • Date: Mon, 16 Sep 2002 17:50:46 +0100 (BST)

> (1) Is a useful purpose served by permitting the use of the 
> STAR word stop_ to indicate the end of a loop or loop header? 
> Since CIF does not use nested loops, its use for this purpose 
> is unnecessary. On the one hand it permits the general usage 
> of the STAR stop_ directive in CIFs and (arguably) is of help 
> in data recovery from broken CIFs; on the other hand it 
> requires parsers to accommodate a new directive and track the 
> context accordingly.

stop_ is of no help in recovering broken CIFs if it is not used, and if
it is not required then there is no reason to expect that many broken
CIFs will use it.  This would change if some widely used CIF-generating
software started putting in stop_ automatically, of course.  I think the
issue of recovering broken CIFs is at best a wash, however, because
introducing syntactic significance to stop_ also introduces new ways to
break CIFs and new modes of ambiguity in broken CIFs.

In a well-formed CIF the additional semantic content of a stop_ is
absolutely zero, and in an invalid CIF the semantic content is
(necessarily) undefined.  Introducing it as an option requires compliant
parsers to support it, for little or no additional value.  I still fail
to see why adding it is even being considered.
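To make the parser-side cost concrete, here is a minimal Python sketch (not taken from vcif or any other existing CIF parser) of reading one loop from a pre-tokenized stream when an optional stop_ terminator is permitted. It assumes, for simplicity, that tokens are bare unquoted words, so a quoted value beginning with an underscore would be misread; parse_loop and its allow_stop switch are hypothetical names.

```python
from typing import List, Tuple

RESERVED = ("loop_", "global_")

def parse_loop(tokens: List[str], pos: int,
               allow_stop: bool = True) -> Tuple[List[str], List[str], int]:
    """Parse one loop starting at tokens[pos] == 'loop_'.

    Returns (data_names, values, position_after_loop).
    Assumes tokens are bare, unquoted words (a simplification).
    """
    if tokens[pos].lower() != "loop_":
        raise ValueError("expected loop_")
    pos += 1
    names: List[str] = []
    # Loop header: consecutive data names beginning with '_'.
    while pos < len(tokens) and tokens[pos].startswith("_"):
        names.append(tokens[pos])
        pos += 1
    if not names:
        raise ValueError("loop_ has no data names")
    values: List[str] = []
    while pos < len(tokens):
        low = tokens[pos].lower()
        if low == "stop_":
            if not allow_stop:
                raise ValueError("stop_ is not permitted in this dialect")
            pos += 1          # the extra directive: consume it, record nothing
            break
        # Without stop_, the loop ends implicitly at the next data name,
        # loop_, global_, or data_/save_ header.
        if (tokens[pos].startswith("_") or low in RESERVED
                or low.startswith(("data_", "save_"))):
            break
        values.append(tokens[pos])
        pos += 1
    if len(values) % len(names) != 0:
        raise ValueError("values do not fill a whole number of packets")
    return names, values, pos
```

Parsing ["loop_", "_a", "_b", "1", "2", "3", "4", "_c", "5"] with and without a stop_ inserted after "4" gives the same names and values; only the position at which parsing resumes differs. The stop_ branch is pure extra state to maintain, which is the sense in which it adds nothing to a well-formed CIF.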
 
> (2) Should the value of a semicolon-delimited text field 
> include the final end-of-line? If so, the following two cases 
> have identical values: 'foo' and
> ;foo
> ;
> If not, the values are different: 'foo' in one case, 'foo\n' in
> the other.

I still have not seen an answer to my question about what STAR specifies
for the construct.  Is this an open question on that front as well?  If
so, then can we assume that the same choice made for CIF will be made
for STAR?  As I wrote earlier, the 1994 STAR paper seems pretty clearly
(to me) to indicate that the trailing newline is part of the quoted
material.  (It emphasizes that the construct quotes one or more _lines_
of text.)  Are we contemplating a departure from STAR compatibility?
Also, although apparently both interpretations have been used in various
people's parsers, vcif includes the newline in the quoted material, and
vcif is the closest to a reference 1.0 parser implementation that we
have.
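The two readings of the trailing end-of-line differ by a single branch in a reader. A minimal Python sketch, assuming the input has already been split into physical lines with their terminators removed; read_text_field and its include_final_eol flag are hypothetical names, and the default of True reflects the STAR-paper and vcif behaviour described above, not a settled CIF 1.1 rule.

```python
from typing import List, Tuple

def read_text_field(lines: List[str], start: int,
                    include_final_eol: bool = True) -> Tuple[str, int]:
    """Read a semicolon-delimited text field whose opening ';' is in
    column 1 of lines[start]. Returns (value, index of the closing line).

    `lines` holds physical lines with their terminators already removed.
    """
    if not lines[start].startswith(";"):
        raise ValueError("text field must open with ';' in column 1")
    parts = [lines[start][1:]]      # text after the opening ';' is content
    i = start + 1
    while i < len(lines):
        if lines[i].startswith(";"):
            value = "\n".join(parts)
            # The only point of disagreement: is the <eol> that precedes
            # the closing ';' part of the value?
            if include_final_eol:
                value += "\n"
            return value, i
        parts.append(lines[i])
        i += 1
    raise ValueError("unterminated text field")
```

With include_final_eol=False, the field ;foo followed by a closing ; yields the same value as 'foo'; with the default it yields 'foo\n'.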

I continue to view general-purpose multiline quoting to be the role of
the -- now deferred -- square-bracket quoting mechanism.
Semicolon-delimited text blocks do not need to serve that role.  It
might have been nice if they had originally been defined that way, but
by my reading they were not, and it's too late to change that now.


Thanks for the various clarifications and implementation notes.  They
help.

[...]

> Also the discrete reserved words loop_, stop_ and global_ are 
> itemised in a separate table from that describing forbidden 
> unquoted substrings at the start of a data value.

So now these are only reserved as complete tokens rather than as initial
substrings?  Good.
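If loop_, stop_ and global_ are reserved only as complete tokens, the check becomes an exact comparison rather than a prefix test, so unquoted values such as loop_count or stop_sign remain legal. A minimal Python sketch; treating data_ and save_ as reserved prefixes, and matching case-insensitively, are assumptions of this sketch rather than quotations from the draft, and is_reserved is a hypothetical helper.

```python
# Reserved only as complete tokens (per the revised draft's separate table).
RESERVED_WORDS = frozenset({"loop_", "stop_", "global_"})
# Assumption: data_ and save_ introduce block/frame headers, so an unquoted
# value may not begin with them.
RESERVED_PREFIXES = ("data_", "save_")

def is_reserved(token: str) -> bool:
    """True if an unquoted token would be read as a reserved word or header.
    Case-insensitive matching is an assumption of this sketch."""
    low = token.lower()
    return low in RESERVED_WORDS or low.startswith(RESERVED_PREFIXES)
```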
 
[...]

> Para 42. Discussion of ways to handle machine-dependent <eol> 
> across common platforms is prefaced with the header 
> "Implementation note:".

Very well.  I still read the original specifications to require that all
three of the common line termination conventions be supported, but
evidently that is not the prevailing opinion.  That being the case, I do
appreciate the specification of the implementation note.  I assume that
the same interpretation prevails for STAR?
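Supporting all three common conventions (LF, CR LF and bare CR) is inexpensive in practice. A minimal Python sketch of one way to normalize them before tokenizing; split_cif_lines is a hypothetical helper, and decoding as ASCII assumes an ASCII-only CIF.

```python
from typing import List

def split_cif_lines(raw: bytes) -> List[str]:
    """Split raw file bytes into lines, accepting LF, CR LF and bare CR."""
    text = raw.decode("ascii")   # assumes an ASCII-only CIF
    # Replace CR LF first so it is not seen as two terminators, then bare CR.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()              # a final terminator does not open a new line
    return lines
```

Python's str.splitlines() is avoided deliberately because it also breaks on characters such as form feed and U+2028, which are not among the three conventions discussed here.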

[...]
 
> Para. 59. Copied productions for <Exponent>, <UnsignedInteger> 
> and <Digit> as given in Appendix A summary table.

In both the appendix A summary and paragraph 59, the productions for
<Number> and <Float> are ambiguous (any text that matches <Number> also
matches <Float>).  In addition, text of the form 1e5 does not match
<Number>, although it is valid in all programming languages I know that
support scientific notation.  Both issues would be resolved by changing
the first alternative of the <Float> production from just <Integer> to
<Integer><Exponent>.
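The ambiguity and the proposed fix can be checked mechanically by transcribing the productions into regular expressions. A minimal Python sketch: <Number> is assumed to be <Integer> | <Float>, and the decimal-point alternatives of <Float> are an assumption (an optionally signed number containing a decimal point, with an optional exponent), since only the first alternative is at issue. DRAFT_FLOAT uses <Integer> as that first alternative; PROPOSED_FLOAT uses <Integer><Exponent>.

```python
import re

UNSIGNED_INTEGER = r"[0-9]+"
INTEGER = r"[+-]?" + UNSIGNED_INTEGER
EXPONENT = r"[eE][+-]?" + UNSIGNED_INTEGER
# Assumed decimal-point alternatives of <Float>: .5, 1.5 or 1. (optionally
# signed), each with an optional exponent.
DECIMAL = (r"[+-]?(?:[0-9]*\." + UNSIGNED_INTEGER + "|" + UNSIGNED_INTEGER
           + r"\.)(?:" + EXPONENT + ")?")

DRAFT_FLOAT = "(?:" + INTEGER + "|" + DECIMAL + ")"               # first alt: <Integer>
PROPOSED_FLOAT = "(?:" + INTEGER + EXPONENT + "|" + DECIMAL + ")"  # <Integer><Exponent>

def matches(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

def is_number(text: str, float_pattern: str) -> bool:
    """<Number> ::= <Integer> | <Float> (assumed)."""
    return matches("(?:" + INTEGER + "|" + float_pattern + ")", text)
```

Under the draft pattern, 1e5 is rejected and 42 matches both <Integer> and <Float>; under the proposed pattern, 1e5 is accepted and 42 matches only <Integer>.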


Regards,

John Bollinger

--

John C. Bollinger, Ph.D.
Indiana University
Molecular Structure Center

jobollin@indiana.edu
 

International Union of Crystallography
