

Re: Request for approval of CIF version 1.1 specification

  • To: Multiple recipients of list <comcifs-l@iucr.org>
  • Subject: Re: Request for approval of CIF version 1.1 specification
  • From: John Faber <faber@icdd.com>
  • Date: Wed, 16 Oct 2002 17:04:32 +0100 (BST)
I agree with this suggestion

At 02:27 AM 10/16/2002 +0100, you wrote:
>Attached are some comments regarding the 1.1 CIF specification
>as posted by Brian.
>I am wondering if it wouldn't make sense to officially announce these
>documents as "Working Specification" with the express intent to adopt
>them as "final" after being publicly available and actively used by
>implementors for a while (e.g. a year). Hopefully the community will
>see this as an invitation to participate in ironing out any kinks.
>Version 1.1 Specification
>I have to admit that I am still not entirely clear what the authoritative
>Version 1.0 Specification is (Hall, Allen & Brown, 1991?). It would be
>useful to clearly explain this in the introduction ("Revision
>history"). It would also be useful to outline the boundaries between
>this specification and the DDL specifications ("Scope").
>Definition of terms: consolidate in one place (link).
>Regarding quoting rules:
>I am asking myself how to deal with a string like
>   contains both isolated ' and " and ends with a '
>If I understand correctly, anything can be handled in a multi-line text
>field. However, take the viewpoint of someone implementing a CIF
>writer. If the goal is to make the output human-readable, one would
>probably prefer quoted strings over multi-line text, in particular
>inside a loop_ construct. But then it seems necessary to pre-scan the
>text fields to be output to determine what kind of quoting is
>applicable. I am under the impression that it could be quite hard to
>devise an algorithm that generates both correct and "nice" output. A
>human working on a CIF will face similar difficulties. Isn't this an
>issue in practice? Could it be useful to include some practical
>quoting guidelines?
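
The pre-scan described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function names are hypothetical, and the rules encoded here (a closing quote counts only when followed by white space; unquoted values may not contain white space, start with a special character, or be a reserved word) reflect one reading of the CIF 1.1 syntax. Line-length limits are ignored.

```python
# Hypothetical helper for a CIF writer: pick the "nicest" legal quoting style.
RESERVED_WORDS = ("loop_", "stop_", "global_")
RESERVED_PREFIXES = ("data_", "save_")
SPECIAL_FIRST_CHARS = "_#$'\"[];"


def can_be_bare(value):
    """True if the value can be written without any delimiters."""
    if not value or any(ch.isspace() for ch in value):
        return False
    if value[0] in SPECIAL_FIRST_CHARS:
        return False
    if value in ("?", "."):
        # Assumption: a bare ? or . would read as unknown/inapplicable.
        return False
    lowered = value.lower()
    return lowered not in RESERVED_WORDS and not lowered.startswith(RESERVED_PREFIXES)


def can_be_quoted(value, quote):
    """True if the value fits on one line between two `quote` characters."""
    if "\n" in value or "\r" in value:
        return False
    # An embedded delimiter followed by white space would end the string early.
    for i, ch in enumerate(value[:-1]):
        if ch == quote and value[i + 1] in " \t":
            return False
    return True


def choose_quoting(value):
    """Return 'bare', 'single', 'double' or 'text' for the given value."""
    if can_be_bare(value):
        return "bare"
    if can_be_quoted(value, "'"):
        return "single"
    if can_be_quoted(value, '"'):
        return "double"
    if "\n;" in value or "\r;" in value:
        raise ValueError("value contains <eol>; and cannot be a text field")
    return "text"
```

Under these assumptions the example string above, which contains both an isolated ' and an isolated ", falls through to a multi-line text field, while a value such as don't stop can still be written in single quotes.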
>I believe many people will expect constructs like
>   'an embedded \' quote'
>   "an embedded \" double quote"
>to work as they do in many programming languages. To avoid this common
>misunderstanding it will be useful to provide a link to the "Accented
>letters" table in the semantics document.
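
To illustrate the misunderstanding, here is a small sketch of how a reader might scan a quoted string, assuming the rule that a quote character closes the string only when it is followed by white space or the end of the line. The function name read_quoted is hypothetical. The backslash receives no special treatment, so the first construct above ends right after the backslash.

```python
def read_quoted(text):
    """Split `text`, which starts with ' or ", into (value, remainder).

    Assumption: the closing delimiter is the first matching quote that is
    followed by a space, a tab, or the end of the input. A backslash is an
    ordinary character at this level.
    """
    quote = text[0]
    if quote not in "'\"":
        raise ValueError("text does not start with a quote character")
    for i in range(1, len(text)):
        at_end = i + 1 == len(text)
        if text[i] == quote and (at_end or text[i + 1] in " \t"):
            return text[1:i], text[i + 1:]
    raise ValueError("unterminated quoted string")


# The backslash does not protect the quote: the value stops after it.
value, rest = read_quoted("'an embedded \\' quote'")
# value is the text `an embedded \`; rest is the leftover ` quote'`.
```

By the same rule, 'it's fine' is read as a single value, because the embedded quote is followed by a letter rather than white space.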
>17. ... trailing white space on a line may however be elided.
>                                        ^^^
>In my opinion the specification should be unambiguous:
>White space should not be elided by the parser. The data value should
>be left untouched. Eliding is in the regime of semantics, not syntax.
>20. ...
>By contrast the value of the text field
>; foo
>   bar
>is `foo\n bar' ...
>Should this be `foo\n  bar' (two spaces before the bar)?
>Also, in the semantics document the notation <eol> is used instead
>of \n. I suggest using <eol> everywhere.
>The ASCII characters at decimal positions 11 (VT or vertical tab) and
>12 (FF or form feed), often included in library implementations as
>white space characters, are explicitly excluded from the CIF character
>set at this revision.
>   1. I don't see the benefit of explicitly excluding these
>      characters. In practice it means that parsing of old
>      files might fail only because these characters are
>      embedded. I know there was some discussion already,
>      but I cannot remember the details. Is there something
>      wrong with the following, more forgiving approach:
>      Unquoted VT and FF are treated as white space,
>      quoted VT and FF are "passed through" like any other
>      character:
>        <WhiteSpace> := { <SP> | <HT> | <VT> | <FF> | <eol>
>                          | <TokenizedComments>}+
>        <AnyPrintChar> := <OrdinaryChar> | <double_quote> | '#' | '$'
>                          | <single_quote> | '_' | <SP> | <HT>  | <VT> | <FF>
>                          | ';' | '[' | ']'
>   2. If it is decided to explicitly exclude VT and FF this deviation
>      from STAR should (also) be listed under "Implementation restrictions."
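
The more forgiving approach in item 1 could look roughly like the sketch below. It is a deliberately reduced tokenizer, not a CIF parser: comments and multi-line text fields are left out, and the name tokenize is hypothetical. Unquoted VT and FF separate tokens like any other white space; inside a quoted string they are kept unchanged.

```python
# White space as proposed above: SP, HT, VT, FF and line ends.
WHITESPACE = " \t\v\f\r\n"


def tokenize(text):
    """Split text into unquoted and quoted values (comments not handled)."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITESPACE:
            i += 1  # unquoted VT/FF are skipped like SP and HT
        elif ch in "'\"":
            j = i + 1
            while True:
                if j >= n or text[j] in "\r\n":
                    raise ValueError("unterminated quoted string")
                # A quote closes the string only before white space or at the end.
                if text[j] == ch and (j + 1 == n or text[j + 1] in WHITESPACE):
                    break
                j += 1
            tokens.append(text[i + 1:j])  # quoted VT/FF pass through
            i = j + 1
        else:
            j = i
            while j < n and text[j] not in WHITESPACE:
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens
```

With this sketch, tokenize("a\vb\f'c\vd' e") yields four values, the third of which still contains the vertical tab.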
>How does the "Maximum line length" apply to <eol>\; quoted strings
>as explained in the semantics document? For example, is the following
>2000 characters ...\
>2000 characters ...
>Finally, in the post-Fortran and post-C era line length restrictions
>seem very arbitrary and are ultimately a nuisance. I'd rather see this
>restriction removed from the specification. Programs written in
>languages without automatic dynamic memory management could simply
>allocate a large buffer (e.g. 128k are perfectly reasonable these days)
>and report a "Technical limitation" in the highly unlikely event
>that the buffer is insufficient.
>This sentence in the introduction leaves me puzzled:
>   As computer techniques evolve, it becomes more appropriate to discuss
>   the machine-accessible semantic content, or "meaning", of the data in
>   such a file.
>Again: Definition of terms: consolidate in one place (link).
>10. The character string [local] is reserved for local use.
>                          ^     ^
>Is this [notation] used somewhere else? Are there alternatives?
>Handling of long lines
>   - I am a bit surprised that this is presented in the semantic
>     features document rather than the syntax document.
>   - Why do we need this for # comments?
>Typographic style codes
>   I don't see how these comments could make a significant difference in
>   practice, but they significantly contribute to conveying the
>   impression that the semantics features are a bit of a hodgepodge.
>   I suggest deleting the entire "Typographic style codes" section.

John Faber, Ph.D.
Principal Scientist
International Centre for Diffraction Data
12 Campus Boulevard
Newtown Square, PA 19073-3273, USA
+1-610-325-9814 (phone)
+1-610-325-9823 (fax)
faber@icdd.com (e-mail)