Discussion List Archives

Re: Request for approval of CIF version 1.1 specification

  • To: Multiple recipients of list <comcifs-l@iucr.org>
  • Subject: Re: Request for approval of CIF version 1.1 specification
  • From: John Faber <faber@icdd.com>
  • Date: Wed, 16 Oct 2002 17:04:32 +0100 (BST)
I agree with this suggestion.
John

At 02:27 AM 10/16/2002 +0100, you wrote:
>Attached are some comments regarding the 1.1 CIF specification
>as posted by Brian.
>
>I am wondering if it wouldn't make sense to officially announce these
>documents as a "Working Specification" with the express intent to adopt
>them as "final" after being publicly available and actively used by
>implementors for a while (e.g. a year). Hopefully the community will
>see this as an invitation to participate in ironing out any kinks.
>
>Ralf
>
>
>----------------------------------------------------------------------------
>
>Version 1.1 Specification
>
>I have to admit that I am still not entirely clear what the authoritative
>Version 1.0 Specification is (Hall, Allen & Brown, 1991?). It would be
>useful to clearly explain this in the introduction ("Revision
>history"). It would also be useful to outline the boundaries between
>this specification and the DDL specifications ("Scope").
>
>----------------------------------------------------------------------------
>
>Syntax:
>
>Definition of terms: consolidate in one place (link).
>
>Regarding quoting rules:
>
>I am asking myself how to deal with a string like
>
>;
>contains both isolated ' and " and ends with a '
>;
>
>If I understand correctly, anything can be handled in a multi-line text
>field. However, take the viewpoint of someone implementing a CIF
>writer. If the goal is to make the output human-readable, one would
>probably prefer quoted strings over multi-line text, in particular
>inside a loop_ construct. But then it seems necessary to pre-scan the
>text fields to be output to determine what kind of quoting is
>applicable. I am under the impression that it could be quite hard to
>devise an algorithm that generates both correct and "nice" output. A
>human working on a CIF will face similar difficulties. Isn't this an
>issue in practice? Could it be useful to include some practical
>quoting guidelines?
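
The pre-scan described above could look roughly like the following sketch.
It assumes the 1.1 rule that a quote character closes a string only when
white space follows it; quote_cif_value and RESERVED are hypothetical names,
and special values such as "." and "?" are deliberately not handled.

```python
import re

# Assumed set of reserved words that must not appear as bare values.
RESERVED = re.compile(r"^(data_|save_|loop_$|global_$|stop_$)", re.IGNORECASE)

def quote_cif_value(value):
    """Pick the lightest quoting that can represent `value` (hypothetical helper)."""
    if "\n" not in value:
        # Bare word: non-empty, no white space, no special leading
        # character, and not a reserved word.
        if (value and not re.search(r"\s", value)
                and value[0] not in "_#$'\"[];"
                and not RESERVED.match(value)):
            return value
        # A delimiter is unsafe only if the value contains that delimiter
        # immediately followed by white space.
        for q in ("'", '"'):
            if not re.search(re.escape(q) + r"\s", value):
                return q + value + q
    # Fall back to a multi-line text field.
    if "\n;" in value:
        raise ValueError("value contains <eol>; and needs special handling")
    return "\n;" + value + "\n;"
```

With these assumptions the string in the example above falls through both
quoting styles and ends up in a text field, while something like
a dog's life can still be written as a single-quoted string.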
>
>I believe many people will expect constructs like
>   'an embedded \' quote'
>or
>   "an embedded \" double quote"
>to work as they do in many programming languages. To avoid this common
>misunderstanding it will be useful to provide a link to the "Accented
>letters" table in the semantics document.
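
To illustrate the misunderstanding, here is a minimal sketch of a reader for
quoted strings. It assumes that the delimiter closes the string only at end
of line or before white space, and that a backslash has no special meaning
at the syntax level; read_quoted is a hypothetical name.

```python
def read_quoted(line, start=0):
    """Return (value, index just past the closing quote) for the quoted
    string that begins at line[start]."""
    quote = line[start]
    if quote not in "'\"":
        raise ValueError("not a quoted string")
    i = start + 1
    while i < len(line):
        # The delimiter closes the string only at end of line or when
        # white space follows; a preceding backslash changes nothing.
        if line[i] == quote and (i + 1 == len(line) or line[i + 1] in " \t"):
            return line[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")

value, end = read_quoted(r"'an embedded \' quote'")
# value is the text up to and including the backslash; the rest of the
# line (" quote'") is left over as stray input.
```

Under these assumptions the string ends right after the backslash, which is
not what someone used to C-style escapes would expect.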
>
>17. ... trailing white space on a line may however be elided.
>                                        ^^^
>
>In my opinion the specification should be unambiguous:
>White space should not be elided by the parser. The data value should
>be left untouched. Eliding is in the regime of semantics, not syntax.
>
>20. ...
>By contrast the value of the text field
>
>; foo
>   bar
>;
>
>is `foo\n bar' ...
>
>Should this be `foo\n  bar' (two spaces before the bar)?
>
>Also, in the semantics document the notation <eol> is used instead
>of \n. I suggest using <eol> everywhere.
>
>22.:
>
>The ASCII characters at decimal positions 11 (VT or vertical tab) and
>12 (FF or form feed), often included in library implementations as
>white space characters, are explicitly excluded from the CIF character
>set at this revision.
>
>Points:
>
>   1. I don't see the benefit of explicitly excluding these
>      characters. In practice it means that parsing of old
>      files might fail only because these characters are
>      embedded. I know there was some discussion already,
>      but I cannot remember the details. Is there something
>      wrong with the following, more forgiving approach:
>      Unquoted VT and FF are treated as white space,
>      quoted VT and FF are "passed through" like any other
>      character:
>
>        <WhiteSpace> := { <SP> | <HT> | <VT> | <FF> | <eol>
>                         | <TokenizedComments>}+
>        <AnyPrintChar> := <OrdinaryChar> | <double_quote> | '#' | '$'
>                          | <single_quote> | '_' | <SP> | <HT>  | <VT> | <FF>
>                          | ';' | '[' | ']'
>
>   2. If it is decided to explicitly exclude VT and FF this deviation
>      from STAR should (also) be listed under "Implementation restrictions."
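
The forgiving approach in point 1 can be sketched with a small tokenizer.
This is only an illustration under stated assumptions: comments are not
handled, TOKEN and tokenize are hypothetical names, and the sketch relies on
the fact that Python's \s class already includes VT and FF.

```python
import re

TOKEN = re.compile(r"""
    '.*?'(?=\s|$)      # single-quoted string; VT/FF inside pass through
  | ".*?"(?=\s|$)      # double-quoted string
  | \S+                # bare word; VT and FF count as white space here
""", re.VERBOSE)

def tokenize(text):
    """Split text into tokens, treating unquoted VT and FF as white space."""
    return [m.group(0) for m in TOKEN.finditer(text)]

# VT (\x0b) and FF (\x0c) separate the first two tokens, but the VT inside
# the quoted string is kept as part of the value.
tokens = tokenize("_a\x0b1.0\x0c'x\x0by' z")
```
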
>
>
>27.
>
>How does the "Maximum line length" apply to <eol>;\ quoted strings
>as explained in the semantics document? For example, is the following
>legal?
>
>;\
>2000 characters ...\
>2000 characters ...
>;
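
The distinction between physical and logical lines behind this question can
be made concrete with a sketch of the unfolding step. It assumes my reading
of the folding convention in the semantics document (a text field whose
first line is a lone backslash, and a trailing backslash joining a line to
the next); unfold is a hypothetical name.

```python
def unfold(body):
    """Join folded lines of a text-field body (the text after the opening ';')."""
    lines = body.split("\n")
    # Folding is assumed to be active only if the first line is a lone backslash.
    if lines[0].rstrip() != "\\":
        return body
    out, joined = [], ""
    for line in lines[1:]:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            joined += stripped[:-1]   # drop the backslash and keep joining
        else:
            out.append(joined + line)
            joined = ""
    if joined:
        out.append(joined)
    return "\n".join(out)

body = "\\\n" + "a" * 2000 + "\\\n" + "b" * 2000
logical = unfold(body)
longest_physical = max(len(l) for l in body.split("\n"))
```

Here every physical line stays at about 2000 characters while the unfolded
value is a single logical line of 4000 characters, so the answer depends on
which of the two the limit is meant to constrain.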
>
>Finally, in the post-Fortran and post-C era line length restrictions
>seem very arbitrary and are ultimately a nuisance. I'd rather see this
>restriction removed from the specification. Programs written in
>languages without automatic dynamic memory management could simply
>allocate a large buffer (e.g. 128k is perfectly reasonable these days)
>and report a "Technical limitation" in the highly unlikely event
>that the buffer is insufficient.
>
>----------------------------------------------------------------------------
>
>Semantics:
>
>This sentence in the introduction leaves me puzzled:
>
>   As computer techniques evolve, it becomes more appropriate to discuss
>   the machine-accessible semantic content, or "meaning", of the data in
>   such a file.
>
>Again: Definition of terms: consolidate in one place (link).
>
>10. The character string [local] is reserved for local use.
>                          ^     ^
>Is this [notation] used somewhere else? Are there alternatives?
>
>Handling of long lines
>
>   - I am a bit surprised that this is presented in the semantic
>     features document rather than the syntax document.
>
>   - Why do we need this for # comments?
>
>Typographic style codes
>
>   I don't see how these comments could make a significant difference in
>   practice, but they significantly contribute to conveying the
>   impression that the semantics features are a bit of a hodgepodge.
>   I suggest deleting the entire "Typographic style codes" section.
>
>

John Faber, Ph.D.
Principal Scientist
International Centre for Diffraction Data
12 Campus Boulevard
Newtown Square, PA 19073-3273, USA
+1-610-325-9814 (phone)
+1-610-325-9823 (fax)
faber@icdd.com (e-mail)

