RE: Backus-Naur Form for CIF

  • Subject: RE: Backus-Naur Form for CIF
  • From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
  • Date: Wed, 4 Oct 2000 15:37:08 +0100 (BST)

Brian McMahon wrote:
>  On Wed, Oct 04, 2000 at 03:08:18AM +0100, Nick Spadaccini (>) wrote:
>  > On Tue, 3 Oct 2000, Herbert J. Bernstein (>>) wrote:
>  > 
>  >> 1.  In the paper it says under "The CIF restrictions to the STAR File
>  >> syntax are...":
>  >>
>  >>   " ... All data names and block codes are case insensitive, i.e. _ABS and
>  >> _abs are treated identically."
>  >>
>  >> The usual approach used in Fortran of redefining a-z with productions of
>  >> the form a ::= "A"|"a" won't work here, since we need to preserve case
>  >> sensitivity for text.  In practice this would be fudged in the lexical
>  >> scanner, but, for clarity, I would suggest adding an explicit comment
>  >> explaining the case-insensitivity of data names and some productions of
>  >> the form:
>  >>
>  >>   <DATA_>  ::=  {"D"|"d"} {"A"|"a"} {"T"|"t"} {"A"|"a"} "_"
>  >>   <LOOP_>  ::=  {"L"|"l"} {"O"|"o"} {"O"|"o"} {"P"|"p"} "_"
>  >>
>  >> to use in place of the "data_" and "loop_" strings
>  >
>  > Absolutely. This is very much what is done in the yacc implementation for
>  > starbase. Namely we redefine the characters a,b,d etc to be of either case
>  > and then define the tokens using these, as in ....

[...]
 
> This is a nice example of "literalism", and good reason to have these details
> thrashed out formally. I had always taken the view that the "reserved words"
> data_ and loop_ were case-SENSITIVE; so data_foo and data_FOO were
> identical, DATA_foo was invalid. I'm happy to accept the new convention,
> although it does mean some code rewriting.

I had taken the paper's omission of STAR's/CIF's reserved words from its
statement about case-insensitivity to be intentional -- i.e. the reserved
words were to be case-sensitive.  I don't have a big investment in that
view, although it does simplify things a bit.  I don't see any logical
inconsistency with that specification, either -- those parts which are
case-insensitive would be exactly the "user-specified" structural pieces.
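
To make the two readings concrete, here is a minimal Python sketch.  The
names RESERVED and classify, and the reserved_case_sensitive flag, are
hypothetical and not taken from any existing CIF software; the treatment of
a bare "data_" or of "loop_xyz" as invalid is likewise an assumption made
only for illustration.

```python
# Hypothetical sketch: classify one whitespace-delimited token.
RESERVED = ("data_", "loop_")

def classify(token, reserved_case_sensitive=True):
    """Return (kind, normalised_text) for a single token.

    Data names and block codes are always folded to lower case
    (case-insensitive); ordinary values keep their case.  The flag
    selects whether the reserved words themselves must be lower case.
    """
    head, rest = token[:5], token[5:]
    if head.lower() in RESERVED:
        if reserved_case_sensitive and head not in RESERVED:
            # e.g. DATA_foo under the case-sensitive reading
            return ("invalid", token)
        if head.lower() == "loop_":
            return ("loop", "") if rest == "" else ("invalid", token)
        # data_ must be followed by a block code (assumption)
        return ("data_heading", rest.lower()) if rest else ("invalid", token)
    if token.startswith("_") and len(token) > 1:
        return ("data_name", token.lower())
    return ("value", token)  # case is preserved for text
```

Under the case-sensitive reading, classify("data_FOO") and
classify("data_foo") give the same block code while classify("DATA_foo") is
rejected; passing reserved_case_sensitive=False accepts all three.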
 
>  >> 2.  The production for <data_block> does not require any leading or
>  >> trailing whitespace, so that a <CIF_file> could consist of a
>  >> <data_heading> and a <data> item immediately followed by another
>  >> <data_heading>, etc.  I cannot seem to find where the productions
>  >> explicitly require whitespace between the data item and the second
>  >> data heading.  A similar problem seems to exist in the production for
>  >> loop values.  This would certainly be solved by implicit precedence
>  >> among the productions or by operation of the lexical scanner, but it would
>  >> be best to have the BNF be unambiguous in the handling of whitespace.
[...]
>  > etc etc" without any explicit rules. I can see a fix, but it would need an
>  > exception. Namely change
>  >
>  >    <data_block>   ::= <wspace>* <data_heading> <data>+ <wspace>*
>  > to
>  >    <data_block>   ::= <wspace>+ <data_heading> <data>+ <wspace>*
>  >
>  > The exception being the leading <wspace> need not be there IF IT IS THE
>  > BEGINNING OF THE FILE. You could equally have
>  >
>  > <data_block>   ::= <wspace>* <data_heading> <data>+ <wspace>+
>  >
>  > with the exception about the end of the file.
>  >
>  > This exception would have to be "written as a comment" and not formally
>  > part of the BNF syntax (unless someone can see how to do it elegantly).
>  >
>  > What's the consensus?
>
> I prefer the exception at the end of the file (i.e. the second alternative).
> Could it be formalised by including an end-of-file token?
>    <data_block>   ::= <wspace>* <data_heading> <data>+ (<wspace>|<eof>)+
> Though I guess the problem is that you then need to insert <eof>'s
> everywhere that they might legitimately occur in a valid file description.

Which actually is not very many places.  In fact, in CIF an <eof> can only
appear either after a complete data block (as above) or before any data
block.
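
As a rough check on that claim, the sketch below encodes a heavily
simplified version of Brian's production as a Python regular expression,
with \Z standing in for <eof>.  Everything here is an assumption made for
illustration: the names WS, ITEM, BLOCK and count_blocks are hypothetical,
values are unquoted only, there are no comments, loops or text fields, each
<data> item is taken to carry its own leading whitespace, and the reserved
word is matched in lower case only.

```python
import re

# Simplified, hypothetical grammar; not the productions from the paper.
WS = r"[ \t\r\n\v\f]"
HEADING = r"data_\S+"
# One data item: whitespace, a data name, whitespace, an unquoted value
# that does not itself start with the reserved word data_.
ITEM = rf"{WS}+_\S+{WS}+(?!data_)\S+"
# A block must end in whitespace or at end of input (the <eof> alternative).
BLOCK = rf"{WS}*{HEADING}(?:{ITEM})+(?:{WS}+|\Z)"

_BLOCK_RE = re.compile(BLOCK)
_FILE_RE = re.compile(rf"(?:{BLOCK})*{WS}*")

def count_blocks(text):
    """Return the number of data blocks, or None if text does not parse."""
    if _FILE_RE.fullmatch(text) is None:
        return None
    return len(_BLOCK_RE.findall(text))
```

With this grammar, "data_a _x 1 data_b _y 2" parses as two blocks, whereas
"data_a _x 1data_b _y 2" parses as a single block whose first value is
"1data_b": without the separating whitespace there is no second heading.
The end-of-input anchor is needed in only two places, after a block and in
the file production itself (which also covers the empty file).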

I'll go on the record as agreeing with Nick about specifying whitespace in
BNF descriptions, but I also note that it is necessary for STAR and
its derivatives because STAR does not treat all whitespace equally.

>  >> 3.  The paper speaks of blanks, but not of tabs and vertical tabs and
>  >> formfeeds.  Most systems will handle tabs reasonably.  Not all
>  >> systems can handle vertical tab or form feed.  Are we requiring all
>  >> CIF parsers to be able to handle more than blank and tab?

The original papers on STAR and CIF may not discuss whitespace at this level
of detail, but later papers have done.  I don't think we can reasonably put
the cat back into the bag.  As for "handling" vertical tabs and form feeds,
are we talking about display and printing -- which I agree will not be
uniform -- or parsing?  The parsing is a manageable programming issue, at
least in C/C++ and Fortran 77/90/95.  I have C and F77 parsers which manage
VT, FF, etc. just fine.  Java will not be a problem either, although I don't
have a working example.
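
To show how little is involved on the parsing side, here is a minimal
Python sketch (not one of the C or F77 parsers mentioned above).  The names
CIF_WSPACE and split_tokens are hypothetical, and the whitespace set of
blank, HT, LF, CR, VT and FF is an assumption for illustration rather than
a quotation from any specification.  The sketch deliberately ignores quoted
strings, comments and the special role of line starts in STAR.

```python
import re

# Assumed whitespace set: blank, HT, LF, CR, VT, FF.
CIF_WSPACE = " \t\n\r\v\f"

# A token is any maximal run of characters outside that set.
_TOKEN = re.compile("[^" + re.escape(CIF_WSPACE) + "]+")

def split_tokens(text):
    """Split text on the assumed whitespace set only.

    Unlike str.split() with no argument, characters such as a
    non-breaking space are not treated as separators here.
    """
    return _TOKEN.findall(text)
```

For example, split_tokens("_a\v1\f_b\t2\n") yields the four tokens _a, 1,
_b and 2, so VT and FF separate tokens exactly as blank and tab do.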

As for what we are requiring of CIF parsers, in light of IUCr's new statement
of policy on STAR and CIF I would say that yes, we appear to be requiring CIF
parsers to be able to handle more than blank and tab.
 
[...]

I will agree with Nick and Brian with regard to [not] putting the semantic
details of CIF's data-typing rules, special data values, and backslash
escape sequences into the BNF.  The only one that in my opinion might
reasonably belong there would be a production for numeric-type data values,
but I am entirely satisfied to leave that out.  I want the BNF to tell me
how a CIF is built, but I rely on other sources to tell me what it means.

Cheers,

John

--

John C. Bollinger, PhD
Indiana University
Molecular Structure Center

jobollin@indiana.edu
