Re: Problems with CIF BNF

I realize that the BNF contains a few hacks to deal with
context dependence, such as productions defined on multiple symbols, which
make it impossible to use as a working BNF. But there are other
problems with the grammar. In the end-of-line case, the lexer can do
something 'sensible', but it is still important to define precisely
whether a missing terminal <eol> makes the CIF invalid.
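
To make the two possible readings concrete, here is a minimal Python
sketch of the line-splitting step of a lexer. It is only an illustration:
the function name split_lines and the flag require_final_eol are
hypothetical, and neither behaviour is taken from CBFlib or from the CIF
specification. The lenient mode treats EOF as an implicit <eol>; the
strict mode rejects a final line that has no <eol>.

```python
def split_lines(text, require_final_eol=False):
    """Split CIF text into lines, handling a final line that ends at EOF.

    LF, CRLF, and CR are all accepted as <eol>. If require_final_eol is
    True, a final line without a terminating <eol> raises ValueError (the
    strict reading); otherwise EOF is treated as an implicit <eol> (the
    lenient reading). Both readings are assumptions for illustration.
    """
    # Normalize the three common line terminators to LF.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    if not normalized:
        return []  # an empty file has no lines under either reading
    if not normalized.endswith("\n"):
        if require_final_eol:
            raise ValueError("final line is not terminated by <eol>")
        normalized += "\n"  # lenient reading: EOF acts as <eol>
    # Every line now ends in LF; drop the empty piece after the last one.
    return normalized.split("\n")[:-1]
```

With the lenient setting, "data_x\n_a 1" and "data_x\n_a 1\n" produce the
same two lines, so the rest of the grammar never sees the difference.
Which setting is correct is exactly what the specification should state.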

I can look at CBFlib to see an interpretation of the CIF grammar, but
someone else's parser may have a different interpretation. In fact, it
would be good to have a collection of unusual CIF files for parser
testing, with a consensus as to which ones are valid and which are invalid.
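
Such a collection could be as simple as a list of labelled inputs with an
agreed verdict attached to each. The Python sketch below shows one
possible shape. The case list, the names CASES and compare, and the
parser callable are all hypothetical, and every verdict is left as None
because no consensus has been recorded for these inputs.

```python
# Hypothetical corpus: (label, CIF text, agreed verdict).
# A verdict of None means no consensus has been recorded yet.
CASES = [
    ("final line ends at EOF, no <eol>", "data_test\n_item 1", None),
    ("final line ends with <eol>", "data_test\n_item 1\n", None),
    ("empty file", "", None),
]


def compare(parser_accepts, cases):
    """Run a parser over the corpus and sort the results into two lists.

    parser_accepts is any callable that takes CIF text and returns a
    truthy value if the parser accepts it. Returns (disagreements,
    undecided): disagreements holds (label, agreed, actual) for cases
    where the parser contradicts the agreed verdict, and undecided holds
    (label, actual) for cases that have no agreed verdict yet.
    """
    disagreements, undecided = [], []
    for label, text, agreed in cases:
        actual = bool(parser_accepts(text))
        if agreed is None:
            undecided.append((label, actual))
        elif actual != agreed:
            disagreements.append((label, agreed, actual))
    return disagreements, undecided
```

Running two parsers through compare and looking at where their undecided
results differ would give a short list of inputs that need a ruling.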

Joe

Herbert J. Bernstein wrote:
> Without a defined lexer, you cannot do CIF as a BNF; it is context
> sensitive in its use of whitespace.  The question you are raising
> about EOF should be handled by the lexer, which should deal sensibly
> with the usual unix problem of disambiguating the case of a final
> line that ends with eof rather than eol-eof.  There is a rather
> complete bison grammar in CBFlib working on the level of tokens
> after lexing the input.  -- HJB
> 
> 
> At 1:44 PM -0400 3/12/07, Joe Krahn wrote:
>> Some parts of CIF are vague. I hoped that the BNF syntax would be a
>> precise syntax specification, but it has problems. It is central to
>> properly defining the CIF format, and should therefore be very accurate.
>>
>> First, there are some plain syntax errors, like unbalanced braces in the
>> production of <Float>, and an empty token in the TokenizedComments
>> production.
>>
>> There are also a few hacks like <noteol>, and the lack of rules for the
>> content of quoted strings. I think it is also a hack for a production
>> unit to be defined for two elements, like "<eol><UnquotedString>".
>>
>> Does EOF count as whitespace? Normally, a text file ends with an <eol>
>> on the last line, so it is not a problem. With Fortran, you may not be
>> able to distinguish between them, so it seems that EOF probably should
>> count as a whitespace token.
>>
>> There are also places where the grammar could be simplified, such as:
>>
>>   { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>
>>
>> written as:
>>   {'e' | 'E' } { '+' | '-' }?  <UnsignedInteger>
>>
>> Also note the error in the first form copied from the web page: the
>> minus sign has a space included.
>>
>> Should the logical-OR symbol always be contained within braces? This
>> appears to be inconsistent, but maybe the rule is to require braces when
>> the members include a quoted character element.
>>
>> I will try to edit my own version of the BNF to produce what I think it
>> is supposed to mean. Answers to some of the above questions will be
>> helpful in getting it right.
>>
>> Thanks,
>> Joe Krahn
>> _______________________________________________
>> comcifs mailing list
>> comcifs@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/comcifs