Re: Problems with CIF BNF
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
- Subject: Re: Problems with CIF BNF
- From: Joe Krahn <[email protected]>
- Date: Mon, 12 Mar 2007 16:30:37 -0400
- In-Reply-To: <a06230902c21b5216f07d@[192.168.10.211]>
- References: <[email protected]><a06230902c21b5216f07d@[192.168.10.211]>
I realize that there are a few hacks in the BNF to deal with
context-dependence, like productions defined as multiple symbols, which
make it impossible to use as a working BNF. But there are other
problems with the grammar. With the end-of-line example, the lexer can do
something 'sensible', but it is still important to have a specific
definition of whether a missing final <eol> makes the CIF invalid.
I can look at CBFlib to see an interpretation of the CIF grammar, but
someone else's parser may have a different interpretation. In fact, it
would be good to have a collection of unusual CIF files for parser
testing, with a consensus as to which ones are valid and which are invalid.
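
To make the two possible readings concrete, here is a minimal sketch of a
line splitter that can either treat end-of-input as an implicit <eol> or
reject a file whose last line lacks one. The names split_lines and
require_final_eol are hypothetical and are not taken from CBFlib or from the
CIF specification; which policy is correct is exactly the open question.

```python
def split_lines(text, require_final_eol=False):
    """Split text into lines and report whether the final <eol> is missing.

    Hypothetical helper: with require_final_eol=False, end-of-input is
    treated as an implicit line terminator (the lenient reading); with
    require_final_eol=True, a missing final <eol> is an error (the strict
    reading).
    """
    # Normalise the three common line-ending conventions to "\n".
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")

    # Non-empty input that does not end in "\n" has an unterminated last line.
    missing_final_eol = normalised != "" and not normalised.endswith("\n")
    if missing_final_eol and require_final_eol:
        raise ValueError("input ends without a final <eol>")

    lines = normalised.split("\n")
    if not missing_final_eol:
        # split() leaves an empty string after the last "\n"; drop it.
        lines.pop()
    return lines, missing_final_eol
```

A shared collection of test files could then record, for each file, which of
the two behaviours a conforming parser is expected to show.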
Joe
Herbert J. Bernstein wrote:
> Without a defined lexer, you cannot do CIF as a BNF; it is context
> sensitive in its use of whitespace. The question you are raising
> about EOF should be handled by the lexer, which should deal sensibly
> with the usual unix problem of disambiguating the case of a final
> line that ends with eof rather than eol-eof. There is a rather
> complete bison grammar in CBFlib working on the level of tokens
> after lexing the input. -- HJB
>
>
> At 1:44 PM -0400 3/12/07, Joe Krahn wrote:
>> Some parts of CIF are vague. I hoped that the BNF syntax would be a
>> precise syntax specification, but it has problems. It is central to
>> properly defining the CIF format, and should therefore be very accurate.
>>
>> First, there are some plain syntax errors, like unbalanced braces in the
>> production of <Float>, and an empty token in the TokenizedComments
>> production.
>>
>> There are also a few hacks like <noteol>, and the lack of rules for the
>> content of quoted strings. I think it is also a hack for a production
>> unit to be defined for two elements, like "<eol><UnquotedString>".
>>
>> Does EOF count as whitespace? Normally, a text file ends with an <eol>
>> on the last line, so it is not a problem. With Fortran, you may not be
>> able to distinguish between them, so it seems that EOF probably should
>> count as a whitespace token.
>>
>> There are also places where the grammar could be simplified, such as:
>>
>> { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>
>>
>> written as:
>> {'e' | 'E' } { '+' | '-' }? <UnsignedInteger>
>>
>> Also note the error in the first form copied from the web page: the
>> minus sign has a space included.
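
The equivalence of the two exponent forms quoted above can be checked
mechanically. The sketch below transcribes each form into a regular
expression and compares them over all short strings from a small alphabet.
It assumes that <UnsignedInteger> means one or more decimal digits; the
pattern names and the helper same_language are hypothetical.

```python
import re
from itertools import product

# Assumption: <UnsignedInteger> is one or more decimal digits.
# First form, with the stray space in '- ' corrected to '-'.
ORIGINAL = re.compile(r"(?:[eE]|[eE][+-])[0-9]+")
# First form exactly as copied from the web page, space included.
AS_PUBLISHED = re.compile(r"(?:[eE]|[eE](?:\+|- ))[0-9]+")
# Proposed simplified form with an optional sign.
SIMPLIFIED = re.compile(r"[eE][+-]?[0-9]+")


def same_language(a, b, alphabet="eE+- 07", max_len=4):
    """Return True if a and b accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(a.fullmatch(s)) != bool(b.fullmatch(s)):
                return False
    return True


if __name__ == "__main__":
    print(same_language(ORIGINAL, SIMPLIFIED))
    print(same_language(AS_PUBLISHED, SIMPLIFIED))
```

This is a bounded check over short strings, not a proof, but it is enough to
show that the form with the embedded space accepts "e- 5" and rejects "e-5".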
>>
>> Should the logical-OR symbol always be contained within braces? This
>> appears to be inconsistent, but maybe the rule is to require braces when
>> the members include a quoted character element.
>>
>> I will try to edit my own version of the BNF to produce what I think it
>> is supposed to mean. Answers to some of the above questions will be
>> helpful in getting it right.
>>
>> Thanks,
>> Joe Krahn
>> _______________________________________________
>> comcifs mailing list
>> [email protected]
>> http://scripts.iucr.org/mailman/listinfo/comcifs

