Discussion List Archives

Re: [ddlm-group] Result of concatenation operator vote

On Thursday, October 28, 2010 5:09 AM, Herbert J. Bernstein wrote:
>   Nothing is impossible, but the fact remains that for CIF2, all the
>uses of the backslash that had been agreed to for CIF1 were explicitly
>rejected, largely at Nick's insistence that somehow we were diverging
>from STAR.

I think there is an essential disconnect here.  Incorporating the line-folding protocol into the language *syntax* would diverge from STAR (and from CIF 1.1).  STAR says nothing about the interpretation of data values, however, so as long as line-folding does not require a syntactic deviation (which the version documented among the "Common Semantic Features" does not) nothing prevents its continued use with CIF2 as a semantic convention, exactly as it is used in CIF 1.1.

>The oddity you have pointed out in the "official" CIF1 syntax document
>is another case of Nick's insistence which in fact diverges from CIF1
>practice, existing code and existing round-trip cases.  We have somehow
>over the years entered an absurd never-never land in which the official
>CIF documents say one thing, and the established practice on which people
>do real work is something very different.
>   If you don't believe me, see the trip test at
>which explicitly tests that the ;\ construct works for line folding.

That is most unfortunate, because all semantic considerations aside, that construct violates CIF 1.1 syntax.  The formal grammar clearly specifies that the body of a text field cannot contain a semicolon at the beginning of a line, regardless of what follows.  Such a sequence can only be the field terminator or an error (see the syntax specification, paragraphs 49 and 56 and grammar production <eol><SemiColonTextField>).  No semantic convention is empowered to override that, ciftest2 notwithstanding.
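As a rough illustration of the rule described above, the following Python sketch splits a text field at the first line that begins with a semicolon. It is not taken from any existing CIF parser: the function name `split_text_field` and the simplified line-list input are assumptions made for the example, and the line that opens the field is taken to have been consumed already.

```python
def split_text_field(lines):
    """Split the lines after a text field's opening line into (body, rest).

    Hypothetical helper for illustration only. The first line that begins
    with ';' ends the field, regardless of what follows the semicolon.
    """
    for i, line in enumerate(lines):
        if line.startswith(";"):
            # The ';' itself is the terminator; whatever follows it on the
            # same line lies outside the field.
            return lines[:i], [line[1:]] + lines[i + 1:]
    raise ValueError("unterminated text field")


# An attempt to embed "<eol>;" in the field by writing "<eol>;\".
body, rest = split_text_field(["data_inner", ";\\", "embedded text", ";"])
print(body)  # ['data_inner']
print(rest)  # ['\\', 'embedded text', ';']
```

Under this reading the field ends at the `;\` line, so the backslash and everything after it are left for the parser to handle as ordinary (and here invalid) input rather than as field content.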

To the extent people actually use that supposed feature of the protocol in the real world, it is indeed an issue, but there is no getting around the fact that such files are not well-formed CIF 1.1.  That's a problem for those files in principle, because they cannot be processed as intended by semantically obtuse software, nor even by semantically aware software that implements the convention in a manner consistent with CIF 1.1 syntax.  Nevertheless, as long as people confine such files to workflows that understand them, the problem does not arise in practice.

I have trouble believing that there are many such files loose in the wild, however.  For one thing, they do violate the syntax spec, thus some parsers will justifiably reject them.  More importantly, though, about the only likely real-world use I have imagined for them is embedding CIF in CIF, which hardly seems like it would be a widespread need.

Either way, people who implement line-folding to produce and accept "<eol>;\" as an embedded "<eol>;" will be in exactly the same non-conforming position under CIF 2.0 (as it now stands) as they already are under CIF 1.1.  If that does not currently cause them difficulty, then there is no reason to expect that it will cause them difficulty under CIF 2.0.

>   The line folding protocol is an essential reality, especially to
>allow CIF to be used with Fortran.

As far as I know, no one has objected to continued use of the line-folding protocol as a *semantic* convention.  I construe Nick's objections as applying to incorporating the protocol into the language syntax, or altering the syntax to accommodate it, and I share that objection.  However, I am unaware of anything about the protocol (as documented) that would require syntax-level support.
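To make the syntax/semantics distinction concrete, here is a minimal Python sketch of unfolding applied to a value that an ordinary parser has already extracted. It is only an approximation of the documented protocol: the function name `unfold`, the tolerance of trailing spaces and tabs after a backslash, and the treatment of the first line are assumptions for the example, not a statement of what the specification requires.

```python
import re

# A backslash followed only by optional spaces/tabs at the end of a line
# (assumption: trailing whitespace after the backslash is tolerated).
_FOLD = re.compile(r"\\[ \t]*$")


def unfold(value):
    """Undo line folding on an already-parsed text field value."""
    lines = value.split("\n")
    # Folding is taken to be signalled by a first line holding only a
    # backslash; any other value is returned untouched.
    if not _FOLD.fullmatch(lines[0]):
        return value
    out = []
    joining = False  # True when the previous line ended with a fold mark
    for line in lines[1:]:
        folded = _FOLD.search(line) is not None
        text = _FOLD.sub("", line) if folded else line
        if joining:
            out[-1] += text
        else:
            out.append(text)
        joining = folded
    return "\n".join(out)


print(unfold("\\\nlong \\\nline\nnext"))  # prints "long line" then "next"
```

The function never sees delimiters or tokens, only the string value, which is the sense in which the convention can sit entirely above the syntax.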

I don't see anyone trying to take away line-folding.

>   The use of the required whitespace after everything except the last
>token in a CIF document is an essential reality in lexical scans of existing CIF documents.

CIF lexical scans must indeed be sensitive to whitespace to achieve correct tokenization.  Trailing whitespace is not necessary for a lexical scanner to recognize the closing delimiter of a text field, however, nor to diagnose a closing delimiter not followed immediately by whitespace (or EOF) as the syntax error that the spec explicitly defines it to be.
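The check described above can be sketched in a few lines of Python. This is a hypothetical helper, not code from CIFtbx, CBFlib, or any other library: the name `check_text_field_close`, the whitespace set, and the returned message strings are all assumptions for illustration.

```python
WHITESPACE = " \t\r\n"  # assumed whitespace set for this sketch


def check_text_field_close(source, pos):
    """Classify what follows a line-initial ';' found at index `pos`."""
    if source[pos] != ";" or (pos > 0 and source[pos - 1] != "\n"):
        raise ValueError("pos is not a line-initial semicolon")
    nxt = pos + 1
    if nxt == len(source):
        return "ok: closed at EOF"
    if source[nxt] in WHITESPACE:
        return "ok: closed, whitespace follows"
    # The delimiter is still recognized; only the following character is
    # reported as an error.
    return "syntax error: %r follows the closing delimiter" % source[nxt]


print(check_text_field_close("text\n;", 5))    # ok: closed at EOF
print(check_text_field_close("text\n;\n", 5))  # ok: closed, whitespace follows
```

The scanner recognizes the delimiter from the semicolon's position alone; the character after it only decides whether a diagnostic is issued.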

>   In the name of what is to me is an incomprehensible adherence to a
>constantly changing and undocumented STAR standard has resulted in loss
>of functionality that is needed to keep current applications and current
>CIF datasets in use.

I see no loss of functionality here.  Declining to incorporate the line-folding protocol (or any other semantic convention) into the language syntax does not render it obsolete.  I support continued use of the documented line-folding protocol, and I would encourage COMCIFS to consider further solidifying that by defining and using one or more data types in the DDLm dictionaries that explicitly recognize it.  Conceivably, _all_ the character types could be defined that way.

>   Of course these issues can be resolved.  I keep accumulating fudges
>for CIFtbx and CBFlib to deal with them.  The problem is that, without
>any COMCIFS level agreement on what the preferred fudges are, there is
>no reason to expect that the files my code reads and writes will be
>compatible with the files that, say, your code reads and writes, or
>compatible with the files that, say, John Westbrook's code reads and
>writes, almost guaranteeing that CIF is going to degenerate even more than
>it has into multiple idiosyncratic dialects.  To me this seems to be the
>antithesis of the goal of the creation of COMCIFS -- which was, as its
>name says, to maintain the CIF _standard_.

I quite agree.

I'm curious, though, about these fudges.  I understand being tolerant of common errors and nonstandard practices, but that's not the same as accepting them as standard.  Is CIF 1.1 or the CIF 2.0 draft inherently inconsistent or under-specified?  Otherwise, why might someone suppose that COMCIFS's hypothetical list of preferred fudges would be non-empty?

>   I apologize for sounding so preachy and stuffy, but I really think it
>would be a good idea to resolve these issues in some commonly agreed
>manner and try to keep CIF as a common language, rather than heading
>further into multiple dialects.

Would you be interested in explaining the multiple dialect problem a little more fully?  I'm not certain whether you're talking about the syntactic level or about one or more of the several semantic levels associated with CIF.  Is imgCIF part of this, or a separate question?  I guess what I want to understand is which dimensions of the CIF world you would like to standardize.


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
