Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

Dear Herbert,

On Tuesday, August 02, 2011 8:52 PM, you wrote:

[I wrote:]
>> I take it, then, that you do not find the Sugar\nFlour\nButter example
>> in the current draft to be sufficient for this purpose.  Fair enough,
>> but that leaves me uncertain of what kind of clarification you are
>> looking for.  Perhaps you would be willing to suggest something
>> specific?
>>
>
>I would suggest a clear statement of the intended meaning for
>
>;abcd
>;
>
>;\
>ab\
>cd
>;
>
>;\
>ab\
>cd\
>;
>
>;CIF>\\
>CIF>ab\
>CIF>cd
>;


That would be entirely reasonable.  The document already contains an example for the case with neither line-folding nor text-prefixing.  Change 12 contains examples for text prefixing, but does not state what their meaning is; these examples can easily be updated to clarify.  Yesterday I wrote an example for combining the two protocols, and I can easily write one or two for the line-folding protocol on its own.


>The combination of the current draft and your email leads me to
>suspect that all of these may be intended to be equivalent
>to "abcd" and to 'abcd' and to a blank delimited abcd
>
>Is that correct?  If not, please disambiguate as appropriate.


That is correct.
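
To make the intended reading concrete, here is a minimal Python sketch that decodes the four examples above to the same value. It reflects one reading of the draft's line-folding and text-prefix protocols, not normative text. The function names (`unprefix`, `unfold`, `decode_text_field`) are hypothetical, the input is assumed to be the raw content between the opening semicolon and the closing newline/semicolon, and only "\n" line terminators are handled. Treating a backslash at the very end of the content as folded away is an assumption drawn from the third example.

```python
import re

# Backslash, optional inline whitespace, then a line end or the end of the content.
_FOLD = re.compile(r"\\[ \t]*(?:\n|\Z)")


def unprefix(content):
    """Strip a text prefix if the first line is <prefix>\\ or <prefix>\\\\ (assumed reading)."""
    lines = content.split("\n")
    m = re.fullmatch(r"([^\\]+)(\\{1,2})[ \t]*", lines[0])
    if m is None:
        return content  # first line does not signal the prefix protocol
    prefix = m.group(1)
    if not all(line.startswith(prefix) for line in lines):
        return content  # not every line carries the prefix, so leave the content as-is
    stripped = [line[len(prefix):] for line in lines]
    if m.group(2) == "\\\\":
        # Two backslashes: one is consumed, the other remains to signal line folding.
        stripped[0] = "\\"
        return "\n".join(stripped)
    # One backslash: the whole first line is dropped.
    return "\n".join(stripped[1:])


def unfold(content):
    """Join folded lines if the first line is a lone backslash (assumed reading)."""
    m = _FOLD.match(content)
    if m is None:
        return content  # first line does not signal the line-folding protocol
    return _FOLD.sub("", content[m.end():])


def decode_text_field(content):
    """Apply the prefix protocol first, then line folding."""
    return unfold(unprefix(content))


examples = [
    "abcd",                          # plain text field
    "\\\nab\\\ncd",                  # line-folded
    "\\\nab\\\ncd\\",                # line-folded, trailing backslash on the last line
    "CIF>\\\\\nCIF>ab\\\nCIF>cd",    # prefixed and line-folded
]
for raw in examples:
    print(repr(decode_text_field(raw)))  # each prints 'abcd'
```

Applying the prefix protocol before line folding matters here: the doubled backslash on the first prefixed line leaves a single backslash behind, which is what then triggers folding.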


>> If it would help, I would be happy to add a brief clarifying remark to
>> Change 11 that summarizes the status of comment line folding in CIF2.
>> For example: "Although CIF 1.1's common semantic features include an
>> analogous line-folding protocol for comments, that protocol is not
>> incorporated into CIF 2.0 _syntax_.  Although it remains outside the
>> scope of CIF syntax, it is anticipated that some CIF 2.0 processors will
>> continue to recognize that protocol."
>
>I understand neither your analysis nor your suggested wording.
>You seem to be arguing issues not in dispute.
>How about the following?
>
>"The analogous line-folding protocol for comments specified in
>paragraph 26 of the common semantic features of CIF 1.1
>remains a common semantic feature of CIF 2.  There is no
>change in comment syntax between CIF 1.1 and CIF2."


The problem with that text is that the common semantic features are outside the scope of the document we are discussing ("This document specifies changes to the *syntax and binary form* of CIF" (emphasis added)).  My text constitutes an attempt to address your concern within the document's scope, and it goes as far as I could see a way for it to go within its constraints.  If it is convoluted or confusing then that reflects the fact that it can't say in the present document what I think you actually want to hear.

Moreover, it is a separate question altogether whether comment line-folding indeed will formally be a Common Semantic Feature of CIF2.  I anticipate that it will, but even if not, I think it entirely safe to predict that some CIF2 applications will still honor it, exactly as my proposed text describes.

My preference would be to not address this question at all in the syntax specification.  I do not think that would have any implications for comment line folding, and it was intentional that my draft is written that way.  I am nevertheless open to including remarks that address the question to the extent the document's scope permits.  Perhaps you would prefer something like the following, which I previously rejected because I did not think you would find it strong enough: "The CIF syntax specifications do not address use in CIF 2.0 documents of CIF 1.1's analogous line-folding semantics for comments."


>> That is a question of application design, not CIF syntax.  It is
>> perilous to write files using that formalism, as some CIF processors
>> would certainly reject them, but that's outside the scope of the spec.
>> The spec merely defines that such files are not well-formed CIFs.  As
>> for reading files that use it, I adapt an old saw from the Fortran
>> community: if the file does not comply with the CIF specifications then
>> a processor may do anything it wants with it, including starting World
>> War III.  I do trust that most CIF readers will exercise greater
>> restraint, however.
>
>This is a technically defensible but impractical position.
>Some syntax errors make it impossible to guess the intent
>of the text.  Some syntax errors have a clear intent.
>Most syntax errors are in some fuzzy middle ground.
>The normal practice in extending languages is to try
>to add new constructs somewhere in the middle ground
>of syntax errors with clear intent or at the boundary.


I do not contest those assertions, but they are not relevant here because this narrow issue (a closing text field delimiter with an appended backslash) does not involve any change in allowed syntax.  In CIF 1.1, just as in CIF 2.0, the closing delimiter of a text field is always the first newline/semicolon sequence following the opening delimiter, line-folding protocol and trailing backslashes notwithstanding.  In CIF 1.1, just as in CIF 2.0, CIF syntax requires the closing delimiter of a text field to be separated by whitespace from any following keyword, data name, or data value.  Therefore in CIF 1.1, just as in CIF 2.0, it is a syntax error if a backslash is appended to the closing delimiter of a text field.
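
The delimiter rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `scan_text_field` is hypothetical, only "\n" line terminators are considered, and raising `ValueError` is just one way a processor might report the problem. The specification itself would say only that such input is not well-formed.

```python
def scan_text_field(source, start):
    """Return (content, end) for the text field whose opening ';' is at source[start].

    The closing delimiter is the first newline/semicolon pair after the opening
    delimiter; backslashes in the content play no part in finding it.
    """
    if source[start] != ";" or (start > 0 and source[start - 1] != "\n"):
        raise ValueError("not positioned at an opening text field delimiter")
    close = source.find("\n;", start + 1)
    if close == -1:
        raise ValueError("unterminated text field")
    end = close + 2  # index just past the closing ';'
    # The closing delimiter must be followed by whitespace (or the end of input).
    if end < len(source) and source[end] not in " \t\r\n":
        raise ValueError("closing delimiter is not followed by whitespace")
    return source[start + 1:close], end


# A trailing backslash on the last content line is part of the content:
content, _ = scan_text_field(";\\\nab\\\ncd\\\n;\n", 0)
print(repr(content))

# A backslash appended to the closing delimiter does not elide it; it is an error:
try:
    scan_text_field(";\\\nab\\\n;\\\ncd\n;\n", 0)
except ValueError as exc:
    print("syntax error:", exc)
```

Note that the scan never inspects the content for a line-folding signal, which is the point: delimiters are recognized before the protocol that might apply to the content is known.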

Furthermore, there is not even a consensus for how to handle this situation in CIF 1.1, as this group established during some of its discussions last year.  Some popular software attempts to match the behavior of a bugged program in the CIF validation suite, at least sometimes, whereas other popular software never does so.

In any event, it is entirely practical -- in fact it is *driven* by practicality -- to restrict the CIF syntax specifications to questions of syntax.  If we were to expand the scope of the specification to cover processor error-handling behavior then there are many other cases that we would be obliged to address, involving much more work than I think any member of this group will want to undertake at this point.  Moreover, all of these error-handling specifications would be new to CIF2, and they would necessarily conflict with the behaviors of some existing CIF 1.1 software, giving rise to exactly the kind of problems that you describe (below) for the Fortran standardization process.

Additionally, it is altogether *im*practical to attempt to specify particular error recovery behavior here.  We are not in a position to anticipate all CIFs that an application may encounter, nor all requirements that particular applications must satisfy.  Any intent the specification might express for how processors should handle this error will certainly be wrong for some situations.

The most I would agree to in the syntax specification is commentary describing the nature of this problem without directing any particular handling.  For example, something like the following could be appended to the last paragraph of Change 11:

"Furthermore, it follows from the fact that the line-folding protocol applies to the *contents* of a text field that it does not affect the recognition of text field delimiters, which must in principle be performed before the text to which the protocol may apply is known.  In particular, the practice of appending a backslash to a literal newline/semicolon pair as a means of attempting to elide it into the body of a line-folded text field, recognized by some CIF 1.1 software, in fact neither elides the delimiter nor even yields syntactically correct CIF (2.0 or 1.1)."

That documents one reasonable guess about what the CIF author's intent might have been, while leaving the decision about how to handle it up to the CIF processor.


>It is in large part because of the espousal of a similar
>position to yours by X3J3 that I have a supply of
>bumper stickers that say "Save Fortran -- Ban X3J3"
>The community voted with its feet (and programs) and
>the current Fortran practice is for compilers to
>compile almost everything that looks like a reasonable
>variant of Fortran-77, Fortran-8x, Fortran-9x and Fortran-2003
>with a minimum of fuss.  I was just compiling a Fortran-77
>program with the latest gfortran and it accepted the program
>happily and without a single warning (even though I
>used -Wall).  I use the same compiler to handle rather
>recent Fortran-2003 code including code with the new ISO
>C binding to allow mixed C and Fortran.  I think the
>current approach to be a much better way to design
>processing software than the old X3J3 approach of the
>1980s and early 1990s that kept breaking old programs.


Your comparison with Fortran is inapposite for at least three reasons:

1) The Fortran specification's scope extends far beyond language syntax to cover the behavior of processors (compilers) and programs, whereas the CIF syntax specification's scope is limited to CIF syntax.

2) The specific error we are focusing on has always been a syntax error in CIF, buggy validation suites notwithstanding.  Nothing proposed for CIF2 affects its status, so there is no special need to add new specifications for how to handle it, and there is no backwards compatibility impact of avoiding such additional specifications.

3) There is not even a consensus practice in this area among existing CIF 1.1 (or earlier) programs.  Specifying intended behavior here would therefore not be comparable to directing Fortran compilers to continue to support features that are deprecated in or removed from recent standards.  Instead, it would be more like Fortran-2020 changing a Fortran-77 feature such that its implementation by half the then-existing compilers becomes forbidden.  If I understand you correctly, that's exactly the sort of behavior for which you deride X3J3.


Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

