Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

Dear Herbert,

Please find my comments in-line below.

On Monday, August 01, 2011 9:05 AM, Herbert J. Bernstein wrote:
>I do not completely understand change 11.  I do not understand
>the precise interactions of changes 11 and 12 either, but once
>change 11 is clarified, some examples should take care of that.
>
>
>CHANGE 11 REFINEMENT to CIF1 LineFolding Protocol.
>
>The CIF line-folding protocol is a mechanism for splitting logical
>lines of text across two or more physical lines of a CIF
>semicolon-delimited data value ("text field"). A version of this
>protocol appears among the "Common Semantic Features" of CIF 1.1 and
>is in wide use in that context; in CIF 2.0, line-folding is part of
>the CIF syntax, as described below.
>
>The protocol applies to text fields whose contents (after
>interpretation of the text prefix protocol, if applicable) begin with
>a backslash, followed by any number of whitespace characters other
>than newline, followed by newline or the end of the text field; that
>sequence is designated \<ws>*\n below. There must not be any
>whitespace preceding the initial backslash.
>
>Given un-prefixed (Change 12) text field contents to which the
>line-folding protocol applies, the logical text it represents is derived
>from it by removing each occurrence of \<ws>*\n, including the
>initial one. Different lines may have different amounts of whitespace
>between the trailing backslash and newline.
>
>Note that the line-folding protocol cannot elide the terminating \n;
>of a text field because the \n of that delimiter is not accounted part
>of the field contents. It follows from the definition of \<ws>*\n,
>however, that if the last line ends with \<ws>* then that will not
>appear in the unfolded value.
>
>=========================================================================
>
>This appears to differ significantly from the CIF1 line folding
>protocol in section 26 of the common semantic specification, which
>did line folding for both comments and for text fields, and which
>explicitly removed the terminal \n for a last line that ended with
>\<ws>*\n:
>
>"The final line-termination-semicolon sequence of a text field takes
>priority over the reassembly process and ends it, but a trailing
>backslash on the last line of a text field very nicely conveys the
>information that no trailing line termination is intended to be
>included within the character string."


Taking the second criticism first, I think you are talking about a line-folded text field in which a literal backslash character and possibly some whitespace immediately precede the closing delimiter.  It is my understanding that under the CIF 1.1 line-folding protocol, the backslash and any whitespace between it and the newline are removed during the unfolding process.  Although it is written in different terms, the same is true of the protocol described in change 11.  Indeed, that particular point is explicitly addressed in the last paragraph of that section.

As for the final newline itself, it is not part of the text field contents anyway, just as the present Change 11 text says.  You yourself argued strongly for that interpretation of CIF 1.1 -- and prevailed -- and CIF2's specifications for text fields are clear in that regard.  The line-folding protocol does not need to, and indeed cannot, remove something that isn't there.  If you think it would be better, however, then we could insert a clarifying remark that the "contents" with which Change 11 is concerned exclude the field delimiters.
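
A minimal Python sketch of the unfolding described in Change 11 may make the two points above concrete. It is only an illustration under stated assumptions: the function name `unfold` is hypothetical, "whitespace other than newline" is taken to mean space and tab, and the input is the field *contents* only, with the delimiters (and hence the final newline) already excluded and any text prefix already removed.

```python
import re

# A backslash, then optional spaces/tabs, then a newline or the end of the
# contents. This is the sequence the proposal writes as \<ws>*\n.
# Assumption: <ws> means space or tab only.
_FOLD = re.compile(r"\\[ \t]*(?:\n|\Z)")


def unfold(contents: str) -> str:
    """Return the logical text for un-prefixed text field contents.

    The protocol applies only when the contents begin with the fold
    sequence; otherwise the contents are returned unchanged.
    """
    if _FOLD.match(contents) is None:
        return contents
    # Remove every occurrence of the fold sequence, including the initial one.
    return _FOLD.sub("", contents)


# The last line ends with a backslash and two spaces; the closing
# newline-semicolon delimiter is not part of the contents.
folded = "\\\nfirst part \\\nsecond part\\  "
print(unfold(folded))
```

Because the closing delimiter's newline is never in `contents`, the function has nothing to remove there; a trailing backslash on the last line simply disappears along with any whitespace after it.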

As for folding comment lines, that is a knowing omission on my part.  Nothing about comments is germane to the technical issue driving possible adoption of text-field line folding into CIF2 syntax, so I left comment-folding out in the interests of clarity, brevity, and simplicity.  That in no way prevents CIF2 processors from folding and unfolding comments according to the line-folding protocol for comments, whether Change 11 is adopted or not.


>If the new change 11 were aligned with section 26, I would vote for it.
>Once that is done, given some clarifying examples of the interaction
>of changes 11 and 12, I would probably vote for change 12.


I plan to work up the examples.


>I think we also need a clarification of the interaction with Change 10.
>We seem to be saying that whitespace between data values is required,
>but what are we supposed to do if it is not present?
>
>For example, what is the meaning if we are already in a text field
>and encounter "\n;\"?  Is this a termination for the text field, is
>it an error, or are we free to handle it as something outside
>of the CIF specification?

Change 6, part (3) starts, "The string is initiated by an ASCII newline semi-colon sequence, consists of any of the allowed characters, and is terminated by the first subsequent ASCII newline semi-colon sequence. Clearly, the strings within cannot contain an ASCII newline semi-colon sequence."  For its part, Change 11 says nothing to qualify or modify the definition of text field delimiters.  On the contrary, it defines the line-folding protocol in terms of its effect on the *contents* of a text field, leaving it to other parts of the specification (i.e. Change 6) to define the extent of those contents.  These are consistent in specifying that

_oops
;\
some text
;\

is as syntactically incorrect with the addition of Change 11 as it is without.  Per Change 6, the second \n; is the closing delimiter of the text field, notwithstanding the trailing backslash.  That backslash is a syntax error because the preceding value must be separated from what follows by whitespace, per change 10.  It seems clear to me, but a clarification of the meaning of "contents" may be in order, as I noted above.  I'm sure the group would consider suggestions for clarifying it further if you have something in mind.
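
The same reading can be sketched in Python. This is an illustration only, not a CIF parser: the function name `read_text_field` is hypothetical, the whitespace check uses Python's `str.isspace`, and treating end of input as an acceptable separator is an assumption. The delimiters are located first, per Change 6, and only then is the separation rule of Change 10 checked; line folding never enters into it.

```python
def read_text_field(source: str, start: int):
    """Locate a text field whose opening newline-semicolon is at `start`.

    Returns (contents, separated), where `contents` excludes both
    delimiters and `separated` says whether the closing delimiter is
    followed by whitespace (or, by assumption, by end of input).
    """
    if source[start:start + 2] != "\n;":
        raise ValueError("no text field delimiter at this position")
    # Change 6: the field ends at the first subsequent newline-semicolon.
    end = source.find("\n;", start + 2)
    if end == -1:
        raise ValueError("unterminated text field")
    contents = source[start + 2:end]
    following = source[end + 2:end + 3]
    # Change 10: the value must be separated from what follows by whitespace.
    separated = following == "" or following.isspace()
    return contents, separated


example = "_oops\n;\\\nsome text\n;\\\n"
contents, separated = read_text_field(example, example.index("\n;"))
print(repr(contents), separated)
```

For the `_oops` example the contents are the backslash, a newline, and `some text`, and `separated` is false because a backslash immediately follows the closing delimiter.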

If there were other classes of unclear cases then I suspect my answer would boil down to about the same thing: change 11 and change 10 affect different scopes and do not interact, nor does change 11 affect the interaction of change 6 with change 10.  Suggestions for text improvements are welcome.


Best,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

