Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

The present change document is unclear about the non-inclusion of the
terminal linefeed in all text fields.  This is important.  I will be
happy if that is indeed the case, but either way the document needs
to be clear on the point, because, as noted in paragraph 26, line
folding is used in CIF 1.1 to allow the presentation of
    "abcd"
    'abcd'
and
;\
abcd\
;

as the same thing.
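
A minimal Python sketch of the reading under discussion, assuming, as the CIF2 change document states, that the terminating newline belongs to the closing \n; delimiter rather than to the field contents. The helper names text_field_contents and unfold, and the regular expression standing in for \<ws>*\n (with "whitespace" taken to mean spaces and tabs), are illustrative assumptions, not code from any CIF library.

```python
import re

# Hypothetical sketch of the CIF2 "Change 11" reading; not taken from any CIF library.
# Assumption: "whitespace other than newline" means spaces and tabs.
# A fold marker is a backslash, optional blanks, then a newline or the end of the contents.
FOLD_MARKER = re.compile(r"\\[ \t]*(?:\n|\Z)")


def text_field_contents(raw: str) -> str:
    """Strip the opening ';' and the closing newline-semicolon from a raw text field."""
    if not raw.startswith(";") or not raw.endswith("\n;"):
        raise ValueError("expected a ';'-delimited text field")
    # The final newline belongs to the closing delimiter, so it is not part of the contents.
    return raw[1:-2]


def unfold(contents: str) -> str:
    """Remove every fold marker, provided the contents begin with one."""
    if not FOLD_MARKER.match(contents):
        return contents  # the line-folding protocol does not apply
    return FOLD_MARKER.sub("", contents)


# The folded presentation from the example above: ;\ / abcd\ / ;
folded = ";\\\nabcd\\\n;"
print(repr(unfold(text_field_contents(folded))))  # 'abcd'
```

Under this reading the trailing backslash on the last line is harmless but no longer required to suppress a final newline: the same sketch also unfolds the field consisting of the lines ;\ then abcd then ; to abcd.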

If the comment side of the original line-folding protocol is
acceptable, the change document should say so.  Otherwise, by
explicitly including the text field part of paragraph 26, but
not the comment part, the impression might be created that
the comment line folding is excluded from CIF2.

The question on a terminal ;\ was not whether it is syntactically
correct under the current CIF2 document, but what the document
expects us to do about it.  It comes up because existing
validation suites for the line-folding protocol under CIF1, rather
than treating it as an error, use it as a way to allow an
embedded \n; in a line-folded text field.  Inasmuch as we are
in agreement that \n;\ is not a syntactically valid termination
of a text field in CIF2 as defined in the change document, there
is no harm in those of us who use the construct as a
non-conflicting extension to CIF1 continuing to do so under CIF2.
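
The termination rule the thread converges on can be sketched as a small scanner: the first newline-semicolon after the opening delimiter closes the field (Change 6), and whatever follows must be whitespace or end of input (Change 10). The function name scan_text_field and its error handling are hypothetical; a processor supporting the CIF1 extension mentioned above would presumably special-case \n;\ instead of raising.

```python
def scan_text_field(text: str, start: int) -> tuple[str, int]:
    """Return (contents, offset just past the closing ';') for a text field opening at `start`.

    Hypothetical helper: assumes text[start] is a ';' at the beginning of a line.
    """
    if text[start] != ";":
        raise ValueError("a text field must open with ';'")
    # Change 6: the field ends at the FIRST subsequent newline-semicolon sequence.
    close = text.find("\n;", start + 1)
    if close == -1:
        raise ValueError("unterminated text field")
    end = close + 2
    # Change 10: the value must be followed by whitespace or the end of input,
    # so a backslash directly after the closing ';' is a syntax error.
    if end < len(text) and not text[end].isspace():
        raise SyntaxError(f"expected whitespace after text field, found {text[end]!r}")
    return text[start + 1:close], end


# The construct under discussion: the closing delimiter is followed by a backslash.
try:
    scan_text_field("_oops\n;\\\nsome text\n;\\\n", 6)
except SyntaxError as err:
    print(err)  # expected whitespace after text field, found '\\'
```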






At 5:03 PM -0500 8/1/11, Bollinger, John C wrote:
>Dear Herbert,
>
>Please find my comments in-line below.
>
>On Monday, August 01, 2011 9:05 AM, Herbert J. Bernstein wrote:
>>I do not completely understand change 11.  I do not understand
>>the precise interactions of changes 11 and 12 either, but once
>>change 11 is clarified, some examples should take care of that.
>>
>>
>>CHANGE 11 REFINEMENT to CIF1 LineFolding Protocol.
>>
>>The CIF line-folding protocol is a mechanism for splitting logical
>>lines of text across two or more physical lines of a CIF
>>semicolon-delimited data value ("text field"). A version of this
>>protocol appears among the "Common Semantic Features" of CIF 1.1 and
>>is in wide use in that context; in CIF 2.0, line-folding is part of
>>the CIF syntax, as described below.
>>
>>The protocol applies to text fields whose contents (after
>>interpretation of the text prefix protocol, if applicable) begin with
>>a backslash, followed by any number of whitespace characters other
>>than newline, followed by newline or the end of the text field; that
>>sequence is designated \<ws>*\n below. There must not be any
>>whitespace preceding the initial backslash.
>>
>>Given un-prefixed (Change 12) text field contents to which the
>>line-folding protocol applies, the logical text it represents is derived
>>from it by removing each occurrence of \<ws>*\n, including the
>>initial one. Different lines may have different amounts of whitespace
>>between the trailing backslash and newline.
>>
>>Note that the line-folding protocol cannot elide the terminating \n;
>>of a text field because the \n of that delimiter is not counted as part
>>of the field contents. It follows from the definition of \<ws>*\n,
>>however, that if the last line ends with \<ws>* then that will not
>>appear in the unfolded value.
>>
>>=========================================================================
>>
>>This appears to differ significantly from the CIF1 line folding
>>protocol in section 26 of the common semantic specification, which
>>did line folding for both comments and for text fields, and which
>>explicitly removed the terminal \n for a last line that ended with
>>\<ws>*\n:
>>
>>"The final line-termination-semicolon sequence of a text field takes
>>priority over the reassembly process and ends it, but a trailing
>>backslash on the last line of a text field very nicely conveys the
>>information that no trailing line termination is intended to be
>>included within the character string."
>
>
>Taking the second criticism first, I think you are talking about a 
>line-folded text field in which a literal backslash character and 
>possibly some whitespace immediately precede the closing delimiter. 
>It is my understanding that under the CIF 1.1 line-folding protocol, 
>the backslash and any whitespace between it and the newline are 
>removed during the unfolding process.  Although it is written in 
>different terms, the same is true of the protocol described in 
>change 11.  Indeed, that particular point is explicitly addressed in 
>the last paragraph of that section.
>
>As for the final newline itself, it is not part of the text field 
>contents anyway, just as the present Change 11 text says.  You 
>yourself argued strongly for that interpretation of CIF 1.1 -- and 
>prevailed -- and CIF2's specifications for text fields are clear in 
>that regard.  The line-folding protocol does not need to, and indeed 
>cannot, remove something that isn't there.  If you think it would be 
>better, however, then we could insert a clarifying remark that the 
>"contents" with which Change 11 is concerned exclude the field 
>delimiters.
>
>As for folding comment lines, that is a knowing omission on my part. 
>Nothing about comments is germane to the technical issue driving 
>possible adoption of text-field line folding into CIF2 syntax, so I 
>left comment-folding out in the interests of clarity, brevity, and 
>simplicity.  That in no way prevents CIF2 processors from folding 
>and unfolding comments according to the line-folding protocol for 
>comments, whether Change 11 is adopted or not.
>
>
>>If the new change 11 were aligned with section 26, I would vote for it.
>>Once that is done, given some clarifying examples of the interaction
>>of changes 11 and 12, I would probably vote for change 12.
>
>
>I will plan to work up the examples.
>
>
>>I think we also need a clarification of the interaction with Change 10.
>>We seem to be saying that whitespace between data values is required,
>>but what are we supposed to do if it is not present?
>>
>>For example, what is the meaning if we are already in a text field
>>and encounter "\n;\"?  Is this a termination for the text field, and/or
>>is this an error, and/or are we free to handle this as something outside
>>of the CIF specification?
>
>Change 6, part (3) starts, "The string is initiated by an ASCII 
>newline semi-colon sequence, consists of any of the allowed 
>characters, and is terminated by the first subsequent ASCII newline 
>semi-colon sequence. Clearly, the strings within cannot contain an 
>ASCII newline semi-colon sequence."  For its part, change 11 says 
>nothing to qualify or modify the definition of text field 
>delimiters.  On the contrary, it defines the line-folding protocol 
>in terms of its effect on the *contents* of a text field, leaving it
>to other parts of the specification (i.e. Change 6) to define the 
>extent of those contents.  These are consistent in specifying that
>
>_oops
>;\
>some text
>;\
>
>is as syntactically incorrect with the addition of Change 11 as it 
>is without.  Per Change 6, the second \n; is the closing delimiter 
>of the text field, notwithstanding the trailing backslash.  That 
>backslash is a syntax error because the preceding value must be 
>separated from what follows by whitespace, per change 10.  It seems 
>clear to me, but a clarification of the meaning of "contents" may be 
>in order, as I noted above.  I'm sure the group would consider 
>suggestions for clarifying it further if you have something in mind.
>
>If there were other classes of unclear cases then I suspect my 
>answer would boil down to about the same thing: change 11 and change 
>10 affect different scopes and do not interact, nor does change 11 
>affect the interaction of change 6 with change 10.  Suggestions for 
>text improvements are welcome.
>
>
>Best,
>
>John
>
>--
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>
>
>Email Disclaimer:  www.stjude.org/emaildisclaimer
>
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================