Discussion List Archives

Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

I do not completely understand change 11.  I do not understand
the precise interactions of changes 11 and 12 either, but once
change 11 is clarified, some examples should take care of that.


CHANGE 11 REFINEMENT to CIF1 LineFolding Protocol.

The CIF line-folding protocol is a mechanism for splitting logical
lines of text across two or more physical lines of a CIF 
semicolon-delimited data value ("text field"). A version of this 
protocol appears among the "Common Semantic Features" of CIF 1.1 and 
is in wide use in that context; in CIF 2.0, line-folding is part of 
the CIF syntax, as described below.

The protocol applies to text fields whose contents (after
interpretation of the text prefix protocol, if applicable) begin with
a backslash, followed by any number of whitespace characters other
than newline, followed by newline or the end of the text field; that
sequence is designated \<ws>*\n below. There must not be any
whitespace preceding the initial backslash.

Given un-prefixed (Change 12) text field contents to which the
line-folding protocol applies, the logical text it represents is derived
from it by removing each occurrence of \<ws>*\n, including the
initial one. Different lines may have different amounts of whitespace
between the trailing backslash and newline.

Note that the line-folding protocol cannot elide the terminating \n;
of a text field because the \n of that delimiter is not accounted part
of the field contents. It follows from the definition of \<ws>*\n,
however, that if the last line ends with \<ws>* then that will not
appear in the unfolded value.
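
As a rough illustration of the rule as worded above (a sketch only, not part of the draft), the unfolding step could be written in Python as follows. It assumes that <ws> means space or tab, and that the contents passed in already exclude the newline belonging to the closing \n; delimiter. The function name `unfold` is made up for this example.

```python
import re

# Backslash, then any amount of non-newline whitespace, then either a
# newline or the end of the field contents: the "\<ws>*\n" sequence.
# Assumption: <ws> is limited to space and tab.
FOLD = re.compile(r"\\[ \t]*(?:\n|\Z)")


def unfold(contents: str) -> str:
    """Apply the line-folding rule to un-prefixed text field contents.

    `contents` must not include the newline of the closing "\\n;".
    """
    # The protocol only applies when the contents begin with the
    # signature; match() anchors at position 0, so leading whitespace
    # before the backslash disqualifies the field.
    if FOLD.match(contents) is None:
        return contents
    # Remove every occurrence, including the initial signature.
    return FOLD.sub("", contents)


if __name__ == "__main__":
    folded = "\\\nabc \\\ndef\nghi\\  "
    print(repr(unfold(folded)))  # 'abc def\nghi'
```

Under these assumptions a trailing backslash (with optional whitespace) on the last line is dropped from the unfolded value, while the newline of the closing delimiter is never seen by the function at all.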

=========================================================================

This appears to differ significantly from the CIF1 line folding
protocol in section 26 of the common semantic specification, which
did line folding for both comments and for text fields, and which
explicitly removed the terminal \n for a last line that ended with
\<ws>*\n:

"The final line-termination-semicolon sequence of a text field takes 
priority over the reassembly process and ends it, but a trailing 
backslash on the last line of a text field very nicely conveys the 
information that no trailing line termination is intended to be 
included within the character string."

If the new change 11 were aligned with section 26, I would vote for it.
Once that is done, given some clarifying examples of the interaction
of changes 11 and 12, I would probably vote for change 12.
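
One possible reading of the interaction, sketched in Python purely for discussion: it assumes Change 12 matches the Text Prefix Protocol text quoted further down in this message (Appendix 2 of John's proposal), that <ws> means space or tab, and that a reader therefore removes the prefix first and unfolds second. The names `unprefix`, `unfold` and `decode`, and the choice to raise an error on a line that lacks the prefix, are assumptions of this sketch and are not taken from the draft.

```python
import re

# Prefix: one or more characters, none of them backslash or newline,
# not starting with a semicolon; then one or two backslashes, optional
# non-newline whitespace, and a newline.
PREFIX_SIG = re.compile(r"([^\\\n;][^\\\n]*)(\\{1,2})[ \t]*\n")
# Line-folding sequence: backslash, <ws>*, newline or end of contents.
FOLD = re.compile(r"\\[ \t]*(?:\n|\Z)")


def unprefix(contents: str) -> str:
    """Remove the text prefix, if the prefix signature is present."""
    m = PREFIX_SIG.match(contents)
    if m is None:
        return contents
    prefix, slashes = m.group(1), m.group(2)
    lines = contents.split("\n")
    if not all(line.startswith(prefix) for line in lines):
        # Assumption: a missing prefix is reported as an error.
        raise ValueError("line without the designated prefix")
    lines = [line[len(prefix):] for line in lines]
    if slashes == "\\\\":
        # Two backslashes: drop the first, leaving a folding signature.
        lines[0] = lines[0][1:]
    else:
        # One backslash: the whole first line is removed.
        lines = lines[1:]
    return "\n".join(lines)


def unfold(contents: str) -> str:
    """Remove each \\<ws>*\\n when the contents begin with one."""
    if FOLD.match(contents) is None:
        return contents
    return FOLD.sub("", contents)


def decode(contents: str) -> str:
    """Un-prefix first, then unfold (the reverse of the writing order)."""
    return unfold(unprefix(contents))


if __name__ == "__main__":
    both = "> \\\\\n> abc \\\n> def"
    print(repr(decode(both)))  # 'abc def'
```

Here a first line of "prefix plus two backslashes" signals that both protocols are in use, and a first line of "prefix plus one backslash" signals prefixing alone.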

I think we also need a clarification of the interaction with Change 10.
We seem to be saying that whitespace between data values is required,
but what are we supposed to do if it is not present?

For example, what is the meaning if we are already in a text field
and encounter "\n;\"?  Is this a termination for the text field, and/or
is it an error, and/or are we free to handle it as something outside
of the CIF specification?
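
To make the question concrete, here is a small Python sketch of one possible behaviour, offered only as an assumption for discussion and not as what the draft requires: the first "\n;" always ends the text field, and the scanner separately reports whether whitespace (or end of input) follows, leaving the caller to decide whether a missing separator is an error. The function name `split_text_field` is hypothetical.

```python
def split_text_field(body: str):
    """Split input that follows the opening semicolon of a text field.

    Returns (contents, rest, separated), where `separated` is True when
    the closing delimiter is followed by whitespace or end of input.
    """
    end = body.find("\n;")
    if end < 0:
        raise ValueError("unterminated text field")
    contents = body[:end]          # excludes the "\n" of the delimiter
    rest = body[end + 2:]          # whatever follows the ";"
    separated = rest == "" or rest[0] in " \t\r\n"
    return contents, rest, separated


if __name__ == "__main__":
    # The case raised above: "\n;" followed directly by a backslash.
    print(split_text_field("abc\n;\\\nmore"))
    # ('abc', '\\\nmore', False)
```

Whether `separated == False` should be treated as a syntax error, or as something else, is exactly the point that needs clarifying.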









At 11:22 PM +1000 8/1/11, James Hester wrote:
>Dear DDLm-ers,
>
>John B. has kindly incorporated these changes into the latest CIF2 
>draft which Brian M. has now posted on the IUCr website.  The draft 
>can be found by following links off the development webpage at:
>
>http://www.iucr.org/resources/cif/spec/cif-2-development
>
>I invite all participants to read this latest draft (changes 
>relative to last year's draft are marked in blue) and respond.
>
>On Tue, Jul 26, 2011 at 12:36 PM, Herbert J. Bernstein
><yaya@bernstein-plus-sons.com> wrote:
>
>To avoid any misunderstanding and have us all working from the same 
>base please send the proposal we are discussing as one 
>self-contained document
>  -- Herbert
>
>=====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
>=====================================================
>
>On Tue, 26 Jul 2011, James Hester wrote:
>
>Dear DDLm-ers,
>
>I agree that we should try and resolve any outstanding issues with CIF2
>at the Madrid meeting.  However, to make that approach workable we need
>to have exposed all the issues in this forum so that we are not
>blindsided by new information in Madrid and left with no time to
>process it.  I would therefore suggest that this discussion continues
>so that we can identify precisely what the particular issues and
>alternatives are.  The less we have to deal with in Madrid the more
>likely we will be able to converge on a solution.
>
>So, to pursue Herbert's comments below:
>(1) In what sense is John's draft an 'either-or' approach to prefixes
>and line folding?  Doesn't it require both to be recognised?
>(2) Are there any other reasons John's draft is not an improvement?
>
>James.
>
>On Wed, Jun 29, 2011 at 4:45 AM, Herbert J. Bernstein
><yaya@bernstein-plus-sons.com> wrote:
>      Dear Colleagues,
>
>        I fear we are not communicating very effectively.  I am
>      _not_ comfortable with the current state of the CIF2
>      document, and I do not find the current emendation to
>      be an improvement.  Much as I would dearly love to have
>      the current line-folding protocol in CIF2, I think it
>      is much more important to work on making CIF2 into
>      something clear and coherent.  I for one find the
>      either-or approach to prefixes and line-folding unnecessary
>      and confusing.  When working with old Fortran compilers,
>      I _need_ the line folding protocol.  If the prefixes
>      are being introduced, I need a way to deal with both the
>      prefixes _and_ the line-folding protocol, not have it
>      be either-or.  I understand that most people don't
>      see a problem, but I work with software both on new computers
>      and very, very old computers (e.g. I just brought an Indigo
>      back to life).
>
>        I repeat my suggestion that we need to meet and talk things
>      out.  Maybe then I will understand what the rest of you are trying
>      to do, and maybe I will be able to explain what I am trying
>      to do.
>
>        Regards,
>          Herbert
>
>      =====================================================
>       Herbert J. Bernstein, Professor of Computer Science
>         Dowling College, Kramer Science Center, KSC 121
>              Idle Hour Blvd, Oakdale, NY, 11769
>
>                       +1-631-244-3035
>                       yaya@dowling.edu
>      =====================================================
>
>On Tue, 28 Jun 2011, Bollinger, John C wrote:
>
>>
>>  On Tuesday, June 28, 2011 9:07 AM, Herbert J. Bernstein wrote:
>>
>>  [...]
>>
>>>    For CIF2, I would urge that we try to make something
>>>  coherent and clear.  I don't think we are there yet.
>>>  Maybe, if time permits and enough of us are present,
>>>  we can converge on something in Madrid.
>>
>>
>>  No need to wait for Madrid.  Please find below a concrete
>>  realization of my most preferred alternative.  I think it's coherent
>>  and clear, though suggestions for improvement on that front are
>>  welcome.
>>
>>
>>  =====
>>
>>  1. In Change 6, section (3) of the CIF 2.0 syntax changes document:
>>
>>  a) Before the sentence "CIF2 does not specify any interpretation of
>>  the contents of the string," insert: "The string is interpreted
>>  according to the CIF Line-Folding Protocol (Appendix 1) when its
>>  signature is present, and according to the CIF Text Prefix Protocol
>>  (Appendix 2) when its signature is present.  Otherwise, "
>>
>>  b) Insert this text after "The string value is
>>  Sugar\nFlour\nButter, where \n is the literal newline sequence." as
>>  a new paragraph:
>>
>>  When the Line-Folding Protocol and Text Prefix Protocol are applied
>>  to the same value, the value shall appear in CIF as if line-folding
>>  had been performed first, followed by prefixing.
>>
>>
>>  2. Add a definition of the Line-Folding Protocol as Appendix 1 of
>>  the CIF 2.0 syntax changes document:
>>
>>  Appendix 1.     The CIF Line-Folding Protocol
>>
>>  The CIF line-folding protocol is a mechanism for splitting logical
>>  lines of text across two or more physical lines of a CIF
>>  semicolon-delimited data value ("text field").  A version of this
>>  protocol appears among the "Common Semantic Features" of CIF 1.1 and
>>  is in wide use in that context; in CIF 2.0 line-folding is part of
>>  the CIF syntax, as described below.
>>
>>  The protocol applies to text fields whose contents (after
>>  interpretation of the text protocol, if applicable) begin with a
>>  backslash, followed by any number of whitespace characters other
>>  than newline, followed by newline or the end of the text field; that
>>  sequence is designated \<ws>*\n below.  There must not be any
>>  whitespace preceding the initial backslash.
>>
>>  Given un-prefixed (Appendix 2) text field contents to which the
>>  line-folding protocol applies, the logical text it represents is
>>  derived from it by removing each occurrence of \<ws>*\n, including
>>  the initial one.  Different lines may have different amounts of
>>  whitespace between the trailing backslash and newline.
>>
>>  Note that the line-folding protocol cannot elide the terminating \n;
>>  of a text field because the \n of that delimiter is not accounted
>>  part of the field contents.  It follows from the above definition of
>>  \<ws>*\n, however, that if the last line ends with \<ws>* then that
>>  will not appear in the unfolded value.
>>
>>
>>  3. Add a definition of the Text Prefix Protocol as Appendix 2 of the
>>  CIF 2.0 syntax changes document:
>>
>>  The CIF text-prefix protocol is a mechanism for formatting the
>>  logical content of a CIF semicolon-delimited value ("text field") so
>>  as to avoid misinterpretation of embedded appearances of \n;. It may
>>  also be useful for improving human readability of some CIFs.
>>
>>  The protocol applies to text fields whose physical contents begin
>>  with a prefix (see below), followed by one or two backslashes,
>>  optionally followed by any amount of whitespace other than a
>>  newline, followed by a newline.  A prefix consists of a sequence of
>>  one or more characters that are permitted in a text field, except
>>  for backslash or newline, and does not begin with a semicolon.
>>
>>  The second and all subsequent physical lines of the contents of a
>>  prefixed text field must begin with the designated prefix for that
>>  field.  The line containing the terminating semicolon is not part of
>>  the contents for this purpose.  The logical (i.e. "un-prefixed")
>>  contents of the field is derived from the physical contents by the
>>  following procedure:
>>
>>  a) remove the prefix from each line, including the first
>>  b) if the first line starts with two backslashes then remove the
>>  first of them; otherwise remove the whole line
>>
>>  Example 1:
>>
>>  data_providing_example
>>  _example
>>  ;CIF>\
>>  CIF>data_example
>>  CIF>_text
>>  CIF>;This is an embedded multiline value
>>  CIF>;
>>  ; # here the field terminates.
>>
>>  Example 2:
>>
>>  data_providing_example
>>  _example
>>  ; \
>>  data_example
>>  _text
>>  ;This is an embedded multiline value
>>  ;
>>  ; # here the field terminates.
>>
>>
>>  =====
>>
>>
>>  Regards,
>>
>>  John
>>
>>  --
>>  John C. Bollinger, Ph.D.
>>  Department of Structural Biology
>>  St. Jude Children's Research Hospital
>>
>>
>>  Email Disclaimer: www.stjude.org/emaildisclaimer
>>
>>  _______________________________________________
>>  ddlm-group mailing list
>>  ddlm-group@iucr.org
>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>--
>T +61 (02) 9717 9907
>F +61 (02) 9717 3145
>M +61 (04) 0249 4148


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================
