Discussion List Archives


Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

On Tuesday, June 28, 2011 9:07 AM, Herbert J. Bernstein wrote:


>   For CIF2, I would urge that we try to make something
>coherent and clear.  I don't think we are there yet.
>Maybe, if time permits and enough of us are present,
>we can converge on something in Madrid.

No need to wait for Madrid.  Please find below a concrete realization of my most preferred alternative.  I think it's coherent and clear, though suggestions for improvement on that front are welcome.


1. In Change 6, section (3) of the CIF 2.0 syntax changes document:

a) Before the sentence "CIF2 does not specify any interpretation of the contents of the string," insert: "The string is interpreted according to the CIF Line-Folding Protocol (Appendix 1) when its signature is present, and according to the CIF Text Prefix Protocol (Appendix 2) when its signature is present.  Otherwise, "

b) Insert this text after "The string value is Sugar\nFlour\nButter, where \n is the literal newline sequence." as a new paragraph:

When the Line-Folding Protocol and Text Prefix Protocol are applied to the same value, the value shall appear in CIF as if line-folding had been performed first, followed by prefixing.
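The ordering rule above can be sketched from the writer's side as follows. This is only an illustrative sketch, not proposed specification text: the function names fold and add_prefix, the 72-character limit, and the reading of "whitespace other than newline" as space or tab are all assumptions made for the example, and line endings are assumed to be already normalized to \n.

```python
import re

# Assumption: "whitespace other than newline" means space or horizontal tab.
_ENDS_LIKE_FOLD = re.compile(r"\\[ \t]*\Z")


def fold(text: str, limit: int = 72) -> str:
    """Hypothetical writer: fold logical lines to at most `limit` characters."""
    room = limit - 1                  # leave one column for the fold backslash
    out = ["\\"]                      # first line: the line-folding signature
    for line in text.split("\n"):
        while len(line) > room:
            out.append(line[:room] + "\\")
            line = line[room:]
        if _ENDS_LIKE_FOLD.search(line):
            # A genuine trailing backslash would read as a fold, so protect it
            # by folding right after it onto an empty continuation line.
            out.append(line + "\\")
            line = ""
        out.append(line)
    return "\n".join(out)


def add_prefix(contents: str, prefix: str, folded: bool) -> str:
    """Hypothetical writer: prefix every line of (possibly folded) contents."""
    if not prefix or prefix[0] == ";" or "\\" in prefix or "\n" in prefix:
        raise ValueError("not a valid prefix")
    lines = contents.split("\n")
    if folded:
        lines[0] = "\\" + lines[0]    # two backslashes: folding signature kept
    else:
        lines.insert(0, "\\")         # one backslash: signature-only line
    return "\n".join(prefix + line for line in lines)


def encode(text: str, prefix: str) -> str:
    # Line-folding first, then prefixing, as the paragraph above requires.
    return add_prefix(fold(text), prefix, folded=True)
```

A reader would undo the two steps in the opposite order: strip the prefix first, then unfold.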

2. Add a definition of the Line-Folding Protocol as Appendix 1 of the CIF 2.0 syntax changes document:

Appendix 1.     The CIF Line-Folding Protocol

The CIF line-folding protocol is a mechanism for splitting logical lines of text across two or more physical lines of a CIF semicolon-delimited data value ("text field").  A version of this protocol appears among the "Common Semantic Features" of CIF 1.1 and is in wide use in that context; in CIF 2.0 line-folding is part of the CIF syntax, as described below.

The protocol applies to text fields whose contents (after interpretation of the text prefix protocol, if applicable) begin with a backslash, followed by any number of whitespace characters other than newline, followed by newline or the end of the text field; that sequence is designated \<ws>*\n below.  There must not be any whitespace preceding the initial backslash.

Given un-prefixed (Appendix 2) text field contents to which the line-folding protocol applies, the logical text it represents is derived from it by removing each occurrence of \<ws>*\n, including the initial one.  Different lines may have different amounts of whitespace between the trailing backslash and newline.

Note that the line-folding protocol cannot elide the terminating \n; of a text field, because the \n of that delimiter is not considered part of the field contents.  It follows from the above definition of \<ws>*\n, however, that if the last line ends with \<ws>* then that will not appear in the unfolded value.
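For illustration, the unfolding rule in this appendix might be implemented along the following lines. This is a minimal sketch, not part of the proposed text: the name unfold is hypothetical, "whitespace other than newline" is assumed to mean space or tab, and the contents are assumed to be un-prefixed already, with line endings normalized to \n.

```python
import re

# Assumption: "whitespace other than newline" means space or horizontal tab.
# Matches \<ws>*\n, where the final \n may instead be the end of the contents.
_FOLD = re.compile(r"\\[ \t]*(?:\n|\Z)")


def unfold(contents: str) -> str:
    """Return the logical text of un-prefixed text field contents.

    Contents that do not begin with the folding signature are returned
    unchanged, since the protocol does not apply to them.
    """
    if _FOLD.match(contents) is None:
        return contents
    # Remove every occurrence of the sequence, including the initial one.
    return _FOLD.sub("", contents)


folded = "\\\nSugar, \\  \nFlour\nButter\\ "
print(repr(unfold(folded)))  # 'Sugar, Flour\nButter'
```

The last line of the sample ends with a backslash and a space, which is dropped from the unfolded value as described in the note above.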

3. Add a definition of the Text Prefix Protocol as Appendix 2 of the CIF 2.0 syntax changes document:

The CIF text-prefix protocol is a mechanism for formatting the logical content of a CIF semicolon-delimited value ("text field") so as to avoid misinterpretation of embedded appearances of \n;. It may also be useful for improving human readability of some CIFs.

The protocol applies to text fields whose physical contents begin with a prefix (see below), followed by one or two backslashes, optionally followed by any amount of whitespace other than a newline, followed by a newline.  A prefix consists of a sequence of one or more characters that are permitted in a text field, except for backslash or newline, and does not begin with a semicolon.

The second and all subsequent physical lines of the contents of a prefixed text field must begin with the designated prefix for that field.  The line containing the terminating semicolon is not part of the contents for this purpose.  The logical (i.e. "un-prefixed") contents of the field are derived from the physical contents by the following procedure:

a) remove the prefix from each line, including the first
b) if the first line starts with two backslashes then remove the first of them; otherwise remove the whole line
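
Steps (a) and (b) could be sketched as below. This is an illustration only, under stated assumptions: the name unprefix is hypothetical, "whitespace other than a newline" is taken to mean space or tab, line endings are assumed normalized to \n, and raising an error for a line that lacks the prefix is a choice made for the example rather than something the proposal specifies.

```python
import re

# Signature: a prefix (no backslash or newline, not starting with ';'),
# then one or two backslashes, optional spaces/tabs, then a newline.
# Assumption: "whitespace other than a newline" means space or tab.
_SIGNATURE = re.compile(r"(?P<prefix>[^\\\n;][^\\\n]*)\\{1,2}[ \t]*\n")


def unprefix(contents: str) -> str:
    """Return the un-prefixed contents of a text field.

    Contents without the prefix signature are returned unchanged.
    """
    m = _SIGNATURE.match(contents)
    if m is None:
        return contents
    prefix = m.group("prefix")
    lines = contents.split("\n")
    if not all(line.startswith(prefix) for line in lines):
        # Illustrative choice: treat a missing prefix as an error.
        raise ValueError("every line of a prefixed text field must start with the prefix")
    # (a) remove the prefix from each line, including the first
    lines = [line[len(prefix):] for line in lines]
    # (b) two backslashes: drop the first one; otherwise drop the whole line
    if lines[0].startswith("\\\\"):
        lines[0] = lines[0][1:]
    else:
        del lines[0]
    return "\n".join(lines)


# Contents of the text field in Example 2 (the prefix is a single space).
print(repr(unprefix(" \\\n ;This is an embedded multiline value")))
# ';This is an embedded multiline value'
```

When the first line carries two backslashes, the result still begins with a backslash followed by a newline, so the line-folding protocol can then be applied to it.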

Example 1:

CIF>;This is an embedded multiline value
; # here the field terminates.

Example 2:

; \
 ;This is an embedded multiline value
; # here the field terminates.




John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer

ddlm-group mailing list
