Discussion List Archives


Re: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

To achieve its goal of tagging arbitrary content, this proposal would have to form part of
CIF syntax, i.e. option (1). In that case it presents the legacy issues pointed out by David, and
I would reject it as it stands.

If it is to be slotted in according to the other options, then it becomes a semantic feature and should be
considered along with other possibilities for declaring the 'encoding' of the data value.



From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Tuesday, 28 June, 2011 8:22:34
Subject: [ddlm-group] The Grazulis eliding proposal: how to incorporate into CIF?

Dear DDLm group,

As none of you have raised any substantial objections to the Grazulis eliding proposal, I think we can consider it accepted. The question now arises as to how it will fit into the CIF framework.  I see the following possibilities:

(1) As a required protocol for all CIF semicolon-delimited text strings (must be recognised by CIF readers)
(2) As an available protocol for all CIF semicolon-delimited text strings (may not be recognised by all CIF readers)
(3) As a string type defined in DDLm for use in domain dictionary definitions (only needs to be recognised by domain-specific software)

Under option (1), the "official" value of a given semicolon-delimited string would be unambiguously that which results from decoding the protocol.  Under option (2) there would be two "official" values: the undecoded value and the decoded value, either of which would be acceptable output for a conformant parser; under option (3) the dictionary determines how to process the string (identically to interpreting e.g. LaTeX strings today).  Under option (3) the "official" value from a CIF parser would be the undecoded value, and the "official" value after application of the dictionary definition would be the decoded value.
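A rough sketch of where decoding would happen under each option may help. It is illustrative only: `placeholder_decode` is a hypothetical stand-in and does not implement the actual eliding protocol, and the names `Option`, `parser_values` and `dictionary_value` are invented for this example.

```python
from enum import Enum
from typing import Callable, Set


class Option(Enum):
    REQUIRED = 1    # option (1): every CIF reader must decode
    AVAILABLE = 2   # option (2): a reader may or may not decode
    DICTIONARY = 3  # option (3): decoding is deferred to the dictionary layer


def placeholder_decode(raw: str) -> str:
    """Hypothetical stand-in for the protocol; NOT the real eliding rules."""
    return raw.replace("<elided>", "")


def parser_values(raw: str, option: Option,
                  decode: Callable[[str], str] = placeholder_decode) -> Set[str]:
    """Return the set of values a conformant parser could report."""
    if option is Option.REQUIRED:
        return {decode(raw)}          # one unambiguous "official" value
    if option is Option.AVAILABLE:
        return {raw, decode(raw)}     # two acceptable "official" values
    return {raw}                      # parser hands over the undecoded value


def dictionary_value(raw: str, type_uses_protocol: bool,
                     decode: Callable[[str], str] = placeholder_decode) -> str:
    """Option (3): the dictionary definition decides whether to decode."""
    return decode(raw) if type_uses_protocol else raw
```

Under this sketch, only option (2) leaves a writer unable to predict which value the reader will see.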

My comments:
Option (3) has the formal effect of requiring that either the type of string delimiter is carried forward to the dictionary layer, so that triple-quote delimited strings are not inadvertently "decoded", or else that the protocol is applied uniformly across all multi-line string constructs for that particular dictionary type.
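The two alternatives in this comment can be sketched as follows. This is a minimal illustration under assumptions: `ParsedString`, `apply_dictionary_type`, the delimiter labels and the `uniform` flag are all hypothetical, and `decode` is whatever function implements the protocol.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ParsedString:
    text: str
    delimiter: str  # illustrative labels: "semicolon" or "triple-quote"


def apply_dictionary_type(value: ParsedString,
                          decode: Callable[[str], str],
                          uniform: bool = False) -> str:
    """Apply a dictionary string type that uses the protocol.

    uniform=False: the parser has carried the delimiter forward, so only
    semicolon-delimited strings are decoded.
    uniform=True: the protocol is applied to every multi-line string of
    this dictionary type, whatever its delimiter.
    """
    if uniform or value.delimiter == "semicolon":
        return decode(value.text)
    return value.text
```

With `uniform=False`, a triple-quote delimited string passes through untouched, which is only possible if the parser preserves the delimiter type for the dictionary layer.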

Option (2), insofar as it involves optional behaviour, essentially sidelines the proposal: CIF writers cannot count on it being understood at the reading end and so cannot use it to encode important information.

Option (1) imposes extra burdens on CIF parser writers, although as Saulius notes it is not particularly difficult to implement.

My preference is either (1) or (3), perhaps inclining towards (3) in order to shift complexity to the dictionary level.  If the protocol is seen to be generally useful, it would be reasonable to prefer (1).
