
Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear Herbert and others,

I have commented below:

On 12 June 2016 at 06:22, Herbert J. Bernstein <yayahjb@gmail.com> wrote:
Dear Colleagues,

  I am very puzzled about the concerns being raised.  Please bear with me.

  An experimental data CIF, whether it is a core CIF, a macromolecular CIF or a powder CIF, is in general a container for all the information about a particular experiment.  Within that container we may have various tags and values.  In some cases, for that experiment, there is only one instance of a value being assigned to that tag, and it is convenient to use a tag-value pair to express that assignment.  In some cases, even though we are dealing with a single instance of an experiment, in order to describe that single experiment, we need to allow for multiple values to be assigned to the same tag, and CIF provides the loop_ construct to allow us to present those multiple values with a single presentation of the name of the tag.  So if there is one author we might say

  _audit_author "Peter Pan"
  _audit_author_address "Second to the right, and straight on till morning"

but if there were a second author we might say

  loop_
  _audit_author
  _audit_author_address
  "Peter Pan" "Second to the right, and straight on till morning"
  "Winnie the Pooh" "Hundred Aker Wood"

Notice that we don't make this into two separate loops to avoid creating an ambiguity about which author lives at which address.

No information would have been gained or lost if we had only one author and had chosen to use a loop instead, nor would any information have been gained or lost in the case of a single author if we had presented the name and address in two separate loops.
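To illustrate the equivalence claimed above, here is a minimal Python sketch. It is not a CIF parser: the dictionary representation and the helper name `as_packets` are assumptions made purely for this example. It shows that a tag-value pair and a single-packet loop reduce to the same list of packets.

```python
def as_packets(items):
    """Normalize tag/value pairs or looped columns into a list of packets.

    `items` maps a tag to either a single value (tag-value pair form)
    or a list of values (loop form). Hypothetical representation.
    """
    columns = {tag: (v if isinstance(v, list) else [v]) for tag, v in items.items()}
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns of one loop must have the same length")
    n = lengths.pop()
    return [{tag: col[i] for tag, col in columns.items()} for i in range(n)]


address = "Second to the right, and straight on till morning"

# One author written as tag-value pairs ...
pair_form = {"_audit_author": "Peter Pan", "_audit_author_address": address}
# ... and the same author written as a single-packet loop.
loop_form = {"_audit_author": ["Peter Pan"], "_audit_author_address": [address]}

same_content = as_packets(pair_form) == as_packets(loop_form)
```

Under this representation `same_content` is true, which is the sense in which no information is gained or lost by choosing one form over the other.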

In case it is not clear, we are *not* discussing mandating that key-value pairs must be presented as such instead of single-packet loops. It has already been resolved several years ago that presenting key-value pairs as single-packet loops was OK.  We are discussing e.g. which unit cell to choose when calculating bond lengths from fractional coordinates if the unit cell parameters are looped.
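The unit cell example can be made concrete with a rough sketch. The cell parameters and fractional coordinates below are invented for illustration, and `bond_length` is a hypothetical helper, not part of any CIF library. The distance is computed from the standard metric tensor of the cell, so the same pair of fractional coordinates yields a different bond length for each looped cell.

```python
import math


def bond_length(cell, frac1, frac2):
    """Distance between two fractional positions for a given unit cell.

    `cell` is (a, b, c, alpha, beta, gamma) with angles in degrees.
    """
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # Metric tensor G of the cell; d^2 = delta^T G delta.
    g = [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]
    d = [q - p for p, q in zip(frac1, frac2)]
    return math.sqrt(sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3)))


# Two packets of looped cell parameters (hypothetical values).
cells = [
    (10.0, 10.0, 10.0, 90.0, 90.0, 90.0),
    (10.2, 10.2, 10.2, 90.0, 90.0, 90.0),
]
atom1, atom2 = (0.0, 0.0, 0.0), (0.15, 0.0, 0.0)

# One candidate bond length per cell: the reader has to guess which is meant.
lengths = [bond_length(cell, atom1, atom2) for cell in cells]
```

With these made-up numbers the two candidate lengths are about 1.50 and 1.53, and nothing in the coordinates alone says which one the author intended.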

We have created rules about which tags belong together in the same loop, but the reality is that there are people who either do not know those rules or who choose to ignore them. I would suggest that, as long as the intended result is clear and unambiguous, it is not only unnecessary but pointlessly obstructionist to rigidly enforce such rules.  People like to create very flat CIFs for convenience in data harvesting.  People like to create databases spanning multiple sets of results.  People (including the IUCr publications office) like to extract particular sets of tags with their values and add "extra" tags we have not thought of.

If our rules are followed, we aim to guarantee that CIF readers and writers will agree on the meaning of any given file with no human intervention (after the software has been written).   We have identified the potential for ambiguity if rules in one part of the standard are relaxed. We aim to fix that.  

They are going to do these things whether we like it or not.  Yes, it is helpful if we provide recommended best practices and warnings about practices that seem to cause difficulty, but if CIF is to be a maximally useful container, a maximally useful framework, we need to focus on being as descriptive as possible and avoid being pointlessly prescriptive.

The usefulness is a function of the level of agreement that is achievable between producers and consumers of CIFs. It is immediately reduced if CIF-writing software authors have to guess which variation CIF-reading software authors are likely to choose in an ambiguous situation.  We are trying to lift one of the "pointless" prescriptions (single-packet loops) in a way that is consistent and unambiguous - as maintainers of the standard we are obliged to do this instead of pretending that it won't happen or that there won't be problems if it does. 

Which is a very, very long way of saying that I think we should allow looping of tags in all cases in which the intended meaning is unambiguous, and focus most of the attention of software warnings on specific things in specific datasets that seem likely to result in ambiguous interpretations of those specific datasets.

If you call that very, very long, I have obviously been way over my allocation! You appear to be advising software authors to adopt 'caveat emptor' and proactively check for ambiguity in datablocks.   I would prefer to save the software authors the trouble and produce a standard that makes their life maximally easy.
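For a sense of what such a proactive check might involve, here is a minimal sketch. It assumes a data block has already been parsed into a dictionary mapping category names to lists of packets, and that the reader has a list of 'Set' categories taken from the dictionary in use; both the data layout and the name `ambiguous_set_categories` are hypothetical.

```python
def ambiguous_set_categories(block, set_categories):
    """Return the names of 'Set' categories that hold more than one packet."""
    return sorted(name for name in set_categories if len(block.get(name, [])) > 1)


# Hypothetical list of categories the dictionary declares as 'Set'.
SET_CATEGORIES = ["cell", "exptl_crystal"]

# Hypothetical parsed data block: 'cell' is looped with two packets,
# while 'atom_site' is an ordinary looped category.
block = {
    "cell": [{"length_a": 10.0}, {"length_a": 10.2}],
    "exptl_crystal": [{"colour": "red"}],
    "atom_site": [{"label": "C1"}, {"label": "C2"}],
}

flagged = ambiguous_set_categories(block, SET_CATEGORIES)
```

Here `flagged` contains only `cell`. Every reader would need some check of this kind, plus a policy for what to do when it fires, unless the standard itself says how looped 'Set' categories are to be interpreted.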

I hope that made some sense.


all the best,

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148