Discussion List Archives


Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear Herbert,

 

No one, least of all me, is arguing that the desired degree of loopability should be disallowed.  The question being discussed is how best to provide for describing it via DDLm: specifically, how to provide for adding or expanding category keys post hoc and possibly ad hoc, and how best to provide for software to deal with the results.

 

The two proposals and the several alternative ideas that have been floated all have advantages in this area, but they do not all happily coexist.  Surely it is reasonable to discuss the options -- possibly including some that we have not yet identified -- before choosing among them.  Ultimately, it is certain that whatever we choose will have disadvantages as well, even if we choose to do nothing.  It is best to understand both the advantages and the disadvantages before we choose.

 

If indeed you do not understand the opposing concerns, then it may be that you would be equally satisfied with any of the alternatives.  I wish I were in that position myself, and I hope you will indulge those who have a different view of the problem.

 

 

Best regards,

 

John

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of Herbert J. Bernstein
Sent: Sunday, June 12, 2016 6:00 AM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

 

Dear James,

  I agree.  I am not opposed to either of your proposals.  I am puzzled by the opposition to them.  I can see what is gained by the proposals.  I don't understand what the opposing concerns are about.

  Regards,

    Herbert

 

On Sun, Jun 12, 2016 at 2:37 AM, James Hester <jamesrhester@gmail.com> wrote:

Hi Herbert,

I would have thought that proposal #2 is exactly what you want. In exchange for reading/writing a single datatag with value (_audit.schema or whatever we call it), you are freed up to loop whatever you like without violating the standard.  The 'variant' scheme in imgCIF is a perfect match: simply including '_audit.schema  Variant' in a datablock would create a block that is well-defined across all of CIF, and allow you to introduce 'Variant' tags for *any* loop you care about by adding an appropriate variant child key to a dictionary.  Additionally, we can reserve _audit.schema values starting with 'local' for whatever ad hoc experiments you or others want to try, with no concerns that the files will be accidentally misinterpreted. Surely such a small imposition on readers and writers (adding a single tag read/write in software that already reads and writes many tags) is a reasonable exchange for this enhanced flexibility - as you have said, "if a small bit of extra code will help more users to ... create the CIFs they need with minimal fuss, I think it is worth having such code available".  What is your objection to this idea?
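As a rough illustration of how small the reader-side check could be, here is a minimal Python sketch. It is not taken from any proposal text or CIF library: the dict representation of a data block, the function names, and the default value "Base" for a block that carries no _audit.schema are all assumptions made for the example.

```python
# Hypothetical sketch: the names and the default value are assumptions.
DEFAULT_SCHEMA = "Base"  # assumed meaning of "no _audit.schema given"


def declared_schema(block):
    """Return the schema a data block declares, or the assumed default."""
    return block.get("_audit.schema", DEFAULT_SCHEMA)


def is_local(schema):
    """Values starting with 'local' are reserved for ad hoc experiments."""
    return schema.lower().startswith("local")


def can_interpret(block, supported):
    """True only if this reader was written for the declared schema.

    An unrecognised schema (including any 'local' one the reader does not
    know about) is declined rather than guessed at.
    """
    return declared_schema(block) in supported


plain = {"_cell.length_a": "5.43"}
variant = {"_audit.schema": "Variant", "_cell.length_a": "5.43"}
experiment = {"_audit.schema": "local_trial", "_cell.length_a": "5.43"}

reader_supports = {"Base", "Variant"}
for block in (plain, variant, experiment):
    print(declared_schema(block), can_interpret(block, reader_supports))
```

The point of the sketch is only that the decision is made once per data block, from a single tag, before any looped category is interpreted.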

all the best,

James.

 

On 12 June 2016 at 10:41, Herbert J. Bernstein <yayahjb@gmail.com> wrote:

  Yes, I am recommending that software authors do their best to detect and creatively resolve ambiguities, rather than giving up on a data CIF.  This is similar to the recent approach to warnings in gcc and diametrically opposed to the approach to errors in gcc.  I believe we should give somewhat higher priority to serving the interests of users trying to use CIF rather than to serving the interests of software writers.  If a little bit of extra code will help more users to read existing CIFs with minimal fuss or help users to create the CIFs they need with minimal fuss, I think it is worth having such code available.

  Please don't think I am picking on CIF.  I have said very much the same things about NeXus in NIAC meetings.

 

On Sat, Jun 11, 2016 at 7:09 PM, James Hester <jamesrhester@gmail.com> wrote:

Dear Herbert and others,

I have commented below:

 

On 12 June 2016 at 06:22, Herbert J. Bernstein <yayahjb@gmail.com> wrote:

Dear Colleagues,

  I am very puzzled about the concerns being raised.  Please bear with me.

  An experimental data CIF, whether it is a core CIF, a macromolecular CIF, or a powder CIF, is in general a container for all the information about a particular experiment.  Within that container we may have various tags and values.  In some cases, for that experiment, there is only one instance of a value being assigned to that tag, and it is convenient to use a tag-value pair to express that assignment.  In some cases, even though we are dealing with a single instance of an experiment, in order to describe that single experiment, we need to allow for multiple values to be assigned to the same tag, and CIF provides the loop_ construct to allow us to present those multiple values with a single presentation of the name of the tag.  So if there is one author we might say

  _audit_author "Peter Pan"

  _audit_author_address "Second to the right, and straight on till morning"

 

but if there were a second author we might say

loop_
    _audit_author
    _audit_author_address
    "Peter Pan" "Second to the right, and straight on till morning"
    "Winnie the Pooh" "Hundred Aker Wood"

 

Notice that we don't make this into two separate loops to avoid creating an ambiguity about which author lives at which address.

No information would have been gained or lost if we had only one author and had chosen to use a loop instead, nor would any information have been gained or lost in the case of a single author if we had presented the name and address in two separate loops.
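The single-author equivalence can be made concrete with a small Python sketch. The in-memory representation (a dict of tag-value pairs plus a list of loops) and the function name as_columns are hypothetical, chosen only for illustration; they do not come from any CIF library.

```python
def as_columns(pairs=None, loops=None):
    """Reduce a data block to {tag: [values]} regardless of presentation."""
    columns = {}
    # A plain tag-value pair is treated as a one-element column.
    for tag, value in (pairs or {}).items():
        columns[tag] = [value]
    # Each loop is (tags, packets); unzip the packets into one column per tag.
    for tags, packets in (loops or []):
        for i, tag in enumerate(tags):
            columns[tag] = [packet[i] for packet in packets]
    return columns


NAME = "Peter Pan"
ADDRESS = "Second to the right, and straight on till morning"

from_pairs = as_columns(
    pairs={"_audit_author": NAME, "_audit_author_address": ADDRESS}
)
from_one_loop = as_columns(
    loops=[(["_audit_author", "_audit_author_address"], [(NAME, ADDRESS)])]
)
from_two_loops = as_columns(
    loops=[(["_audit_author"], [(NAME,)]), (["_audit_author_address"], [(ADDRESS,)])]
)

# All three presentations carry the same information for a single author.
print(from_pairs == from_one_loop == from_two_loops)
```

With two authors the third form would no longer be equivalent, because the column view loses the pairing between rows of separate loops; that is the ambiguity the single combined loop avoids.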

 

In case it is not clear, we are *not* discussing mandating that key-value pairs must be presented as such instead of single-packet loops. It has already been resolved several years ago that presenting key-value pairs as single-packet loops was OK.  We are discussing e.g. which unit cell to choose when calculating bond lengths from fractional coordinates if the unit cell parameters are looped.
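To make the unit-cell example concrete, here is a minimal Python sketch of what a reader faces. It assumes a hypothetical column view of a data block ({tag: [values]}); the helper single_value and the exception name are illustrative, not part of any existing CIF software, and the cell lengths are made-up numbers.

```python
class AmbiguousItemError(ValueError):
    """Raised when a calculation needs one value but several are present."""


def single_value(columns, tag):
    """Return the only value of a tag, or refuse if the tag is looped.

    A single-packet loop is fine; a multi-packet loop gives no basis
    for picking one value without some additional key.
    """
    values = columns[tag]
    if len(values) != 1:
        raise AmbiguousItemError(
            f"{tag} has {len(values)} values; cannot choose one without a key"
        )
    return values[0]


one_cell = {"_cell_length_a": [5.43]}         # pair or single-packet loop
two_cells = {"_cell_length_a": [5.43, 5.47]}  # looped unit cell

print(single_value(one_cell, "_cell_length_a"))
try:
    single_value(two_cells, "_cell_length_a")
except AmbiguousItemError as err:
    print("ambiguous:", err)
```

A bond-length routine built on such a helper works unchanged for single-packet loops, and it is only the multi-packet case that needs a rule from the standard.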
 

 

We have created rules about which tags belong together in the same loop, but the reality is that there are people who either do not know those rules or who choose to ignore them. I would suggest that, as long as the intended result is clear and unambiguous, it is not only unnecessary but pointlessly obstructionist to rigidly enforce such rules.  People like to create very flat CIFs for convenience in data harvesting.  People like to create databases spanning multiple sets of results.  People (including the IUCr publications office) like to extract particular sets of tags with their values and add "extra" tags we have not thought of.

 

If our rules are followed, we aim to guarantee that CIF readers and writers will agree on the meaning of any given file with no human intervention (after the software has been written).   We have identified the potential for ambiguity if rules in one part of the standard are relaxed. We aim to fix that.  

 

They are going to do these things whether we like it or not.  Yes, it is helpful if we provide recommended best practices and warnings about practices that seem to cause difficulty, but if CIF is to be a maximally useful container, a maximally useful framework, we need to focus on being as descriptive as possible and avoid being pointlessly prescriptive.

 

The usefulness is a function of the level of agreement that is achievable between producers and consumers of CIFs. It is immediately reduced if CIF-writing software authors have to guess which variation CIF-reading software authors are likely to choose in an ambiguous situation.  We are trying to lift one of the "pointless" prescriptions (single-packet loops) in a way that is consistent and unambiguous - as maintainers of the standard we are obliged to do this instead of pretending that it won't happen or that there won't be problems if it does.

 

Which is a very, very long way of saying that I think we should allow looping of tags in all cases in which the intended meaning is unambiguous, and focus most of the attention of warnings in software on specific things in specific datasets that seem likely to result in ambiguous interpretations of those specific datasets.

 

If you call that very, very long, I have obviously been way over my allocation! You appear to be advising software authors to adopt a 'caveat emptor' approach and proactively check for ambiguity in datablocks.   I would prefer saving the software authors the trouble and producing a standard that made their life maximally easy.
 

 

I hope that made some sense.

Regards,

    Herbert 

 

all the best,

James.
 

 


--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

 

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

 


