Discussion List Archives


Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear All,


I sent a previous version of this to the group yesterday, but I never received a copy back from the list server, so it seems to have been eaten.  My apologies if anyone receives both.  This version is revised with respect to yesterday’s, so if anyone does receive both then please disregard the other.




Here are what I see as the essential issues we are wrangling over with respect to the Set / Loop problem:


1. Whether we require a solution that prevents future data files from being misinterpreted by current software.


Inasmuch as proposal #2 is not such a solution, we seem to have settled on "no".  I mention it now in part to afford everyone a chance to object, and in part to observe that other solutions that were rejected or would have been rejected on the basis of leaving data files open to misinterpretation should now be open for reconsideration.



2. Whether we require a solution that allows software to easily insulate itself against future Set / Loop changes, and if so, how.


That this is a desirable characteristic seems uncontroversial, but the "how" part is not settled.


In particular, proposal #2, as I originally understood it, does not provide a complete solution to this issue.  It provides for declaring what Set categories have been or may have been presented with multiple values, but I did not interpret it to provide for defining the dimension(s) along which the values vary, and there could be more than one alternative for that.  James’s subsequent comments suggest that I have misunderstood, so this bears further discussion.


On the other hand, the existing audit_conform category offers several versions of a complete solution (or would do so if it were changed to a Loop to match its mmCIF and DDL1 Core analogs).  This should be unsurprising, as the problem is at minimum closely related to audit_conform’s purpose.


One specific option would be to put the extra category keys in a separate dictionary (as P2 also proposes), and for data files to be expected to specify conformance with such additional dictionaries when they in fact rely on them.  The possibly-multiple values of _audit_conform.dict_name could then be used in a manner very similar to P2’s use of _audit.schema.  Furthermore, this would provide for a reasonably agile approach to validation of data files relying on the added keys.
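As a rough illustration of this option, an application might compare the dictionary names a data file declares against the ones it was written for. This is only a sketch: the function name, the way the declared names are passed in, and the dictionary name CIF_SET_KEYS are all hypothetical, and no particular CIF library is assumed.

```python
def unsupported_dictionaries(declared_names, supported_names):
    """Return the declared dictionary names the application does not support.

    declared_names: the values of _audit_conform.dict_name read from a data
        file (assumed here to be already extracted into a list of strings).
    supported_names: the dictionary names this application was written for.
    """
    supported = set(supported_names)
    # Preserve the order in which the data file declared the dictionaries.
    return [name for name in declared_names if name not in supported]


# Hypothetical example: CIF_SET_KEYS stands in for a separate dictionary
# that defines the extra category keys.
declared = ["CIF_CORE", "CIF_SET_KEYS"]
unknown = unsupported_dictionaries(declared, ["CIF_CORE"])
if unknown:
    print("Data file relies on unsupported dictionaries:", unknown)
```

An application that finds a non-empty result could refuse the file or warn the user, which is much the same decision it would make on seeing an unfamiliar _audit.schema value under P2.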


Alternatively, suppose we fully commit to semantic versioning (http://semver.org) of dictionaries.  An application could then test the first segment of the value of _audit_conform.dict_version to determine whether data files rely on / require a library version incompatible with the one assumed by the application.  In this case, converting one or more Set categories to allow them to take multiple values would require an increment to the affected library’s major version number.  This is not as precise as _audit.schema would be, but it follows a pattern that I think is well understood by most programmers.
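A minimal sketch of that test, assuming version strings of the MAJOR.MINOR.PATCH form described at semver.org; the function names and the example version numbers are illustrative only.

```python
def major_version(dict_version):
    """Return the first (major) segment of a semver-style version string."""
    return int(dict_version.strip().split(".", 1)[0])


def is_compatible(dict_version, supported_major):
    """True if the declared version has the major number the application expects."""
    return major_version(dict_version) == supported_major


# Hypothetical example: the application was written against major version 3.
for declared in ("3.1.0", "4.0.0"):
    status = "ok" if is_compatible(declared, 3) else "incompatible major version"
    print(declared, status)
```

The check is deliberately coarse: any major-version increment is treated as incompatible, whether or not it involved a Set category.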


I have also argued that P2 ultimately does not offer a reliable solution to this problem.  Neither, for that matter, does any use I can think of for audit_conform.  The only reliable solution is for software to affirmatively check whether its inputs conform to its expectations.  How much added weight should be attributed to solutions that offer additional, less reliable checks is a point on which it seems we are unlikely to come to consensus.
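To make "affirmatively check" concrete, here is one way such a check could look. It is a sketch under stated assumptions: the data block is represented as a plain mapping from data names to lists of values, category names are taken from the part of the data name before the first period, and the caller supplies the set of categories the software treats as single-valued. None of this reflects a specific parser's API.

```python
def multivalued_set_items(block, set_categories):
    """Return data names in assumed-Set categories that carry more than one value.

    block: mapping of data name (e.g. "_cell.length_a") to a list of values.
    set_categories: category names the software expects to be single-valued.
    """
    offending = []
    for name, values in block.items():
        # "_cell.length_a" -> "cell"
        category = name.lstrip("_").split(".", 1)[0]
        if category in set_categories and len(values) > 1:
            offending.append(name)
    return sorted(offending)


# Hypothetical data block: _cell.length_a unexpectedly has two values.
block = {
    "_cell.length_a": ["5.0", "5.1"],
    "_cell.length_b": ["6.0"],
    "_atom_site.label": ["C1", "C2"],
}
print(multivalued_set_items(block, {"cell"}))
```

A check of this kind does not depend on any declaration in the data file, which is why it remains reliable whatever signalling mechanism is or is not adopted.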



3. Whether we want to provide for Sets of items that can take multiple values, or whether we must convert Set categories to Loops to enable their items to take multiple values.


This is to some extent a philosophical difference; it is not particularly relevant to actually writing or reading data files, though it does bear on the next issue.  Having a category key is a defining characteristic of Loop categories, as evidenced by DDLm’s definitions of _definition.class, _category.key_id, and _category_key.name.  Having one value per item is a defining characteristic of Set categories.  I disfavor changing that, especially to support a use case expected to be uncommon, and I see no particular need to do so.  I would rather convert Sets to Loops, either as-needed or proactively.


James has argued that keeping current Set categories as Sets but giving them category keys where needed would make the implicit assertion that providing multiple values for the items in such categories is exceptional.  I don’t disagree with that, but I think the same assertion is implicit in defining a default value for the keys of such categories, which we would want to do whether we convert Sets to Loops or not.



4. Whether we need to change DDLm itself, or whether the needed changes can be restricted to dictionaries.


It’s not clear to me that we can resolve the issue without modifying DDLm, but I would prefer a solution that only modifies data dictionaries.









John C. Bollinger, Ph.D.

Computing and X-Ray Scientist

Department of Structural Biology

St. Jude Children's Research Hospital


(901) 595-3166 [office]



Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer
ddlm-group mailing list

International Union of Crystallography