[ddlm-group] (no subject)

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: [ddlm-group] (no subject)
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Thu, 16 Jun 2016 09:39:29 +1000

Dear John and Group: see inserted comments, and an additional comment of my own first:

John's points have underlined that, to work effectively, the _audit.schema proposal would require any CIF-reading software that handles a non-default value of _audit.schema to parse and check the dictionaries listed in _audit_conform in order to work out which categories have been given a new key, as this information could change with dictionary version.  Dictionary parsing and analysis is a significant extra burden, which I have argued is unreasonable in the case of the default _audit.schema value. I am not as concerned with non-default, non-mainstream situations, as long as the system can be made to work.

On 15 June 2016 at 07:12, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear All,


Here are what I see as the essential issues we are wrangling over with respect to the Set / Loop problem:


1. Whether we require a solution that prevents future data files from being misinterpreted by current software.


Inasmuch as proposal #2 is not such a solution, we seem to have settled on "no".

I think we agree that meeting this requirement is impossible given other constraints, but we need to minimise the probability of such misinterpretation.



2. Whether we require a solution that allows software to insulate itself against future Set / Loop changes, and if so, how.


That this is a desirable characteristic seems uncontroversial, but the "how" part is not settled.


In particular, proposal #2 does not provide a complete solution to this issue.  It provides for declaring what Set categories have been or may have been presented with multiple values, but it does not provide for defining the dimension(s) along which the values vary, and there could be more than one alternative for that.  The existing audit_conform category does offer a complete solution (or would do if changed to a Loop to match its mmCIF and the DDL1 Core analogs), but it is not as precise as proposal #2’s _audit.schema would be.

I disagree that proposal #2 is not complete (see previous email). Proposal #2 is indeed more precise, and this is important. Many dictionaries do not alter 'Set' categories, so software written with cif_core in mind could actually handle datafiles written in accordance with the pdCIF, msCIF, and future magCIF dictionaries just fine.  Should this software reject a perfectly good crystal structure found in a file that conforms to pdCIF?  No. What about some future dictionary that just adds some infra-red measurements to the structure?  Again, probably not, but you can't specify at software creation time that your software will accept this dictionary, because it doesn't yet exist. Thus relying on audit_conform.dictionary is not workable, because the software author, at creation time, cannot know how future (or local) dictionaries will or will not fiddle with Set categories.  Of course, the author could write software (or find a library) that actually downloaded and analysed the dictionaries given in _audit_conform, but hopefully the examples in my previous email, or common sense, would suggest that the bulk of authors of CIF-reading software will never program a complete dictionary parser and analyse the key and loop structure just to get a few atom sites.
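The difficulty with accepting files by dictionary name can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary file names, and in particular the "infra-red" dictionary are hypothetical, and a real reader would take the names from the _audit_conform entries in the data block.

```python
# Dictionaries the (hypothetical) software author knew about at creation time.
KNOWN_AT_CREATION = frozenset({"cif_core.dic", "cif_pd.dic"})


def accept_by_dictionary_name(conformed_to, known=KNOWN_AT_CREATION):
    """Accept a data block only if every dictionary it declares is already known.

    conformed_to: iterable of dictionary names, as might be read from
    _audit_conform.dictionary (the names used here are illustrative only).
    """
    return set(conformed_to) <= known


# A powder file is accepted, because the author anticipated that dictionary.
powder_ok = accept_by_dictionary_name(["cif_core.dic", "cif_pd.dic"])

# A later dictionary that only adds infra-red data leaves every Set category
# untouched, yet the block is rejected purely because its name is unfamiliar.
infrared_ok = accept_by_dictionary_name(["cif_core.dic", "cif_infrared.dic"])
```

The second call returns False even though nothing about the crystal structure in that block has changed, which is the over-rejection described above.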

In contrast, under the _audit.schema proposal, if _audit.schema is used with its default value, any present or future dictionary that did not fiddle with 'Set' categories would be automatically acceptable.  The _audit.schema system requires some extra work in non-default cases: software cannot trivially determine whether a value of _audit.schema that was unknown at software creation time is acceptable for some given datablock contents and use cases. This case can be dealt with if the software is prepared to parse and analyse the dictionaries provided in _audit_conform, or if the software is prepared to run a generic conversion utility to transform to its own schema.
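As a rough sketch of the reader-side logic this implies, assuming a data block is available as a plain mapping from datanames to values: the default schema name "Base", the `Action` enum, and `classify_block` are all assumptions made for illustration, not part of the proposal text.

```python
from enum import Enum

# Assumption: the name of the default schema; the discussion does not state it.
DEFAULT_SCHEMA = "Base"


class Action(Enum):
    ACCEPT = "read the block as usual"
    ANALYSE_OR_CONVERT = "parse the _audit_conform dictionaries or run a converter"


def classify_block(block, known_schemas=frozenset({DEFAULT_SCHEMA})):
    """Decide how to treat a data block from its _audit.schema value alone.

    block: dict mapping datanames to values (a hypothetical in-memory form).
    A missing _audit.schema is treated as the default value.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema in known_schemas:
        # No Set category has gained a key that this software does not expect.
        return Action.ACCEPT
    # Unknown schema: the software needs dictionary analysis or conversion.
    return Action.ANALYSE_OR_CONVERT


legacy_block = {"_cell.length_a": "5.431"}
custom_block = {"_audit.schema": "Custom", "_cell.length_a": "5.431"}
```

Here `classify_block(legacy_block)` gives `Action.ACCEPT` whatever dictionaries the block conforms to, while `classify_block(custom_block)` falls through to the more expensive path.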



3. Whether we want to provide for Sets of items that can take multiple values, or whether we must convert Set categories to Loops to enable their items to take multiple values.


This is to some extent a philosophical difference; it is not particularly relevant to actually writing or reading data files, though it does bear on the next issue.  Having a category key is a defining characteristic of Loop categories, as evidenced by DDLm’s definitions of _definition.class, _category.key_id, and _category_key.name.  Having at most one value per item is a defining characteristic of Set categories.  I disfavor changing that, especially to support a use case expected to be uncommon, and I see no particular need to do so.  I would rather convert Sets to Loops, either as-needed or proactively.


James has argued that keeping current Set categories as Sets but giving them category keys where needed would make the implicit assertion that providing multiple values for the items in such categories is exceptional.  I don’t disagree with that, but I think the same assertion is implicit in defining a default value for the keys of such categories, which presumably we would want to do whether we convert Sets to Loops or not.  Moreover, making assumptions about what is or is not normal or expected is exactly how we got into this situation.  If we are going to double down on that, then I think we need to first formulate a clearer strategy on when and how to make such assumptions.

I don't actually think default values for keys are necessary until multiple-packet loops are set up. Also, I would need to see a clear formulation of how you would propose to convert Set to Loop in order to comment sensibly in light of all of the other constraints we are operating under. I do appreciate that I am favouring a particular application by giving 'Set' categories such significance. Apart from being bound to do this to keep compatibility with legacy applications, there are non-trivial efficiencies available by being able to make certain values 'global', and the DDL1, DDL2 entry.id, and 'Set' behaviour provides a neat way to do this.



4. Whether we need to change DDLm itself, or whether the needed changes can be restricted to dictionaries.


It’s not clear to me that we can resolve the issue without modifying DDLm, but I would prefer to avoid modifying it if that is possible.


I think restricting the DDLm changes to the definition of 'Set' is about as light a touch as we can manage.  The dREL impact is non-trivial: as I have argued, adding a key to a category, strictly speaking, changes the meaning of the non-key datanames, which would entail a change in dREL for each of those datanames. A dictionary associated with a non-default _audit.schema value would therefore have to redefine dREL methods, or else remove them, for every non-key dataname.  One option I can look into, if we continue down this path, is for references to other categories in a dREL method to automatically use all keys in the present category that share a parent with keys in the external category when accessing items in the external category.  This would nicely declutter dREL and remove the need for redefinition in many cases.
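One possible reading of this key-matching idea, sketched in Python rather than dREL: the function, the row representation, and the example datanames (such as a `_cell.diffrn_id` key) are hypothetical and are not taken from any existing dictionary.

```python
def matching_rows(current_row, current_keys, external_rows, external_keys, parent_of):
    """Select packets of an external category using keys that share a parent.

    current_row:   dict of dataname -> value for the packet being evaluated
    current_keys:  key datanames of the present category
    external_rows: list of dicts, one per packet of the external category
    external_keys: key datanames of the external category
    parent_of:     dict mapping a key dataname to its parent dataname;
                   a key with no entry is treated as its own parent
    """
    # Index the present packet's key values by parent dataname.
    by_parent = {parent_of.get(k, k): current_row[k] for k in current_keys}
    # External keys whose parent also appears among the present category's keys.
    shared = [(k, parent_of.get(k, k)) for k in external_keys
              if parent_of.get(k, k) in by_parent]
    # Keep the packets that agree on every shared key; with no shared keys,
    # every packet is returned.
    return [row for row in external_rows
            if all(row[k] == by_parent[p] for k, p in shared)]


# Illustrative data: two cells distinguished by a hypothetical diffrn key.
parents = {"_refln.diffrn_id": "_diffrn.id", "_cell.diffrn_id": "_diffrn.id"}
refln = {"_refln.diffrn_id": "T100", "_refln.index_h": 1}
cells = [
    {"_cell.diffrn_id": "T100", "_cell.length_a": 5.40},
    {"_cell.diffrn_id": "T300", "_cell.length_a": 5.43},
]
selected = matching_rows(refln, ["_refln.diffrn_id", "_refln.index_h"],
                         cells, ["_cell.diffrn_id"], parents)
```

With this behaviour a method attached to a reflection could refer to the cell length without naming the new key, and `selected` would hold only the cell packet whose key matches the reflection's own.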


5. Whether all category attributes need to be global.


In particular, I raised the possibility that some category attributes, especially keys and therefore the nature of some of the relationships among categories, could be specified on a per-dictionary basis instead of globally.  This approach promotes dictionaries over individual definitions as the vehicle for addressing the problem, and it’s not so far away from proposal #2 in that prop2 also involves multiple dictionaries in providing full definitions for each category.  The main difference is that under prop2 there is (only) a single aggregate definition for each category, and that definition contains all possible keys, whereas per-dictionary category relationships allow for a subset appropriate to the data domain to be selected and used, simply by choice of dictionary.


This would presumably require a much higher level of audit_conform.dictionary observance than we presently have, and even then the problem I've identified above remains.  So while I agree that the information that is needed would be right there in the dictionaries, I don't think accessing that information is trivial enough for your average CIF-reading software to be expected to do it just to grab a few atom sites.  Thus I reiterate that _audit.schema has considerable practical advantages for mainstream use.

all the best,
