
Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear All,

I have inserted some comments on these proposals below.

On 11 June 2016 at 00:55, Bollinger, John C <John.Bollinger@stjude.org> wrote:
Dear all,

This is not a fully-formed proposal, but more an outline and general idea for approaching the problem with Set categories and definition reuse.  If it is well received then I am prepared to expand it into a full proposal, but I suspect that the discussion will bring out considerations and nuances that will make that smoother and more successful, if indeed such a proposal is ever developed.

We previously agreed that the problem we are trying to solve arises from data dictionaries making assumptions about what kind of thing is described by a data block (or save frame).  Many data dictionaries do this implicitly; mmCIF has the distinction of doing it explicitly, via its ENTITY category.  Either way, this causes difficulty when we want to reuse a definition to describe a different kind of entity.  It would be ideal, therefore, to choose a solution that strikes at the root of this problem, and there are at least two general ways we could do this:

(1) Express the definition of "entity" applicable to a data block in that data block, by means of appropriate new data items.  Suitable choices of default values for the new items could preserve the current meaning of data files that do not present those items.

This was the intent of the '_audit.schema' proposal, if I haven't misunderstood your point here.

(2) Express the definition of "entity" in the relevant dictionaries, but *factor out* category and item definitions into separate dictionaries or dictionary modules.  Thus, two or more dictionaries with different senses of what an entity is -- e.g. the Core and Symmetry dictionaries -- would not re-use each other's definitions; instead, each would re-use (by suitably-structured import) definitions provided by a dictionary module that itself leaves "entity" undefined.

Of those two, I am more interested in the latter.  A significant advantage of that approach is that at one important level it meets both of the seemingly-conflicting criteria that were earlier presented: *with respect to individual data dictionaries*, it does not require new data name variants to be introduced to support loopability, and it also does not require that such a dictionary permit data files that existing software is at risk of misunderstanding.  It achieves this, essentially, by limiting the scope of some aspects of category key definition to specific dictionaries.  There would thus be no conflict between, for example, SPACE_GROUP being loopable in all data files conforming to the symmetry dictionary, but not being loopable in any data file conforming to the core dictionary.
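The SPACE_GROUP example above can be sketched in a few lines of Python. This is only an illustration of the idea, not DDLm syntax: SHARED_MODULE, build_dictionary, is_loopable and validate_block are hypothetical names, and the item list is just a sample. The shared module leaves the Set/Loop decision open, and each importing dictionary fixes it for itself.

```python
# Hypothetical model of scheme (2): a shared module defines the items of a
# category but does not say whether the category is a Set or a Loop.
SHARED_MODULE = {
    "space_group": {
        "items": ["_space_group.id", "_space_group.name_H-M_alt"],
    },
}


def build_dictionary(name, category_classes):
    """Import the shared definitions and fix each category's class locally."""
    categories = {}
    for cat, definition in SHARED_MODULE.items():
        categories[cat] = {
            "items": list(definition["items"]),
            "class": category_classes[cat],  # decided per dictionary
        }
    return {"name": name, "categories": categories}


def is_loopable(dictionary, category):
    """True if this dictionary declares the category as a Loop."""
    return dictionary["categories"][category]["class"] == "Loop"


def validate_block(dictionary, block):
    """Return a list of error strings; block maps category -> list of rows."""
    errors = []
    for cat, rows in block.items():
        if len(rows) > 1 and not is_loopable(dictionary, cat):
            errors.append(
                f"{cat}: {len(rows)} rows, but it is a Set category "
                f"in the {dictionary['name']} dictionary"
            )
    return errors


# Two dictionaries built from the same module, with different category classes.
core = build_dictionary("core", {"space_group": "Set"})
symmetry = build_dictionary("symmetry", {"space_group": "Loop"})

# A data block that presents two space groups.
block = {"space_group": [{"_space_group.id": "1"}, {"_space_group.id": "2"}]}
```

Under these assumptions, validate_block(symmetry, block) reports no errors while validate_block(core, block) reports one, so the outcome depends entirely on which dictionary the validator is told to use.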

One downside would be that the composability of data dictionaries would be further restricted.  For instance, one could not rely simultaneously on the full definitions of SPACE_GROUP items as drawn from the core and symmetry dictionaries.  Another downside would be that correct validation would depend even more on identifying the correct dictionary or dictionaries against which to validate.  I am uncertain how significant a disadvantage either of those would be, however.

Thoughts?

I agree that it would be elegant to compose dictionaries as you suggest, and I suspect it would be possible within the DDLm framework by judicious redefinition of the _import.get attribute.  What scheme (2) relies upon, however, is the assumption that CIF reading software pays attention to the stated dictionary conformance when reading data files, and that authors will take time to understand the implications of various combinations of dictionaries.  The sad fact is (and we can test this) that simply setting _audit.dictionary to the symmetry dictionary will have no effect on the behaviour of the vast bulk of current CIF reading software.  This is partly COMCIFS' fault for doing such a good job - we have created a system within which the meaning is not expected to vary, and COMCIFS ensures no name collisions, so why should a CIF reader bother checking which dictionary was used?

This is why I went for a simple _audit.schema tag check, which is minimally intrusive and which has some hope of being checked if the implication is that the program could malfunction otherwise.
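As a rough sketch of how small such a check could be on the reader side, assume a data block has already been parsed into a Python dict of data names and values. The default value "Base", the set of supported schemas, and the names check_schema and UnsupportedSchemaError are all assumptions made for illustration; they are not taken from any existing CIF library.

```python
# Assumed for illustration: the schema implied when the tag is absent, and
# the schemas this particular reader knows how to interpret.
DEFAULT_SCHEMA = "Base"
SUPPORTED_SCHEMAS = {"Base"}


class UnsupportedSchemaError(ValueError):
    """Raised when a data block declares a schema the reader cannot handle."""


def check_schema(block):
    """Return the block's schema, or raise if the reader does not support it.

    block is a dict mapping data names to values for one parsed data block.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema not in SUPPORTED_SCHEMAS:
        raise UnsupportedSchemaError(
            f"data block uses schema {schema!r}; refusing to interpret it"
        )
    return schema
```

A file that omits the tag passes unchanged, which preserves the meaning of existing data files, while a file declaring an unfamiliar schema is rejected before any of its items are interpreted.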

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
