Discussion List Archives


Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Hi Simon, answers inserted below.

On 11 June 2016 at 04:12, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:
Dear all

Assuming I understand the underlying problem - i.e. a 'parent' category may define a relational key
but some of the categories that depend upon data defined by that 'parent' category may not have an item that points to the
key (simply because of oversights when writing a particular dictionary or dictionary extension?)

The problem is that single-valued datanames act as 'global' information for the whole datablock.  As soon as you decide in your wisdom to have more than one value for such a global dataname, a key to that newly created loop becomes necessary in those categories that relied on the global information.  It is not really fair to call it an oversight in those categories - the oversight is when you turn a single-valued dataname into a loop and don't realise the impact you're going to have on the other categories that relied on there being a single value.

would it be possible for the 'parent' to define a list of 'child' categories and also define the 'string' that should be used to
create a relational 'key' in the 'child' category, effectively dynamically defining new items for 'child' categories, though the definition of the child category won't 'know anything' about the new parent.

So if a CIF contains _atom_site.space_group_id in its coordinate loop then dictionary-aware software can recognize this as an extension referring to the space_group category key because the software already knew that this key may be applied to the atom_site category.

I think you are suggesting automatically reserving a particular value for the part of the dataname following the period, and that this name is recorded in the parent category.  DDL2 actually does something similar in the parent category.

If a CIF reader isn't able to read the dictionary, it will be left in the same situation as now - probably just ignoring the unknown item and potentially applying e.g. the wrong symmetry to the coordinates?

Or if the CIF reader is aware of the possible extension to _atom_site, but it isn't in the CIF, it is forced to assume a default
(perhaps also defined in the 'parent' definition?) - 'explicit/implicit' keys as described earlier.
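
As a rough illustration of the suggestion above (a sketch only, not anything defined in DDLm or in an existing dictionary): the parent category records which child categories may carry its key, the name fragment to use after the period, and a default for the implicit case. The table layout, the 'geom_bond' entry, the default value and the function names are all hypothetical.

```python
# Hypothetical record kept with the parent category; the layout, the
# 'geom_bond' entry and the default value are assumptions for illustration.
PARENT_KEYS = {
    "space_group": {
        "key_suffix": "space_group_id",          # part of the dataname after the period
        "children": ["atom_site", "geom_bond"],  # categories that may carry the key
        "default": "1",                          # implicit key value when the item is absent
    }
}


def child_key_name(child, parent):
    """Build the dataname a child category would use to point at the parent key."""
    return f"_{child}.{PARENT_KEYS[parent]['key_suffix']}"


def resolve_parent_keys(loop_rows, child, parent):
    """Return the parent key value for each row of a child loop.

    loop_rows is a list of dicts mapping datanames to values. Rows without
    the key item fall back to the parent's default (the 'implicit' case).
    """
    spec = PARENT_KEYS[parent]
    if child not in spec["children"]:
        raise ValueError(f"{child} is not a declared child of {parent}")
    name = child_key_name(child, parent)
    return [row.get(name, spec["default"]) for row in loop_rows]


rows = [
    {"_atom_site.label": "C1", "_atom_site.space_group_id": "2"},  # explicit key
    {"_atom_site.label": "O1"},                                    # implicit key
]
keys = resolve_parent_keys(rows, "atom_site", "space_group")
```

Software that never consults such a table would still ignore _atom_site.space_group_id, which is the situation described in the next paragraphs.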

Anyway, just some thoughts... (forgive me if this overlaps what has already been discussed - or if I've completely got the wrong end of the stick!)

The problem, as you have noted, is that dictionary-unaware software (that would be just about all CIF applications if I'm not mistaken) is completely unmoved by anything we get up to in the dictionaries.  That is why anything we do must be predicated on changing CIF-reading software to reliably check at least one standard dataname.  So far, nothing we have done in the dictionaries has affected the interpretation of already-existing datanames, with the clear exception of symCIF.

With regard to the current situation with dictionary compliance, in my experience (mostly small-structure CIFs),
_audit_conform... data are rarely present and one uses heuristics to determine which dictionaries to load.
By 'heuristics' I basically mean knowledge of core data items obtained by visually reading the dictionary files - so I wouldn't be in favour of splitting of dictionaries into numerous modules... unless a combined version were also available.
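
One way such a heuristic could be automated is sketched below, assuming a lookup table of datanames per dictionary. The table contents are illustrative stand-ins rather than complete lists, and the function name is hypothetical; a real table would have to be built from the dictionary files themselves.

```python
# Illustrative stand-ins only; a real table would be derived from the
# dictionary files (or from a combined version of them).
KNOWN_NAMES = {
    "core": {"_cell_length_a", "_atom_site_label", "_symmetry_space_group_name_h-m"},
    "powder": {"_pd_meas_counts_total", "_pd_proc_intensity_net"},
}


def guess_dictionaries(block_names):
    """Guess which dictionaries to load from the datanames found in a block.

    Datanames are compared case-insensitively. A dictionary is selected as
    soon as one of its known names appears in the block.
    """
    present = {name.lower() for name in block_names}
    return sorted(d for d, names in KNOWN_NAMES.items() if present & names)


guessed = guess_dictionaries(["_cell_length_a", "_pd_meas_counts_total"])
```

A guess of this kind only works while no dataname appears in two dictionaries with different meanings.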

Absolutely my point - CIF writers can't be bothered to indicate their dictionaries, because CIF readers will get it right anyway, as COMCIFS keeps the namespace unambiguous.

Are there any conventions when loading dictionaries - e.g. if _audit_conform were looped and two of the dictionaries contained different definitions of the same data name, which should be applicable?

I believe there was a dictionary merging protocol promulgated in about 2001, and which was implemented for a while in PyCIFRW.  The problem with this protocol is that it implicitly assumed that definitions might be changed by the merging.

If a definition changes, software that was written based on the original definition would most likely interpret datanames based on the merged definition wrongly - although validation software might happily confirm that e.g. the new list of enumerated values included the value stated in the datablock, the software itself would have no idea what that new enumerated value might mean. Thus, when 'esd' changed to 'su' in the enumerated list of some dataname (can't remember which right now), current software would not have the faintest idea what 'su' might actually mean, as the meaning itself is a human-readable text string.

To cut a long story short, if a datablock was written according to some merged dictionary stack, the reading software would most likely have to have been written with exactly that stack in mind in order to behave correctly, and this is clearly an unreasonable expectation for the CIF writing software to place on generic unknown CIF reading software, so nobody did it.
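
A minimal sketch of that failure mode, under stated assumptions: the two enumeration sets stand for the original and merged definitions of some unnamed dataname, and the handler table stands for software written against the original definition only. None of this reflects the PyCIFRW API or the actual merging protocol.

```python
# Enumerated values for one (unnamed) dataname, before and after merging.
ORIGINAL_ENUM = {"esd"}
MERGED_ENUM = {"esd", "su"}

# What the reading software knows how to act on. It was written against the
# original definition, so it has no entry for values added by the merge.
HANDLERS = {
    "esd": lambda number: f"estimated standard deviation {number}",
}


def validate(value, allowed):
    """Dictionary-driven validation: only checks membership in the enumeration."""
    return value in allowed


def interpret(value, number):
    """Application logic: needs a hand-written meaning for every enumerated value."""
    handler = HANDLERS.get(value)
    if handler is None:
        raise KeyError(f"no interpretation available for enumerated value {value!r}")
    return handler(number)


passes_validation = validate("su", MERGED_ENUM)   # validation is satisfied
try:
    interpret("su", "0.003")
    understood = True
except KeyError:
    understood = False                            # the software cannot act on it
```

Validation against the merged definition succeeds while interpretation fails, because the meaning of the new value lives only in the human-readable text of the definition.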

ddlm-group mailing list
