Discussion List Archives


Re: [ddlm-group] Further discussion of proposal #2

Hi James
A few points in response to your message before lapsing back into quiet listening mode:
> (2) ... The proliferation of child keys is exactly what I'd
> like to avoid - e.g. do we really need keys in atom_site
> pointing to cell and space group?
Sure; but the point of the "lazy" approach is that you only explicitly specify new keys for real-world use cases - a sort of organic complexification in grudging acknowledgement of need. In a way, that's what is involved when you need to add new entries to your _audit.schema catalogue. Perhaps the difference - and the benefit of _audit.schema - is that you partition off the added complexity in a more modular way. I have the sense that John's "hub" and "star" brings the same benefit.
> (3) ... The _audit.schema proposal ... is for software to
> behave as follows:
Thank you for the amplification. I'm always worried about prescribing how future software should behave, but I guess it is in principle possible to provide a standalone program or web service that takes a data file with a specified _audit.schema and converts it to a series of data files (or blocks) each of which contains an unlooped instance of Set categories according to the default schema. So you could throw the derived file at checkCIF (as it is now) and at least test the chemical reasonableness of those data blocks which turned out to describe an individual atomic or molecular crystal lattice, although you would have lost a sense of context (e.g. that these lattices occur as overlapping phases in a material). This might ease the pain of growing complexity (or complications!) of the data model for software authors focussed on a very particular set of applications.
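As a rough illustration of the standalone converter idea, here is a minimal Python sketch. It is not an existing tool and does no CIF parsing: a data block is assumed to be already held as a plain dictionary mapping category names to lists of rows. The function name split_on_set_key and the child key _atom_site.space_group_id are hypothetical, chosen only for the example.

```python
from typing import Any

Row = dict[str, Any]
Block = dict[str, list[Row]]  # category name -> rows of that category


def split_on_set_key(block: Block, set_category: str, key_name: str,
                     child_keys: dict[str, str]) -> list[Block]:
    """Return one derived block per row of the looped Set category.

    `child_keys` maps a looped category to the (hypothetical) data name
    in that category that points at `key_name`. Categories with no child
    key are treated as global and copied into every derived block.
    """
    derived = []
    for set_row in block[set_category]:
        key_value = set_row[key_name]
        # The Set category is unlooped: exactly one row per derived block.
        new_block: Block = {set_category: [dict(set_row)]}
        for category, rows in block.items():
            if category == set_category:
                continue
            child = child_keys.get(category)
            if child is None:
                new_block[category] = [dict(r) for r in rows]
            else:
                # Keep only the rows that belong to this Set instance.
                new_block[category] = [dict(r) for r in rows
                                       if r[child] == key_value]
        derived.append(new_block)
    return derived


# Illustrative input: two space groups, each with its own atom site.
example: Block = {
    "space_group": [
        {"_space_group.id": "sg1", "_space_group.name_H-M_alt": "P 21/c"},
        {"_space_group.id": "sg2", "_space_group.name_H-M_alt": "C 2/c"},
    ],
    "atom_site": [
        {"_atom_site.label": "C1", "_atom_site.space_group_id": "sg1"},
        {"_atom_site.label": "O1", "_atom_site.space_group_id": "sg2"},
    ],
    "cell": [{"_cell.length_a": 5.0}],
}
blocks = split_on_set_key(example, "space_group", "_space_group.id",
                          {"atom_site": "_atom_site.space_group_id"})
```

Each derived block could then be written out and checked on its own. As noted above, the relationship between the blocks is lost in the process, and this sketch makes no attempt to record it.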
> (4) The issue is that SPACE_GROUP is used in practice as a Set (it'd
> be good if you could check IUCr archives for statistics on this)
Sigh. You're right. At a quick check of the 152 CIFs in production for IUCrData, 99 include '_space_group_name_H-M_alt', 0 include '_space_group_id'. Indeed, the supplementary CIFs we redistribute from the IUCr journals site all omit '_space_group_id'. As these are generated on the fly, we could change that, but already many will have escaped into the wild.
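For anyone wanting to repeat this kind of count on their own collection, a minimal Python sketch follows. It is not the script used for the figures above. It assumes each CIF is available as a text string and that data names start a line (possibly after leading whitespace), which is typical but not guaranteed by the CIF syntax. The function name count_files_with_names is hypothetical and the sample texts are invented.

```python
import re
from collections import Counter


def count_files_with_names(cif_texts, data_names):
    """Count how many CIF texts contain each data name as a whole token."""
    counts = Counter({name: 0 for name in data_names})
    for text in cif_texts:
        for name in data_names:
            # Data names are case-insensitive and end at whitespace,
            # so '_space_group_id' must not match a longer name.
            pattern = r"(?im)^\s*" + re.escape(name) + r"(?=\s|$)"
            if re.search(pattern, text):
                counts[name] += 1
    return counts


# Invented sample data, for illustration only.
samples = [
    "data_a\n_space_group_name_H-M_alt 'P 21/c'\n",
    "data_b\n_cell_length_a 5.0\n",
    "data_c\n_SPACE_GROUP_NAME_H-M_ALT 'C 2/c'\n",
]
result = count_files_with_names(
    samples, ["_space_group_name_H-M_alt", "_space_group_id"])
```

A real survey would read the texts from files, and a proper CIF parser would be safer than a regular expression for unusual layouts.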

On 21/06/2016 06:39, James Hester wrote:
> Dear Brian and colleagues,
>
> On 21 June 2016 at 03:03, Brian McMahon <bm@iucr.org> wrote:
>
>     (2) It seems to me that a formal approach to the distinction might
>     be to define a Set as a category of data items that - if looped
>     without an explicit key - assume a default value ('') of the
>     category key item.
>     This obliges dictionary writers to specify a data name that plays
>     the role of a formal key for *every* category, but it does not
>     require data files to carry instances of every such key data name.
>     [Or maybe it's a little more forgiving than that: "lazy"
>     dictionary writers only need to
>     specify key data names when real use cases demand looping of what
>     had been expected to be single-value values; but then it is incumbent on
>     them to stir out of their laziness and ensure that all consequent
>     child key relationships are consistent across the new use cases that
>     have arisen.]
>
> The proliferation of child keys is exactly what I'd like to avoid - e.g.
> do we really need keys in atom_site pointing to cell and space group?
> As I've said before, the real point of 'Set' categories is to provide a
> simplification for dictionary writers and software authors. I would like
> to keep the simplicity, but provide a route to gracefully remove the
> simplification.  So, any option that makes the 'Set' category scenario
> more difficult to work with (e.g. extra key definitions, extra child
> keys, extra things for a dictionary writer or software author to do)
> goes against the whole reason for having 'Set' categories, which is why
> my proposal #2 moved all the additional complexity into add-on
> dictionaries that only require engagement for non-default _audit.schema.
>
>     (3) The _audit.schema proposal has its attractions, though I'm not sure
>     how it works in practice.
>     I mean, suppose I define an "INCOMMENSURATE"
>     schema to indicate that multiple space groups describe multiple
>     discernible symmetries in a real atomic (quasi-)lattice, and a "TABLES"
>     schema to indicate that this is just a list of symmetry operations in
>     all the distinct space groups. It could be useful for validation
>     purposes to know that "INCOMMENSURATE" also requires additional
>     information/relations between other categories (e.g. are there different
>     origins or orientation matrices associated with each space group?).
>     Is _audit.schema necessary and sufficient to capture these
>     additional requirements? If not, could it be made so? I think this
>     is moving into the sort of thing that Simon is interested in - can
>     we elegantly define application profiles that say "this is a
>     single-crystal untwinned structure", "this is an incommensurate
>     powder structure with twinning", "this is a structure refinement
>     with its own database of neutron absorption coefficients"?
>
> The _audit.schema proposal as it currently stands (although I'm waiting
> for positive responses to this) is for software to behave as follows:
> (1) Always check _audit.schema. If absent or default value, software may
> interpret datanames in the datablock according to those definitions
> found in dictionaries associated with the default schema with no further
> runtime checks of dictionaries necessary.  This rule captures the
> mainstream path, under which 'Set' categories simplify our life.
> (2) If _audit.schema is not default, software is only guaranteed to
> correctly interpret datanames if it can handle the 'Set' category child
> key list provided in the versions of dictionaries given in
> _audit_conform. This is the path that reveals the complexity hidden
> behind the 'Set' behaviour.
> The _audit_conform check is lifted by
> condition (4) below.
>
> Consideration of your example leads me to suggest the following
> supporting rules:
>
> (3) All dictionaries must indicate which _audit.schema they are
> associated with (a new dictionary-level DDLm tag)
> (4) All dictionaries must define child keys of their looped 'Set'
> categories for all relevant looped categories that they import from
> other dictionaries. This reduces the reliance on _audit_conform in (2),
> and note that all dictionaries will import cif_core if they loop any
> 'Set' categories from cif_core.
> (5) All definitions appear in one dictionary only (this is probably
> already a rule).
>
> Now let's flesh out your example: suppose that INCOMMENSURATE and TABLES
> both belong to the same _audit.schema as they both involve only looping
> space_group.  We have a TABLES dictionary which, by looping
> 'space_group', is required by rule (4) to add child_keys to all core_cif
> categories for which space_group is relevant. Our INCOMMENSURATE
> dictionary is forced to do the same when it loops 'space_group' (by rule
> (5), in practice INCOMMENSURATE would import TABLES).  Therefore,
> software which is written expecting the 'space group looped' schema will
> not misinterpret datablocks based on either dictionary.  Of course, it
> will also *not* be able to distinguish the INCOMMENSURATE and TABLES
> cases using _audit.schema, although note that the space group tabulation
> software will be able to correctly extract space group tabulation
> material from an incommensurate file - this is the behaviour that we
> enable with the _audit.schema idea.
>
> A more complex example: suppose that Herbert subsequently comes along
> with his 'Variant' schema.  A dictionary corresponding to [Variant]
> would define a dictionary with 'variant' child keys for almost all
> cif_core looped categories.
> A dictionary corresponding to [Variant
> Space_group] would additionally provide a variant key to the space_group
> category and space_group keys as before to all core_cif definitions.
> So, what happens to looped categories defined in the INCOMMENSURATE
> dictionary but not present in the [Variant Space_group] dictionary?
> Well, the incommensurate dictionary conforms to the [space_group]
> schema, so is not compatible with the [Variant space_group] schema.  A
> *further* dictionary must be defined which imports the INCOMMENSURATE
> and Variant dictionaries and adds Variant child keys to all the
> incommensurate loop categories.
>
> As a side note, programs written for schema [a b c] automatically handle
> all combinations of a, b and c, and software that additionally/instead
> examines the dictionaries provided in _audit_conform can provide
> universal schema compatibility for all non-key datanames that it was
> programmed to expect.  For this reason I believe that dREL methods can
> be made schema-independent.
>
> Anyway, we end up with many dictionaries, each corresponding to a
> combination of schema and full of child keys and perhaps a few original
> categories.  There is perhaps scope for us therefore to define a virtual
> dictionary creation protocol where e.g. the dictionary header just lists
> all of the imported looped categories that require a child key and the
> key names are generated automatically. I would prefer to leave this sort
> of discussion to a later date, if and when dictionary proliferation
> becomes a problem.
>
>     (4) Probably an obtuse question, but is it possible to retain in the
>     DDLm version of the core a SYMMETRY category that is a Set, and a
>     separate SPACE_GROUP category that is a Loop? Hardly elegant, but a
>     way of owning up to the historical mistake?
>     Then the relationship
>     between the different datanames would not be through the alias
>     mechanism, but rather by some dREL transformation?
>
> The issue is that SPACE_GROUP is used in practice as a Set (it'd be good
> if you could check IUCr archives for statistics on this) and is
> therefore assumed to provide global information, even though such
> behaviour for a Loop category is nowhere specified (yet).  On the other
> hand, if space_group *is* looped in a datafile a whole host of
> categories become ambiguous.  While this objective loss of meaning
> *might* be enough to stop users attempting to mix looped space groups
> and e.g. atom_site lists, as a standards body we have to specify how to
> handle such cases or better still make sure it never happens. As
> additional fuel to the fire, magCIF extends SPACE_GROUP based on
> long-entrenched code in the magnetic community (assuming Set behaviour)
> *and* there have been requests from other quarters to preserve
> SPACE_GROUP loopability.  Believe me, I'd much rather do as you suggest
> but that would be simply ignoring the definitional problems that we have.
>
>     (5) So I've not commented specifically on the 'Global' proposal
>     below. As I understand it, the change in name is designed to make
>     clearer the circumstances in which, as it were, you want to force
>     a category not to loop its values.
>     If 'globality' is indeed the only reason that
>     you would enforce such a constraint, and if that helps programmers
>     to understand what's going on, I'd be in favour of it; but I want to
>     think some more about it before committing myself to that first opinion!
>
>     Brian
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

International Union of Crystallography
