Discussion List Archives

Re: [ddlm-group] Discussion of hub-spoke proposal

Dear James and all,

I believe you have caught my vision for H&S.  If any differences remain then they are probably within the range of reasonable variations on the idea.  In particular, your worked example seems right to me.  As for your observations:

> (1) If there is to be more than one 'Hub' category, simply searching for a 'Hub' dataname is not a substitute for _audit.schema, as there could potentially be various different such 'Hub' datanames unknown at software creation time.  I suggest solving this using _audit.schema.

I don't object to providing _audit.schema as a means to designate some or all of the hub categories in use, and especially to designate hub categories in use for which child keys appearing explicitly in the data file may not be expected by some software.

> (2) The 'datablock' category described above is invariant: it can always take the default value for the parent key, and child keys are present in all categories, with default values throughout.  It essentially can be left out of all datablocks. It therefore makes sense to me to leave the 'Set' category defined as a reminder (perhaps renamed to 'Global'), on the understanding that the actual behaviour is described by a default-valued Hub category.

There is a bit of a data modelling question here.  Personally, I'm inclined to *not* use a hub that is essentially a singleton identified specifically with the overall data block.  I have suggested that we instead characterize the rather broad and floppy entity currently representable via the definitions in the DDLm core dictionary as the initial hub.  In the context of _audit.schema you suggested "structural" as a designation for this; following that idea but choosing a noun instead, I would nominate "structure" for that category.  This is not just a naming question, however -- in the future, some data blocks may contain multiple values of this category.

It is perfectly fine that that hub and the child keys associated with it are not explicitly expressed in any existing data file -- the meanings of those data files do not change.  Going forward, I do not doubt that many additional data files will be written that way, and that's fine, too.  But it would be unwise to suppose that we will never have a use for providing multiple "structures" in the same data block, and in that sense it is not safe to say that the hub and associated child keys can be left out of *all* future data blocks.
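
To make the default-value point concrete, here is a minimal Python sketch. It assumes packets are represented as plain dicts, and it uses a hypothetical hub child key `structure_id` with a hypothetical default value `"."`; neither name comes from any dictionary.

```python
# Hypothetical names: the hub child key dataname and its default value are
# illustrative assumptions, not taken from any DDLm dictionary.
HUB_CHILD_KEY = "structure_id"
HUB_DEFAULT = "."


def with_hub_key(packets, child_key=HUB_CHILD_KEY, default=HUB_DEFAULT):
    """Return copies of the packets with the hub child key made explicit."""
    # A value already present in the packet overrides the default.
    return [{child_key: default, **packet} for packet in packets]


def hub_values(packets, child_key=HUB_CHILD_KEY, default=HUB_DEFAULT):
    """Return the distinct hub key values used by a loop, in first-seen order."""
    seen = []
    for packet in with_hub_key(packets, child_key, default):
        if packet[child_key] not in seen:
            seen.append(packet[child_key])
    return seen


# An existing-style loop: no hub child key appears, so every packet belongs
# to the single default "structure" and its meaning is unchanged.
legacy_atoms = [{"label": "C1"}, {"label": "O1"}]

# A possible future loop: two "structures" share one data block, so the
# child key can no longer be left out.
multi_atoms = [
    {"structure_id": "a", "label": "C1"},
    {"structure_id": "b", "label": "C1"},
]
```

Under these assumptions `hub_values(legacy_atoms)` yields only the default value, while `hub_values(multi_atoms)` yields two values, which is the case where the child key has to appear explicitly.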

Therefore, although I think a few categories should indeed remain Sets, they are the exception.  I think Set categories are more important for DDLm's own internal use and of little practical import for data dictionaries. In truth, however, I am tiring of this particular dimension of the debate.  It is more important to me that categories that need them have the appropriate keys defined.  The "Set" designation makes sense to me only to affirmatively express that a category has a zero-attribute key, but beyond that, I have nothing further to say about the matter.

> (3) The 'twin' hub category is the same as the twin_individual category introduced in the twinning dictionary; in that case, we had to create a copy of the refln category with a different name as we lacked the mechanism being discussed here. Likewise, 'variant' in imgCIF works precisely as described here.  This is a sign that we are on the right track, I think.

Yes, it is a good sign that H&S solves problems that we have run into before, and that it is similar to or the same as solutions we have used before.

> I have no particular observations regarding dictionaries, beyond the very nice composability offered by this proposal.
>
> Transformations between 'schema'
> ============================
> A pleasant property of proposal #2 was the guaranteed ability to mechanically transform a datablock between schema, in particular, to create datablocks that conformed to the default schema.  For H&S, this is also possible by emitting a datablock for each value of each Hub category key.
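
A rough Python sketch of the transformation described in the quoted paragraph, for a single hub only. It assumes a data block is a dict mapping category names to lists of packets (dicts), that every category in the block is a spoke of the hub, and that the hub child key has the same hypothetical name `structure_id` in every category. The category and item names are illustrative only.

```python
def split_by_hub(block, child_key, default="."):
    """Split one data block into one block per value of a hub child key.

    `block` maps category names to lists of packets (dicts). Packets that
    omit `child_key` are treated as carrying `default`. The child key is
    dropped from the output packets, so each emitted block reads as an
    ordinary default-schema block.
    """
    out = {}
    for category, packets in block.items():
        for packet in packets:
            hub_value = packet.get(child_key, default)
            rest = {k: v for k, v in packet.items() if k != child_key}
            out.setdefault(hub_value, {}).setdefault(category, []).append(rest)
    return out


# Hypothetical block holding two "structures" side by side.
block = {
    "cell": [
        {"structure_id": "1", "length_a": 5.0},
        {"structure_id": "2", "length_a": 5.1},
    ],
    "refln": [
        {"structure_id": "1", "index_h": 1, "f_meas": 10.0},
        {"structure_id": "2", "index_h": 1, "f_meas": 12.0},
        {"structure_id": "2", "index_h": 2, "f_meas": 7.5},
    ],
}

blocks = split_by_hub(block, "structure_id")
```

With more than one hub in play, the same idea would presumably be applied once per hub (or once per combination of hub key values); the sketch does not attempt that.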


> dREL
> ====
> I repeat the desirability of setting things up so that dREL methods do not need editing when a new 'Hub' child key appears in a Loop category.

I accept this.

>  This is possible with the following rules:
> (1) Any accesses within a dREL method to a value in a different category are taken to refer to the packet that matches the complete set of common sibling keys (i.e. including hub keys)

And if the complete set of common keys does not form the referenced category's category key, then such a reference is in principle multi-valued.  If the method assumes otherwise then this is the primary way in which adding a hub key to a category could produce a need to edit the method.  Under this rule, however, the reference remains single-valued if the referenced category gets its own new key referencing the same hub.

I note also that I am assuming that when a multi-step relationship chain is explicitly traversed, this rule would be applied on a per-step basis.
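
The single-valued versus multi-valued distinction under rule (1) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not dREL semantics: packets are dicts, linked keys are assumed to share one dataname across categories, and the names `structure_id`, `index_h` and `length_a` are hypothetical.

```python
def matching_packets(source_packet, target_packets, target_key):
    """Return target packets agreeing with source_packet on all shared keys.

    `target_key` is the tuple of datanames forming the target category's
    key. Only key datanames also present in the source packet are compared.
    """
    common = [name for name in target_key if name in source_packet]
    return [
        p for p in target_packets
        if all(p[name] == source_packet[name] for name in common)
    ]


def is_single_valued(source_packet, target_key):
    """True when the shared keys cover the whole target category key."""
    return all(name in source_packet for name in target_key)


# The referenced category has gained a hub key and now holds two packets.
cells = [
    {"structure_id": "a", "length_a": 5.0},
    {"structure_id": "b", "length_a": 5.1},
]
cell_key = ("structure_id",)

# Referencing category without a key for the hub: the reference is
# multi-valued, so a method assuming one cell would need attention.
old_refln = {"index_h": 1}

# Referencing category with its own key referencing the same hub: the
# reference is single-valued again and the method is unaffected.
new_refln = {"structure_id": "b", "index_h": 1}
```

Here `matching_packets(old_refln, cells, cell_key)` returns both cell packets, whereas `matching_packets(new_refln, cells, cell_key)` returns only the packet for structure "b". For a multi-step relationship chain, the same matching would be repeated at each step.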

> (2) Where a category contains keys that are themselves siblings, the dREL method must explicitly state the values of those keys when accessing other categories (see GEOM_ANGLE)

Yes.  And dREL already has a means of doing that, right?

> (3) A dREL method may never be rewritten because of the addition of a new key to the category. Instead, a new dataname should be defined.
> Regarding (3), if we perform decomposition of a datablock into the 'default' schema, then this will mean all datanames have their usual interpretation and will have the exact relationship defined in the dREL.

This one makes me a bit nervous, but it should be ok.  The point of the proposal is that all categories should have the appropriate relationships defined between them and always visible, and one of the key ideas behind dREL is that it traverses relationships automatically.  If we change one or both of a pair of related categories in a way that alters the nature of their relationship then we will thereby have made a change that violates our commitment to keeping definitions stable.  I am satisfied to agree not to do that.

> At first glance, it appears that 'Hub' categories are in no way special from a DDLm/dREL point of view and do not need a special designation.

I agree, and that is a desirable property for hub categories to have.

> I inserted a few further comments regarding dREL below.  Once we have tidied up any loose ends, I suggest we settle on a proposal that essentially boils down to defining _audit.schema appropriately.
> [...]

It sounds like we are coming together, but don't we also need to propose dictionary changes to implement H&S, at least in the DDLm core?


John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital


_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
