Discussion List Archives


Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Thanks James - that's cleared up a few things for me.

Your proposal seems fair to me - certainly I'd rather have a mechanism to allow looping than not...

Cheers

Simon



From: James Hester <jamesrhester@gmail.com>
To: SIMON WESTRIP <simonwestrip@btinternet.com>; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Saturday, June 11, 2016 2:57 PM
Subject: Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Hi Simon,


On 11 June 2016 at 21:51, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:
Actually it seems I've missed the point - the underlying problem is that
although the space_group category is loopable in the core dictionary, for some reason it isn't in DDLm.
So there are two issues - allowing looping of Set categories and, if so, making sure
dependent relationships are catered for?

Yes. The space_group category is what has set this off. The original DDLm dictionary prepared by the Perth group used the old (unlooped) symmetry category and carefully sequestered space_group inside a "ref_loop" structure, which meant that you could only loop space groups by having a loop of save frames with single values of the space_group datanames inside the save frames. We (COMCIFS) rejected ref_loops when we excluded save frames from CIF2, and so in the process of merging the symmetry dictionary into the DDLm dictionary (by request of ID Brown, symCIF lead author) I made space_group a 'Set' category as this was the only internally consistent approach due to the dependency problems that would otherwise arise. My Proposal #1 explained this background and Proposal #2 is a second attempt to thread the eye of the needle. As a further motivator, both msCIF and the highly advanced magnetic CIF draft rely on the space_group category, so we can't just revert to the symmetry category.

 
Cheers

Simon



From: SIMON WESTRIP <simonwestrip@btinternet.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Friday, June 10, 2016 7:12 PM
Subject: Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear all

Assuming I understand the underlying problem - i.e. a 'parent' category may define a relational key,
but some of the categories that depend upon data defined by that 'parent' category may not have an item that points to the
key (simply because of oversights when writing a particular dictionary or dictionary extension?) -
would it be possible for the 'parent' to define a list of 'child' categories and also define the 'string' that should be used to
create a relational 'key' in each 'child' category? This would effectively define new items for the 'child' categories dynamically, though the definition of the child category won't 'know anything' about the new parent.

So if a CIF contains _atom_site.space_group_id in its coordinate loop then dictionary-aware software can recognize this as an extension referring to the space_group category key because the software already knew that this key may be applied to the atom_site category.

If a CIF reader isn't able to read the dictionary, it will be left in the same situation as now - it will probably just ignore the unknown item and potentially apply e.g. the wrong symmetry to the coordinates?

Or if the CIF reader is aware of the possible extension to _atom_site, but it isn't in the CIF, it is forced to assume a default
(perhaps also defined in the 'parent' definition?) - 'explicit/implicit' keys as described earlier.
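
A minimal sketch of how the explicit/implicit key idea above might look to dictionary-aware software. Everything here is an assumption made for illustration: ParentKeySpec, resolve_parent_key and the default value "1" are hypothetical and do not correspond to any existing DDLm attribute or CIF library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParentKeySpec:
    """Hypothetical record a 'parent' category could carry in a dictionary."""
    parent_category: str            # e.g. "space_group"
    key_item: str                   # e.g. "id"
    child_categories: List[str]     # children the key may be applied to
    default: Optional[str] = None   # value assumed when the child omits the key

    def child_item_name(self, child: str) -> str:
        # Build the data name the child would use to point at the parent key.
        return f"_{child}.{self.parent_category}_{self.key_item}"


def resolve_parent_key(spec: ParentKeySpec, child: str,
                       row: Dict[str, str]) -> Optional[str]:
    """Return the parent key that applies to one row of a child loop."""
    if child not in spec.child_categories:
        raise KeyError(f"{child} is not a declared child of {spec.parent_category}")
    name = spec.child_item_name(child)
    if name in row:
        return row[name]      # explicit key present in the CIF
    return spec.default       # implicit key: fall back to the declared default


# Illustrative values only.
spec = ParentKeySpec("space_group", "id", ["atom_site"], default="1")
explicit = {"_atom_site.label": "C1", "_atom_site.space_group_id": "2"}
implicit = {"_atom_site.label": "C1"}
print(resolve_parent_key(spec, "atom_site", explicit))
print(resolve_parent_key(spec, "atom_site", implicit))
```

In this sketch a reader that has not loaded the dictionary has no ParentKeySpec at all, which is the situation described above where the unknown item is simply ignored.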

Anyway, just some thoughts... (forgive me if this overlaps what has already been discussed - or if I've completely got the wrong end of the stick!)

Cheers

Simon
PS
With regard to the current situation with dictionary compliance, in my experience (mostly small-structure CIFs),
_audit_conform... data are rarely present and one uses heuristics to determine which dictionaries to load.
By 'heuristics' I basically mean knowledge of core data items obtained by visually reading the dictionary files - so I wouldn't be in favour of splitting dictionaries into numerous modules... unless a combined version were also available.

PPS
Are there any conventions when loading dictionaries - e.g. if _audit_conform were looped and two of the dictionaries contained different definitions of the same data name, which should apply?





From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Friday, June 10, 2016 3:55 PM
Subject: Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

Dear all,

This is not a fully-formed proposal, but more an outline and general idea for approaching the problem with Set categories and definition reuse.  If it is well received then I am prepared to expand it into a full proposal, but I suspect that the discussion will bring out considerations and nuances that will make that smoother and more successful, if indeed such a proposal is ever developed.

We previously agreed that the problem we are trying to solve arises from data dictionaries making assumptions about what kind of thing is described by a data block (or save frame).  Many data dictionaries do this implicitly; mmCIF has the distinction of doing it explicitly, via its ENTITY category.  Either way, this causes difficulty when we want to reuse a definition to describe a different kind of entity.  It would be ideal, therefore, to choose a solution that strikes at the root of this problem, and there are at least two general ways we could do this:

(1) Express the definition of "entity" applicable to a data block in that data block, by means of appropriate new data items.  Suitable choices of default values for the new items could preserve the current meaning of data files that do not present those items.

(2) Express the definition of "entity" in the relevant dictionaries, but *factor out* category and item definitions into separate dictionaries or dictionary modules.  Thus, of two or more dictionaries with different senses of what an entity is -- e.g. the Core and Symmetry dictionaries -- neither would re-use definitions provided by the other; instead, both would re-use (by suitably-structured import) definitions provided by a dictionary module that itself leaves "entity" undefined.

Of those two, I am more interested in the latter.  A significant advantage of that approach is that at one important level it meets both of the seemingly-conflicting criteria that were earlier presented: *with respect to individual data dictionaries*, it does not require new data name variants to be introduced to support loopability, and it also does not require that such a dictionary permit data files that existing software is at risk of misunderstanding.  It achieves this, essentially, by limiting the scope of some aspects of category key definition to specific dictionaries.  There would thus be no conflict between, for example, SPACE_GROUP being loopable in all data files conforming to the symmetry dictionary, but not being loopable in any data file conforming to the core dictionary.
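
A minimal sketch of the scoping idea in option (2), assuming a shared module that supplies item definitions only and leaves the decision about looping to each importing dictionary. The Dictionary class, the validate function and the selection of item names are hypothetical and chosen only for illustration; none of this is an existing DDLm mechanism.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Shared module: item names only, with no statement about looping.
# The selection of item names is illustrative.
SHARED_ITEMS: Dict[str, FrozenSet[str]] = {
    "space_group": frozenset({"id", "name_H-M_alt", "IT_number"}),
}


@dataclass
class Dictionary:
    name: str
    items: Dict[str, FrozenSet[str]]   # imported from the shared module
    loopable: FrozenSet[str]           # categories allowed more than one row


def validate(dictionary: Dictionary,
             block: Dict[str, List[Dict[str, str]]]) -> List[str]:
    """Check one data block (category -> rows) against one dictionary."""
    errors = []
    for category, rows in block.items():
        if category not in dictionary.items:
            errors.append(f"{dictionary.name}: unknown category {category}")
            continue
        if len(rows) > 1 and category not in dictionary.loopable:
            errors.append(f"{dictionary.name}: {category} may not be looped")
        for row in rows:
            for item in sorted(set(row) - dictionary.items[category]):
                errors.append(f"{dictionary.name}: unknown item {category}.{item}")
    return errors


# Both dictionaries import the same definitions but scope loopability differently.
core = Dictionary("core", SHARED_ITEMS, frozenset())
symmetry = Dictionary("symmetry", SHARED_ITEMS, frozenset({"space_group"}))

two_groups = {"space_group": [{"id": "1", "IT_number": "14"},
                              {"id": "2", "IT_number": "2"}]}
print(validate(symmetry, two_groups))
print(validate(core, two_groups))
```

The same block passes or fails depending on which dictionary is chosen, so a validator built this way has to know which dictionary a data file claims to conform to.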

One downside would be that the composability of data dictionaries would be restricted further.  For instance, one could not rely simultaneously on the full definitions of SPACE_GROUP items as drawn from the core and symmetry dictionaries.  Another downside would be that correct validation would be even more dependent on identifying the correct dictionary or dictionaries against which to validate.  I am uncertain how significant a disadvantage either of those would be, however.

Thoughts?


Regards,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
(901) 595-3166 [office]
www.stjude.org





_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group









--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148



International Union of Crystallography
