Discussion List Archives


Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

So you are in favour of making audit_conform a Loop in DDLm?

Cheers

Simon



From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Wednesday, June 15, 2016 2:30 PM
Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

Dear All,
 
Remember that audit_conform is not a DDLm category but rather a core CIF category.  I don’t see why the availability of _import.get in DDLm has any bearing on whether definitions in the DDLm core dictionary should be consistent with definitions of the same items in the mmCIF and DDL1 core dictionaries.  In the case of audit_conform, the DDLm core disagrees with the others, so there can be data files that are valid against the DDL1 core or against mmCIF that are not valid against the DDLm core.  Few CIFs actually use audit_conform, so there probably aren’t many for which such a validity mismatch occurs, but the disagreement is nevertheless undesirable and inconsistent with our intent to keep definitions stable.
 
In any case, although with DDLm you can indeed use _import.get to form a dictionary that suits you, ad hoc dictionaries formed in this manner do not benefit from well-known names or version codes.  The only thing one can do with an audit_conform entry that references an unknown dictionary is load that dictionary and validate against it.  That serves only a functional purpose, whereas with well-known dictionary names, audit_conform also serves an informational purpose.
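To illustrate the informational/functional distinction drawn above, here is a minimal Python sketch. It assumes the audit_conform dictionary names have already been read from a data file. The set of "well-known" names is an assumption for illustration (a real reader would take it from the COMCIFS dictionary register). Any name outside that set can only be used by fetching the dictionary and validating against it.

```python
# Assumed, illustrative set of widely recognised dictionary names.
# A real reader would take this from the COMCIFS dictionary register.
WELL_KNOWN = {
    "cif_core.dic", "cif_pd.dic", "cif_ms.dic", "cif_rho.dic",
    "mmcif_std.dic", "mmcif_pdbx.dic",
}


def classify_conformance(dict_names):
    """Split declared dictionary names into those whose meaning a reader
    knows from the name alone and those that must be fetched to be used."""
    informational, fetch_required = [], []
    for name in dict_names:
        if name.lower() in WELL_KNOWN:
            informational.append(name)
        else:
            fetch_required.append(name)
    return informational, fetch_required


if __name__ == "__main__":
    # 'my_merged.dic' is a hypothetical ad hoc dictionary built with _import.get.
    known, unknown = classify_conformance(["cif_core.dic", "my_merged.dic"])
    print("recognised:", known)            # recognised: ['cif_core.dic']
    print("fetch and validate:", unknown)  # fetch and validate: ['my_merged.dic']
```

A reader can report the recognised entries straight away. Only the entries in the second list need network access before they tell the reader anything.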
 
 
John
 
 
From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of SIMON WESTRIP
Sent: Wednesday, June 15, 2016 7:29 AM
To: James Hester <jamesrhester@gmail.com>; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)
 
Thanks James
 
Nothing really to discuss here, except perhaps whether audit_conform is Set or Loop. But as you point out, using the import mechanism one can create a dictionary that in turn imports any number of dictionaries. So, whereas many DDL1 CIFs actually conform to cif_core plus the IUCr and/or CCDC 'local' dictionaries, which are rarely (if ever) declared in the CIF instance, moving forward we might look at creating a dictionary that pulls these in as well as cif_core.
On the other hand, perhaps it would be cleaner to declare such dictionaries separately in an audit_conform loop, so that readers can fetch them if they're really interested, rather than fetching an unfamiliar dictionary only to find out it's basically cif_core with a bunch of extra items they aren't interested in anyway.
I'm inclined toward the latter approach.
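A looped audit_conform declaration of the kind preferred here might be generated as follows. This is a sketch that assumes the mmCIF/draft DDLm item names _audit_conform.dict_name, _audit_conform.dict_version and _audit_conform.dict_location. The local dictionary in the usage example is hypothetical. The quoting rule is deliberately simplified and does not handle values that themselves contain quote characters.

```python
def cif_value(value):
    """Format one value for a CIF loop row (simplified quoting rules)."""
    if value is None:
        return "?"  # CIF placeholder for an unknown value
    # Quote empty values, values containing whitespace, and values starting
    # with characters that have a special meaning at the start of a CIF token.
    if not value or any(c.isspace() for c in value) or value[0] in "_#$'\"[];":
        return "'" + value + "'"
    return value


def audit_conform_loop(entries):
    """Render (name, version, location) tuples as a looped audit_conform category."""
    lines = [
        "loop_",
        "_audit_conform.dict_name",
        "_audit_conform.dict_version",
        "_audit_conform.dict_location",
    ]
    for name, version, location in entries:
        lines.append(" ".join(cif_value(v) for v in (name, version, location)))
    return "\n".join(lines)


if __name__ == "__main__":
    print(audit_conform_loop([
        ("cif_core.dic", "2.4.5", "ftp://ftp.iucr.org/pub/cif_core.dic"),
        ("hypothetical_local.dic", None, None),  # hypothetical local dictionary
    ]))
```

Each dictionary gets its own row. A reader interested only in cif_core can ignore the other rows instead of fetching a merged dictionary to discover what it contains.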
 
 
Anyway, thanks again for your clarifications - all very useful
 
Cheers
 
Simon
PS I've added a couple of trivial comments below...
 

From: James Hester <jamesrhester@gmail.com>
To: SIMON WESTRIP <simonwestrip@btinternet.com>; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Wednesday, June 15, 2016 1:45 AM
Subject: Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)
 
Hi Simon,
I'll have a go at answering your questions.  This is an interesting line of questions, but a bit to the side of the other discussion, so I've put them into a different thread.
 
On 15 June 2016 at 09:16, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:

Dear John et al.
 
Struggling to keep up again - so going back to basics:

 
1) I assume the scope of the proposed 'schema' definition is restricted to the dictionary in which its definition appears
(i.e. directs to alternative definitions that are already in that particular dictionary)?
 
The scope would be everything that COMCIFS managed. To anticipate your question below, we could/should have a very short 'IUCr core' dictionary with datanames that apply regardless of the particular domain, and which all DDLm dictionaries import.  Probably cif_core would import it, and all others would import cif_core.
 
SPW: that sounds logical
 
2) Is the intention that all CIFs (whatever domain) conform to a 'global' ddlm dictionary?
 
Excuse my pedantry here, but I want to make sure we understand each other. DDLm is a set of attributes which you can use to define the meaning of datanames.  A CIF (i.e. datafile) does not conform to a DDLm dictionary, it conforms to a dictionary. That dictionary is written in a DDL.  So we can write a dataname, and it has the same meaning (perhaps after application of aliases) regardless of the particular DDL in which that meaning is described.  If I have misunderstood you, and you simply meant, "is there a dictionary that all datafiles should draw on?", then the current answer is "no".  For example, mmCIF datanames completely replace datanames in cif_core.  The intention of proposal #2 is that _audit.schema would be a universal dataname, ideally defined in the separate 'IUCr core' dictionary described above.
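The idea that a dataname keeps its meaning "perhaps after application of aliases" can be sketched as a lookup that maps alternative spellings onto one canonical name. The alias table here is a hand-written assumption covering only the audit_conform items. A dictionary-driven reader would instead build it from the dictionary's own alias definitions (for example, the DDLm alias attributes such as _alias.definition_id).

```python
# Hand-written alias table (assumption): DDL1-style names mapped to the
# mmCIF/DDLm-style names for the same items.
ALIASES = {
    "_audit_conform_dict_name": "_audit_conform.dict_name",
    "_audit_conform_dict_version": "_audit_conform.dict_version",
    "_audit_conform_dict_location": "_audit_conform.dict_location",
}


def canonical_name(dataname):
    """Return the canonical form of a dataname.

    CIF datanames are case-insensitive, so the name is lower-cased first;
    names without an alias are returned unchanged apart from case."""
    key = dataname.lower()
    return ALIASES.get(key, key)


if __name__ == "__main__":
    print(canonical_name("_Audit_Conform_Dict_Name"))  # _audit_conform.dict_name
    print(canonical_name("_cell.length_a"))            # _cell.length_a
```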
 
SPW: "is there a dictionary that all datafiles should draw on?" - yes, that's what I meant
 
3) How does one declare dictionary conformance in a CIF instance without using a dictionary-defined dataname?
 
You can't.  A fundamental assumption, that COMCIFS attempts to fulfill, is that the meaning of datanames does not change.  In particular, the meaning of the dictionary conformance datanames does not change, so the programmer can hard-code a dataname to output dictionary conformance and not worry that somehow in the future this dataname will have a different meaning.  Hopefully this underlines that the *primary* audience of our dictionaries is the human software programmer, *not* the software itself. The programmer (*not* the program)  is the one that has to read the text definition to work out e.g. what dataname contains the atomic positions. The "dictionary driven software" part can only relate to those things that software can understand and use and which *do not* relate to a change in meaning (because we're not supposed to change the meaning) - aliases and dREL come to mind.  All of the other machine-readable stuff can only be used for validation, which is why I assert that a lot of casual CIF-reading software doesn't bother with dictionaries at execution time - the information available at software creation time is guaranteed to be sufficient to 'get the job done' now and in future.
Basically I am certain that I am not alone in having to rely on heuristics based on prior knowledge just to identify that e.g. my molecular graphics program is dealing with a DDL2 CIF (mmCIF) rather than a DDL1 CIF, and whether it's a pdCIF, msCIF, rhoCIF... Is DDLm/CIF2 going to help at this rather fundamental level?
 
No, it will not.  What you need is for CIF authors to include the already-defined datanames specifying dictionary conformance.  The _audit.schema proposal will also not help. 
I suggest that each time (or once a day, to stay sane) that you find a file that does not contain the dataname _audit_conform_dict_name (old cif_core) or _audit_conform.dict_name (mmCIF, and in draft DDLm core dictionary), you send a gentle message to the software author (if you can identify them), asking them to include one of these datanames in the output template that they distribute with their software (or adjust their code).  You could even provide a line in the email for them to cut and paste.  Perhaps you can get checkCIF to issue a level C alert if they are missing (if it doesn't already) with a suggestion to include a particular line in the CIF file to make the alert go away.  Perhaps we can wait until DDLm cif_core is accepted before pushing this.
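The check suggested above (for a reader, or for something like a checkCIF alert) could start as simply as the sketch below. It is a rough text scan, not a CIF parser. It looks for either the DDL1 name _audit_conform_dict_name or the mmCIF/DDLm name _audit_conform.dict_name as a whitespace-delimited token, so it would give a false positive if the name appeared inside a text field or comment. The suggested cut-and-paste line uses the DDL1 form and cif_core.dic purely as an example.

```python
import re

# Either the DDL1 name (_audit_conform_dict_name) or the mmCIF/DDLm name
# (_audit_conform.dict_name), matched as a whitespace-delimited token.
CONFORM_RE = re.compile(
    r"(?:^|\s)(_audit_conform[._]dict_name)(?=\s|$)",
    re.IGNORECASE | re.MULTILINE,
)

# Example cut-and-paste line for a DDL1 core CIF (illustrative only).
SUGGESTED_LINE = "_audit_conform_dict_name    cif_core.dic"


def conformance_datanames(cif_text):
    """Return the conformance datanames found, lower-cased, in order of appearance."""
    return [m.group(1).lower() for m in CONFORM_RE.finditer(cif_text)]


def check_conformance(cif_text):
    """Return a short report suitable for a gentle note to a software author."""
    found = conformance_datanames(cif_text)
    if found:
        return "declares conformance via " + ", ".join(found)
    return "no dictionary conformance declared; consider adding:\n" + SUGGESTED_LINE


if __name__ == "__main__":
    print(check_conformance("data_test\n_cell.length_a 10.0\n"))
```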
 
SPW: I think I'd better make sure my own CIF writing software starts doing a more robust job in this respect before asking others to! The most I do is look for the audit_conform items - I don't add them if not there.
 
On a side note, DDLm includes the dictionary import formalism, which means that there is only one 'master' dictionary that imports all the rest (so e.g. pdCIF would internally import 'cif_core'). This improves on the old dictionary merging formalism I discussed before, and is why audit_conform is a 'Set' category in DDLm but a multi-packet loop category in DDL2.  The 'Set' designation for _audit_conform may be worth revisiting.
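The "one master dictionary that imports all the rest" idea amounts to walking an import graph. The sketch below assumes a simple mapping from each dictionary to the dictionaries it imports. It ignores the finer details of _import.get (import modes, which definitions are pulled in). The dictionary names, including the proposed 'iucr_core', are illustrative only.

```python
# Hypothetical import graph: each dictionary lists the dictionaries it
# imports. 'iucr_core' stands for the proposed domain-independent dictionary.
IMPORTS = {
    "cif_pd": ["cif_core"],
    "cif_core": ["iucr_core"],
    "iucr_core": [],
}


def resolve_imports(master, imports):
    """Return every dictionary reachable from `master`, dependencies first.

    Raises ValueError if the import graph contains a cycle."""
    order, done, active = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in active:
            raise ValueError("import cycle involving " + name)
        active.add(name)
        for dep in imports.get(name, []):
            visit(dep)
        active.remove(name)
        done.add(name)
        order.append(name)

    visit(master)
    return order


if __name__ == "__main__":
    print(resolve_imports("cif_pd", IMPORTS))  # ['iucr_core', 'cif_core', 'cif_pd']
```

Under these assumptions, a file declaring only cif_pd would implicitly depend on cif_core and iucr_core as well. That is the trade-off against listing each dictionary explicitly in a looped audit_conform.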
 
 
 
 
 

 



--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
 



_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group



International Union of Crystallography
