
Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

Dear James and Colleagues,

 

I agree that we should not require data files to replicate in audit_conform the imports performed by the dictionaries with which they specify direct conformance.  Thus, I agree that in the example presented, it is sufficient to specify conformance with just magcif and pdcif, omitting core_cif, etc.  I see no reason, however, to forbid data files from declaring conformance with additional dictionaries, for instance if the example also declared direct conformance with the core.  I don’t think such a prohibition is intended, but my agreement is predicated on that understanding.
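To make the point about implied imports concrete, here is a minimal Python sketch, assuming a hypothetical in-memory import graph that mirrors James's example (magcif and pdcif each importing core_cif, templ_enum and templ_attr). It computes the set of dictionaries reachable from the declared ones; declaring core_cif in addition to magcif and pdcif leaves that set unchanged, which is why the extra declaration is redundant but harmless.

```python
# Hypothetical import graph taken from the example in this thread:
# each dictionary name maps to the dictionaries it imports.
IMPORTS = {
    "magcif": ["core_cif", "templ_enum", "templ_attr"],
    "pdcif": ["core_cif", "templ_enum", "templ_attr"],
    "core_cif": [],
    "templ_enum": [],
    "templ_attr": [],
}


def effective_dictionaries(declared, imports=IMPORTS):
    """Return every dictionary reachable from the declared ones through imports."""
    seen = set()
    stack = list(declared)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Unknown names are treated as importing nothing.
        stack.extend(imports.get(name, []))
    return seen


minimal = effective_dictionaries(["magcif", "pdcif"])
explicit = effective_dictionaries(["magcif", "pdcif", "core_cif"])
print(minimal == explicit)  # True
```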

 

Complications may arise if audit_conform rows designate conformance with dictionary versions (via _audit_conform.dict_version) that rely on different versions of the same dictionary, or if one directly designates conformance with a different version of a dictionary (maybe cif_core) than other dictionaries it claims conformance with rely upon.  I am prepared to ignore that issue, however, on the basis that a CIF presenting such conformance assertions is inconsistent.  In such a case there is anyway a reasonable chance that the import semantics described would yield a combined dictionary suitable for validating the data file.
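The inconsistency described above can at least be detected mechanically. The following Python sketch assumes a hypothetical table recording which version of each dictionary a given dictionary version imports (all version numbers are invented for illustration); it follows those dependencies transitively and reports any dictionary required at more than one version.

```python
from collections import defaultdict

# Hypothetical dependency table: (dictionary, version) -> list of
# (imported dictionary, version) pairs. Version numbers are invented.
DEPENDS = {
    ("magcif", "0.9"): [("core_cif", "3.0")],
    ("pdcif", "2.5"): [("core_cif", "3.1")],
}


def version_conflicts(rows, depends=DEPENDS):
    """Return {name: versions} for dictionaries required at more than one version.

    rows are (dict_name, dict_version) pairs as they might appear in audit_conform.
    """
    required = defaultdict(set)
    seen = set()
    stack = list(rows)
    while stack:
        entry = stack.pop()
        if entry in seen:
            continue
        seen.add(entry)
        name, version = entry
        required[name].add(version)
        # Follow imports transitively so indirect requirements are counted too.
        stack.extend(depends.get(entry, []))
    return {name: versions for name, versions in required.items() if len(versions) > 1}


print(version_conflicts([("magcif", "0.9"), ("pdcif", "2.5")]))
```

Whether a detected conflict should be an error or merely a warning is left open here, consistent with treating such a CIF as inconsistent rather than prescribing a resolution.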

 

I also observe that when a data file documents conformance with a specific dictionary version, that seems to require the corresponding notional _import.get element to identify the target dictionary by a URI referencing the specific version designated.  That information could be drawn from _audit_conform.dict_location, if that is provided, but my understanding is that there is (or was or should have been) an online registry maintained by IUCr for identifying dictionary locations from name and version (see http://www.iucr.org/__data/iucr/lists/cif-developers/msg00044.html, section 2).  I do not recommend that reliance on such a registry be explicitly written into the DDLm definition of audit_conform; rather, I suggest that the means by which the needed dictionary is identified be left unspecified, as indeed it is in the DDL1 core.
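As one illustration of leaving the lookup unspecified, an application might resolve each audit_conform row as sketched below in Python: use _audit_conform.dict_location when present, otherwise fall back to whatever name-and-version registry it chooses to trust. The REGISTRY mapping and its example.org URL are hypothetical placeholders, not a real IUCr service.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical registry keyed by (dict_name, dict_version). Neither this mapping
# nor the example.org URL corresponds to a real service.
REGISTRY: Mapping[Tuple[str, str], str] = {
    ("cif_core", "3.0"): "https://example.org/dictionaries/cif_core_3.0.dic",
}


def resolve_dictionary(name: str, version: Optional[str], location: Optional[str],
                       registry: Mapping[Tuple[str, str], str] = REGISTRY) -> Optional[str]:
    """Choose a URI for one audit_conform row, or return None if none is known."""
    if location:
        # An explicit _audit_conform.dict_location takes precedence.
        return location
    if version is None:
        # Without a version there is no specific dictionary to look up.
        return None
    return registry.get((name, version))
```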

 

 

Regards,

 

John

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Wednesday, June 15, 2016 7:19 PM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

I agree that audit_conform must be looped. In order to write the definition we must answer some questions about the semantic interpretation of multiple dictionaries.  To make this concrete, let's suppose that we have a powder diffraction result that reports a magnetic structure.  This datablock will have items from pd_cif and mag_cif.  Both pd_cif and mag_cif internally import core_cif, as well as templ_enum and templ_attr.

Which of these dictionaries appear in the audit_conform loop?  I think it should be pd_cif and mag_cif, and that the semantic interpretation is as if there was a virtual dictionary with an import.get in 'Full' mode of the listed dictionaries into its HEAD category.  For our example, it would be as if the following lines were present in the virtual dictionary HEAD, where the "file" entries point to the precise URL given in _audit_conform:

_import.get [{"file":magcif.dic "save":MAGCIF "mode":Full "dupl":Ignore "miss":Exit}
             {"file":pdcif.dic  "save":PDCIF  "mode":Full "dupl":Ignore "miss":Exit}]               

 

Note that we must have "dupl":Ignore (or Replace) in order to account for the fact that core_cif definitions will be notionally present in both dictionaries.
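The notional merge can be modelled roughly as below. This Python sketch treats each dictionary as a plain mapping from definition name to definition, which is a deliberate simplification of real DDLm parsing; the `dupl` and `miss` arguments only imitate the Ignore/Replace/Exit behaviour described above for _import.get, and the dictionary contents in the usage lines are made up.

```python
def build_virtual_dictionary(sources, imports, dupl="Ignore", miss="Exit"):
    """Merge the listed dictionaries into one notional HEAD, in list order.

    sources: file name -> {definition name: definition}; stands in for fetching URLs
    imports: ordered file names, as in the _import.get list
    dupl:    "Ignore" keeps the first definition seen, "Replace" keeps the last,
             any other value raises ValueError on a duplicate
    miss:    "Exit" raises FileNotFoundError for an unavailable file,
             any other value skips it
    """
    merged = {}
    for fname in imports:
        if fname not in sources:
            if miss == "Exit":
                raise FileNotFoundError(fname)
            continue
        for name, definition in sources[fname].items():
            if name in merged:
                if dupl == "Ignore":
                    continue
                if dupl != "Replace":
                    raise ValueError(f"duplicate definition {name} in {fname}")
            merged[name] = definition
    return merged


# Made-up contents: both files carry a copy of the same core definition.
sources = {
    "magcif.dic": {"_cell.length_a": "core copy A", "mag_only_item": "mag"},
    "pdcif.dic": {"_cell.length_a": "core copy B", "pd_only_item": "pd"},
}
head = build_virtual_dictionary(sources, ["magcif.dic", "pdcif.dic"])
print(sorted(head))  # ['_cell.length_a', 'mag_only_item', 'pd_only_item']
```

With any stricter duplicate policy the shared core definitions would make the merge fail, which is the reason given above for requiring Ignore or Replace.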

Does this sound reasonable?

James.

 

 

On 16 June 2016 at 01:03, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Yes.  It is a bug in the DDLm version of the core dictionary that its definition of the audit_conform category is inconsistent with mmCIF and the DDL1 core.  That bug should be fixed.

 

Regards,

 

John

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of SIMON WESTRIP
Sent: Wednesday, June 15, 2016 9:01 AM
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>

Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

So you are in favour of making audit_conform a Loop in DDLm?

 

Cheers

 

Simon

 


From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Wednesday, June 15, 2016 2:30 PM
Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

Dear All,

 

Remember that audit_conform is not a DDLm category but rather a core CIF category.  I don’t see why the availability of _import.get in DDLm has any bearing on whether definitions in the DDLm core dictionary should be consistent with definitions of the same items in the mmCIF and DDL1 core dictionaries.  In the case of audit_conform, the DDLm core disagrees with the others, so there can be data files that are valid against the DDL1 core or against mmCIF that are not valid against the DDLm core.  Few CIFs actually use audit_conform, so there probably aren’t many for which such a validity mismatch occurs, but the disagreement is nevertheless undesirable and inconsistent with our intent to keep definitions stable.
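The kind of mismatch described here comes down to row cardinality. In this minimal Python sketch (the function name is hypothetical), a data block with two audit_conform rows passes when the category is classed as a Loop, as in mmCIF and the DDL1 core, but fails when it is classed as a Set, as in the current DDLm draft.

```python
def check_category_rows(category_class, rows):
    """Return an error message if the row count is invalid for the class, else None.

    Following the Set/Loop distinction discussed here: a 'Set' category may
    appear at most once per data block, while a 'Loop' category may have many rows.
    """
    if category_class == "Set" and len(rows) > 1:
        return f"Set category given {len(rows)} rows; at most one is allowed"
    return None


# Two conformance declarations, as a DDL1-core or mmCIF data file may contain.
rows = [{"dict_name": "cif_core.dic"}, {"dict_name": "cif_pd.dic"}]
print(check_category_rows("Loop", rows))  # None
print(check_category_rows("Set", rows))
```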

 

In any case, although with DDLm you can indeed use _import.get to form a dictionary that suits you, ad hoc dictionaries formed in this manner do not benefit from well-known names or version codes.  The only thing one can do with an audit_conform entry that references an unknown dictionary is load that dictionary and validate against it.  That serves only a functional purpose, whereas with well-known dictionary names, audit_conform also serves an informational purpose.

 

 

John

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of SIMON WESTRIP
Sent: Wednesday, June 15, 2016 7:29 AM
To: James Hester <jamesrhester@gmail.com>; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

Thanks James

 

Nothing really to discuss here, except perhaps whether audit_conform is Set or Loop, but as you point out, using the import mechanism one can create a dictionary that in turn imports any number of dictionaries. So whereas many ddl1 CIFs actually conform to cif_core and the iucr and/or ccdc 'local' dictionaries, which are rarely (never) declared in the CIF instance, moving forward we might look at creating a dictionary that pulls these in as well as cif_core.

On the other hand, perhaps it would be cleaner to declare such dictionaries separately in an audit_conform loop so that readers can fetch them if they're really interested, rather than fetching an unfamiliar dictionary only to find out it's basically cif_core but with a bunch of extra items they are not interested in anyway.

I'm inclined toward the latter approach.

 

 

Anyway, thanks again for your clarifications - all very useful

 

Cheers

 

Simon

PS I've added a couple of trivial comments below...

 


From: James Hester <jamesrhester@gmail.com>
To: SIMON WESTRIP <simonwestrip@btinternet.com>; Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Wednesday, June 15, 2016 1:45 AM
Subject: Dictionary conformance (was Re: Second proposal to allow looping of 'Set' categories)

 

Hi Simon,

I'll have a go at answering your questions.  This is an interesting line of questions, but a bit to the side of the other discussion, so I've put them into a different thread.

 

On 15 June 2016 at 09:16, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:

Dear John et al.

 

Struggling to keep-up again - so going back to basics:

 

1) I assume the scope of the proposed 'schema' definition is restricted to the dictionary in which its definition appears

(i.e. directs to alternative definitions that are already in that particular dictionary)?

 

The scope would be everything that COMCIFS managed. To anticipate your question below, we could/should have a very short 'IUCr core' dictionary with datanames that apply regardless of the particular domain, and which all DDLm dictionaries import.  Probably cif_core would import it, and all others would import cif_core.

 

SPW: that sounds logical

 

2) Is the intention that all CIFs (whatever domain) conform to a 'global' ddlm dictionary?

 

Excuse my pedantry here, but I want to make sure we understand each other. DDLm is a set of attributes which you can use to define the meaning of datanames.  A CIF (i.e. datafile) does not conform to a DDLm dictionary, it conforms to a dictionary. That dictionary is written in a DDL.  So we can write a dataname, and it has the same meaning (perhaps after application of aliases) regardless of the particular DDL in which that meaning is described.  If I have misunderstood you, and you simply meant, "is there a dictionary that all datafiles should draw on?", then the current answer is "no".  For example, mmCIF datanames completely replace datanames in cif_core.  The intention of proposal #2 is that _audit.schema would be a universal dataname, ideally defined in the separate 'IUCr core' dictionary described above.

 

SPW: "is there a dictionary that all datafiles should draw on?" - yes, that's what I meant

 

3) How does one declare dictionary conformance in a CIF instance without using a dictionary-defined dataname?

 

You can't.  A fundamental assumption, that COMCIFS attempts to fulfill, is that the meaning of datanames does not change.  In particular, the meaning of the dictionary conformance datanames does not change, so the programmer can hard-code a dataname to output dictionary conformance and not worry that somehow in the future this dataname will have a different meaning.  Hopefully this underlines that the *primary* audience of our dictionaries is the human software programmer, *not* the software itself. The programmer (*not* the program)  is the one that has to read the text definition to work out e.g. what dataname contains the atomic positions. The "dictionary driven software" part can only relate to those things that software can understand and use and which *do not* relate to a change in meaning (because we're not supposed to change the meaning) - aliases and dREL come to mind.  All of the other machine-readable stuff can only be used for validation, which is why I assert that a lot of casual CIF-reading software doesn't bother with dictionaries at execution time - the information available at software creation time is guaranteed to be sufficient to 'get the job done' now and in future.

Basically I am certain that I am not alone in having to rely on heuristics based on prior knowledge just to identify that e.g. my molecular graphics program is dealing with a ddl2 CIF (mmCIF) rather than a ddl1 CIF, and whether it's a pdCIF, msCIF, rhoCIF... Is ddlm/CIF2 going to help at this rather fundamental level?

 

No, it will not.  What you need is for CIF authors to include the already-defined datanames specifying dictionary conformance.  The _audit.schema proposal will also not help. 

I suggest that each time (or once a day, to stay sane) that you find a file that does not contain the dataname _audit_conform_dict_name (old cif_core) or _audit_conform.dict_name (mmCIF, and in draft DDLm core dictionary), you send a gentle message to the software author (if you can identify them), asking them to include one of these datanames in the output template that they distribute with their software (or adjust their code).  You could even provide a line in the email for them to cut and paste.  Perhaps you can get checkCIF to issue a level C alert if they are missing (if it doesn't already) with a suggestion to include a particular line in the CIF file to make the alert go away.  Perhaps we can wait until DDLm cif_core is accepted before pushing this.
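For software that wants to act on this advice, a rough Python check for either spelling of the conformance dataname might look like the following. It is a regex heuristic, not a CIF parser: it only extracts values written in key-value form on the same line as the dataname, and reports looped occurrences with an empty value.

```python
import re

# Matches the DDL1 name (_audit_conform_dict_name) or the mmCIF/DDLm name
# (_audit_conform.dict_name) at the start of a line, plus any value on that line.
# CIF datanames are case-insensitive, hence re.IGNORECASE.
_CONFORM = re.compile(r"^\s*(_audit_conform[._]dict_name)\b[ \t]*(\S*)",
                      re.MULTILINE | re.IGNORECASE)


def declared_conformance(cif_text):
    """List (dataname, value) pairs for dictionary-conformance items.

    Looped items yield an empty value because loop bodies are not parsed here;
    a real CIF parser should be used in practice.
    """
    return [(m.group(1), m.group(2).strip("'\"")) for m in _CONFORM.finditer(cif_text)]


print(declared_conformance("data_x\n_audit_conform_dict_name cif_core.dic\n"))
```

An empty result is the case in which a gentle message to the software author (or a checkCIF alert) would be warranted.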

 

SPW: I think I'd better make sure my own CIF writing software starts doing a more robust job in this respect before asking others to! The most I do is look for the audit_conform items - I don't add them if not there.

 

On a side note, DDLm includes the dictionary import formalism, which means that there is only one 'master' dictionary that imports all the rest (so e.g. pdCIF would internally import 'cif_core'). This improves on the old dictionary merging formalism I discussed before, and is why audit_conform is a 'Set' category in DDLm but a multi-packet loop category in DDL2.  The 'Set' designation for _audit_conform may be worth revisiting.

 

 

 

 

 


 




--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

 

 



Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer

 

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

 

