
Re: Please advise regarding a design of CIF dictionaries for material properties

Dear Herbert, John, David,
many thanks for the thorough explanations and suggestions!
From what I understand, the "cleanest" way to describe the information we have, as long as we want to stay within DDL1, would be to have a separate data block for each _prop_tensorXYZ_temperature, _prop_tensorXYZ_pressure tag and so forth, like this:
    data_prop_elastic_stiffness_temperature
        _name '_prop_elastic_stiffness_temperature'
        # ... and so on for the _prop_elastic_stiffness_temperature

    data_prop_piezoelectric_temperature
        _name '_prop_piezoelectric_temperature'
        # ... and so on for the _prop_piezoelectric_temperature
I could then either assign all such tags to category _prop, or to different categories _prop_elastic_stiffness, _prop_piezoelectric and so on.
Actually, I realized the answer to one of my questions -- yes, a category can be "split" over several loops; _atom_site does it, introducing separate loops for coordinates and for anisotropic ADPs.
Now, this approach is very bad because it violates SPOT (Single Point of Truth) -- the same information will be repeated in multiple data blocks (50 or so times for just one tag! And there will be several tags like this...). Apart from the waste of human time when all of them need to be updated, this also invites mistakes -- fix 48 entries and forget two. Too bad.
Three approaches seem feasible:
a) Use save frames in DDL1 (a heresy?) and save frame references. A save_sframe may be "referenced as $sframe", although in this context it is not clear what "referenced" means. To me it sounds like a simple macro expansion would be appropriate here (and would actually solve my problem neatly ;). Unfortunately, the CIF syntax (actually, semantics) description (http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax) does not mandate the use of save frame references at the moment. Would it be beneficial to extend the CIF grammar and interpretation to finally permit save frame references?
b) Use an external macro generator to generate separate data blocks. Doable but clumsy.
c) I could group the temperature tags into one datablock:
    data_prop_temperature
        loop_
        _name '_prop_elastic_stiffness_temperature'
              '_prop_piezoelectric_temperature'
              # Other names will follow and may be added in the future releases
              # of the dictionary
        # .. other characteristics of all tags follow here ...
This would force the tags to be in the _prop_ category, but as long as we can put them into separate loops this is fine. The only slight concern is that the data block name is no longer a tag name.
c) seems the cleanest of the three solutions, but whether it is allowed depends on my interpretation of the CIF grammar, the semantics, and ITC Vol. G.
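For comparison, option b) need not be much code. Below is a minimal Python sketch of such a generator; the shared attribute values are illustrative DDL1-style entries chosen for this example, not definitions from any released dictionary, and the function names are hypothetical. The shared attributes are written once, so the single point of truth moves from the dictionary into the generator's source.

```python
# Illustrative attribute values shared by every *_temperature item; they are
# stated once here and copied into each generated data block.
SHARED_TEMPERATURE_ATTRIBUTES = (
    "    _category               prop",
    "    _type                   numb",
    "    _type_conditions        esd",
    "    _enumeration_range      0.0:",
    "    _units                  K",
)


def temperature_block(prop):
    """Return one DDL1-style definition block for _prop_<prop>_temperature."""
    name = f"prop_{prop}_temperature"
    tensor = prop.replace("_", " ")
    lines = [
        f"data_{name}",
        f"    _name                   '_{name}'",
        *SHARED_TEMPERATURE_ATTRIBUTES,
        "    _definition",
        ";",  # a CIF text field opens with ';' at the start of a line
        f"    Temperature in kelvins at which the {tensor} tensor was measured.",
        ";",
    ]
    return "\n".join(lines) + "\n"


def generate(properties):
    """Concatenate the blocks for all properties, separated by blank lines."""
    return "\n".join(temperature_block(p) for p in properties)


if __name__ == "__main__":
    print(generate(["elastic_stiffness", "piezoelectric"]), end="")
```

Adding a new tensor then means adding one word to the list, and a change to, say, the range is made in exactly one place. The price is the one noted above: an extra tool in the release process.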
See detailed discussion below.
On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> It also specifies (ITG 2.5.5) that item names be used as definition
> datablock names.
Thanks, John, for the good reference.
Still, it leaves some of my questions open...
"Each item description is preceded by the item’s unique identifying tag as a datablock code" (ITG (2005) p. 55). This tells me just that the data block header must be unique in the file. It does not seem to require any particular connection to a data name described within.
Actually, it seems to be bad practice to assign any semantics to data_... block names -- when CIF files are processed, especially concatenated, there may be a need to rename data blocks to ensure uniqueness, and thus their information content is necessarily scrambled. We once tried to use data block names as the sole indication of the COD number for our structures, and were (justly) criticized by our users; having introduced _database_code_COD, we now feel much safer about data traceability.
Wouldn't the same logic and "best practices" apply to dictionaries? That is, software should not rely on data block names to find any data items; it should rely on tag values instead.
A practical question: does any CIF validation software exist that relies on the connection between data block names and declared tag names in a dictionary? Would you write such software?
"The name of a data block matches the defined data name or the initial portion thereof." (ITG (2005) p. 56)
Hmmm... when I look closer, this is a very vague statement. Does it simply state the (obvious) fact that in the example the data block name matches the prefix of the declared name? Is it just a recommendation for human readability (nice to have, but not a must)? Does it imply that the data block name MUST, or MAY, or SHOULD match the declared name (as in http://www.ietf.org/rfc/rfc2119.txt)?
If the interpretation is SHOULD, I would go for a data_prop_temperature data block that does not match the prefix of the data name, since IMHO the preservation of SPOT grossly outweighs a slightly unusual case for human readers. Software MUST NOT rely on such matching, anyway.
If the interpretation is MUST, then I am in trouble.
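To make the question concrete: if a validator did enforce the ITG sentence literally, the check could be as simple as the minimal Python sketch below. It is not taken from any existing CIF tool, the function names are hypothetical, and the tokenizer is deliberately crude (assumptions: no semicolon text fields, no embedded quotes, and _name being the only tag in its loop). The comparison is case-insensitive because CIF treats data names and block codes that way.

```python
import re

# A token is a single- or double-quoted string, a comment, or a bare word.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|#.*|\S+")


def defined_names(dictionary_text):
    """Map each data block code to the values of _name defined in it."""
    blocks = {}
    current = None
    reading_names = False
    for line in dictionary_text.splitlines():
        for token in TOKEN.findall(line):
            if token.startswith("#"):
                break  # the rest of the line is a comment
            quoted = token[0] in "'\""
            lowered = token.lower()
            if not quoted and lowered.startswith("data_"):
                current = token[len("data_"):]
                blocks.setdefault(current, [])
                reading_names = False
            elif not quoted and (lowered == "loop_" or lowered.startswith("_")):
                # Assumption: _name is alone in its loop, so every value up
                # to the next tag is a defined data name.
                reading_names = lowered == "_name"
            elif reading_names and current is not None:
                blocks[current].append(token[1:-1] if quoted else token)
    return blocks


def mismatched_blocks(dictionary_text):
    """List (block, name) pairs where the block code is neither the defined
    name nor an initial portion of it."""
    problems = []
    for block, names in defined_names(dictionary_text).items():
        for name in names:
            if not name.lstrip("_").lower().startswith(block.lower()):
                problems.append((block, name))
    return problems


DICTIONARY = """
data_prop_elastic_stiffness_temperature
    _name '_prop_elastic_stiffness_temperature'

data_prop_temperature
    loop_
    _name '_prop_elastic_stiffness_temperature'
          '_prop_piezoelectric_temperature'
    # other characteristics of all tags follow here
"""

if __name__ == "__main__":
    print(mismatched_blocks(DICTIONARY))
```

Run on the example, it accepts the per-tag block and flags both names in data_prop_temperature. Under a MUST reading, a validator written this way would reject proposal c); under a SHOULD reading, the same output would at most justify a warning.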
David Brown wrote:
> Although temperature is a single property, the temperatures at which
> different properties are measured are in principle different
> temperatures.  They will appear in different loops, but they are not
> the same property.  The description should be more specific in the
> example above, e.g.,
>
> _description
> ;
> Specifies temperature in Kelvin at which the piezoelectric tensor was measured.
> ;
My worry here is that although, as you correctly state, different properties can be measured at different temperatures, the tags describing each of the measurement temperatures SHOULD have exactly the same properties (type, esd, range, etc.), and this should somehow be enforced, or at least reflected, by the dictionary.
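Even where DDL1 cannot express this constraint, it could at least be checked mechanically before each dictionary release. A minimal Python sketch, assuming the definitions have already been parsed into plain dictionaries; the function name is hypothetical, the attribute values are illustrative, and the 'C' in the second entry is a deliberate slip of the "forget two" kind.

```python
def inconsistent_attributes(definitions, names,
                            ignore=("_name", "_category", "_definition")):
    """Return, sorted, the attributes whose values differ among `names`.

    `definitions` maps each data name to a dict of its DDL1-style
    attributes. Attributes in `ignore` are expected to differ between
    items; a missing attribute counts as the value None.
    """
    keys = set()
    for name in names:
        keys.update(definitions[name])
    keys.difference_update(ignore)
    return sorted(
        key for key in keys
        if len({definitions[name].get(key) for name in names}) > 1
    )


definitions = {
    "_prop_elastic_stiffness_temperature": {
        "_type": "numb", "_type_conditions": "esd",
        "_enumeration_range": "0.0:", "_units": "K",
        "_definition": "Temperature at which the elastic stiffness tensor was measured.",
    },
    "_prop_piezoelectric_temperature": {
        "_type": "numb", "_type_conditions": "esd",
        "_enumeration_range": "0.0:", "_units": "C",  # deliberate slip
        "_definition": "Temperature at which the piezoelectric tensor was measured.",
    },
}

if __name__ == "__main__":
    print(inconsistent_attributes(definitions, list(definitions)))  # → ['_units']
```

Such a check keeps per-tag data blocks (and per-tag definitions) while still catching the divergence that SPOT is meant to prevent.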
On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> The specifications say that each item defined in a DDL1 dictionary
> must be in its own data block, except that the members of an
> "irreducible set" of items may be defined in the same block.  The
> items you describe do not seem an irreducible set to me (unlike, for
> example, all the elements of a single tensor)
ITG is very vague about what "irreducible data items" are. It says just "Some data items are closely related to other data items". I would argue that the temperature values are indeed extremely closely related, even though the relation is somewhat different from that of the cell constants, the example provided in the Tables.
The main question is: can there be an implementation of CIF software, existing or conceivable in the future, that would be deemed a "correct" CIF validator and still fail to read, or to validate against, a dictionary designed as proposed above? I.e., a dictionary having a data block like this:
    data_prop_temperature
        loop_
        _name '_prop_elastic_stiffness_temperature'
              '_prop_piezoelectric_temperature'
              # Other names will follow and may be added in the future releases
              # of the dictionary
        # .. other characteristics of all tags follow here ...
On 09/28/2011 07:07 PM, David Brown wrote:
>> b) one dictionary data block describes names that are potentially
>> in different categories (but otherwise have the same
>> characteristics)? For example, would the dictionary entry above be
>> considered correct if we declare
>> _prop_elastic_stiffness_temperature to be in
>> 'prop_elastic_stiffness' category, and
>> _prop_piezoelectric_temperature to be in 'prop_piezoelectric'
>> category, and still have one dictionary datablock to specify their
>> properties?
>
> This should be avoided.  It has been used in DDL1, but is not
> allowed in DDLm.  Yes, it makes the dictionary larger, but it keeps
> everything in the right place.  In DDLm the duplication is minimized
> by the ability to insert the same common description of temperature
> into many different definitions in the dictionary.
OK, I'll follow your advice and will not use this.
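For comparison with the DDLm insertion mechanism David describes, the macro-expansion reading of save frame references from option a) could be prototyped as a preprocessing step that turns one shared description into many DDL1 definitions. This is a hypothetical sketch only: CIF 1.1 does not give save frame references these semantics, and the convention assumed here (a line containing only $name is replaced by the body of save_name ... save_) and the function name are illustrations, not part of any specification.

```python
import re

# Hypothetical convention: "save_<name>" ... "save_" defines a frame, and a
# line holding only "$<name>" is replaced by that frame's body.
SAVE_FRAME = re.compile(
    r"^[ \t]*save_(\S+)[ \t]*\n(.*?)^[ \t]*save_[ \t]*$\n?",
    re.MULTILINE | re.DOTALL,
)
REFERENCE = re.compile(r"^[ \t]*\$(\S+)[ \t]*$", re.MULTILINE)


def expand_save_frame_references(text):
    """Remove save frames from `text` and expand references to them.

    Frame names are compared case-insensitively; references inside frame
    bodies are not expanded (no nesting in this sketch).
    """
    frames = {m.group(1).lower(): m.group(2) for m in SAVE_FRAME.finditer(text)}
    body = SAVE_FRAME.sub("", text)

    def substitute(match):
        key = match.group(1).lower()
        if key not in frames:
            raise KeyError(f"undefined save frame reference ${match.group(1)}")
        # The reference line keeps its own newline, so drop the frame's last one.
        return frames[key].rstrip("\n")

    return REFERENCE.sub(substitute, body)


SOURCE = """save_temperature_common
    _type               numb
    _units              K
save_

data_prop_elastic_stiffness_temperature
    _name '_prop_elastic_stiffness_temperature'
    $temperature_common
"""

if __name__ == "__main__":
    print(expand_save_frame_references(SOURCE))
```

The expanded output is an ordinary one-tag-per-block DDL1 dictionary, so existing validators would not need to know about the references at all; only the dictionary source would.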
On 09/28/2011 07:07 PM, David Brown wrote:
>> c) data_... block name in the dictionary no longer matches tag
>> name. I guess this should not be a problem... Is it?
>
> It is not a problem in DDLm, I am not sure about DDL1, but it could
> be confusing.  Best avoided.
Your comment sounds as if the ITG is to be interpreted as "The name of a data block SHOULD match the defined data name or the initial portion thereof", i.e. in special cases deviations from the rule are permissible, and software should not rely on it. Is that true?
>> d) would it break anything if category name is not the prefix of
>> the tag (e.g. declaring _prop_piezoelectric_temperature to have
>> category _prop_temperature, to describe all temperature tags in
>> one data block)?
>
> In DDL1 this sometimes happens.  In DDLm the name is constructed out
> of the category and the item name which might make transformation to
> DDLm problematic.  Best avoided.
I could avoid the situation described in d) above by formally putting all tags into the "prop" category.
On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> If supporting validation is a goal, however, ...
I would say "of course". For me, the main advantage of having a CIF dictionary is being able to do automated validation.
> If contracting item definitions is also an important goal then I > would advise you to take Herbert's advice and use DDL2.
Contracting definitions, I would say, is a must. However, I would be very reluctant to go to DDL2. MPOD operates in the small-molecule context, which exclusively uses DDL1 CIFs for descriptions, and DDL1 dictionaries for item definitions. DDL2 introduces a lot of extra complexity, which I would not like to support at the moment just for contracting the definitions.
Regards,
Saulius
--
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
_______________________________________________
comcifs mailing list
comcifs@iucr.org
http://scripts.iucr.org/mailman/listinfo/comcifs
