Re: Please advise regarding a design of CIF dictionaries for material properties

Dear Herbert, John, David,

many thanks for the thorough explanations and suggestions!

From what I understand the "cleanest" way to describe the information we
have, as long as we want to stay within DDL1, would be to have a
separate data block for each _prop_tensorXYZ_temperature,
_prop_tensorXYZ_pressure tag and so forth, like this:

data_prop_elastic_stiffness_temperature
_name '_prop_elastic_stiffness_temperature'
# ... and so on for the _prop_elastic_stiffness_temperature

data_prop_piezoelectric_temperature
_name '_prop_piezoelectric_temperature'
# ... and so on for the _prop_piezoelectric_temperature

I could then either assign all such tags to category _prop, or to
different categories _prop_elastic_stiffness, _prop_piezoelectric and so on.

Actually, I realized the answer to one of my questions -- yes, a
category can be "split" over several loops: _atom_site does so, using
separate loops for coordinates and for anisotropic ADPs.

Now this approach is very bad because it violates SPOT (Single Point of
Truth) -- the same information will be repeated in multiple data blocks
(50 or so times for just one tag! And there will be several tags like
this...). Apart from the waste of human time when all of them need to be
updated, this also invites mistakes -- fix 48 entries and forget two.
Too bad.

Three approaches seem feasible:

a) Use save frames in DDL1 (a heresy?) and save frame references. A
save_sframe may be "referenced as $sframe", although in this
context it is not clear what "referenced" means. To me it sounds
like a simple macro expansion would be appropriate here (and would
actually solve my problem neatly ;). Unfortunately, the CIF syntax
(actually, semantics) description
(http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax) does not
mandate the use of save frame references at the moment. Would it be
beneficial to extend the CIF grammar and interpretation to finally
permit save frame references?

b) Use an external macro-generator to generate separate data blocks.
Doable but clumsy.

c) I could group the temperature tags into one datablock:

data_prop_temperature
loop_
_name '_prop_elastic_stiffness_temperature'
      '_prop_piezoelectric_temperature'
      # Other names will follow and may be added in the future releases
      # of the dictionary
# .. other characteristics of all tags follow here ...

This would force the tags to be in the _prop_ category, but as long as
we can put them into separate loops this is fine. The only slight
concern is that the data block name is no longer a tag name.

Option c) seems the cleanest solution, but this depends on my
interpretation of the CIF grammar and semantics and of ITC vol. G.
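
For illustration, option b) could be as small as the Python sketch
below, which expands one shared template into a separate data block per
tag. The attribute lines in the template (_category, _type, _units) and
the function name generate_blocks are placeholders chosen for this
example; they are not taken from any actual MPOD dictionary.

```python
# Hypothetical macro-generator for option b): one shared template is
# expanded into one DDL1-style data block per tag, so the common
# attributes are maintained in a single place.
TEMPLATE = """\
data_{block}
_name '_{block}'
_category {category}
_type numb
_units K
"""


def generate_blocks(properties, suffix="temperature", category="prop"):
    """Return dictionary text with one data block per property name."""
    blocks = []
    for prop in properties:
        block = f"prop_{prop}_{suffix}"
        blocks.append(TEMPLATE.format(block=block, category=category))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(generate_blocks(["elastic_stiffness", "piezoelectric"]))
```

The generated file still repeats the attributes 50 or so times, but the
template is the only place a human has to edit.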

See detailed discussion below.

On 09/28/2011 07:05 PM, Bollinger, John C wrote:

> It also specifies (ITG 2.5.5) that item names be used as definition 
> datablock names.

Thanks, John, for the good reference.

Still, it leaves some of my questions open...

"Each item description is preceded by the item’s unique identifying tag
as a datablock code" (ITG(2005) p.55). This tells me just that the data
block header must be unique in the file. It does not seem to require any
particular connection to a data name described within.

Actually, it seems to be bad practice to assign any semantics to
data_... block names -- when CIF files are processed, especially
concatenated, there may be a need to rename data blocks to ensure
uniqueness, and thus their information content is necessarily scrambled.
We once tried to put datablock names as the sole indication of COD
number for our structures, and were (justly) criticized by our users;
having introduced _database_code_COD we now feel much safer about data
traceability.

Wouldn't the same logic and "best practices" apply to dictionaries?
That is, software should not rely on data block names to find any data
items; it should rely on tag values instead.

A practical question: does any CIF validation software exist that
relies on the connection of data block names and declared tag names in a
dictionary? Would you write such software?

"The name of a data block matches the defined data name or the initial
portion thereof." (ITG(2005) p.56)

Hmmm... when I think about it more closely, this is a very vague
statement. Does it simply state the (obvious) fact that in the example
the data block name matches a prefix of the declared name? Is it just a
recommendation for human readability (nice to have, but not a must)?
Does it imply that the data block name MUST, or MAY, or SHOULD match the
declared name (in the sense of http://www.ietf.org/rfc/rfc2119.txt)?

If the interpretation is SHOULD, I would go for a data_prop_temperature
data block that does not match the prefix of the data name, since IMHO
the preservation of SPOT grossly outweighs a slightly unusual case for
human readers. Software MUST NOT rely on such matching, anyway.

If the interpretation is MUST, then I am in trouble.
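
To make the MUST reading concrete, here is a rough Python sketch of the
check a strict validator might apply: every data block code has to be a
prefix of each name declared under _name in that block. The parser is
deliberately simplified (single-quoted names only, no full CIF
grammar), and the function names are invented for this example.

```python
import re


def defined_names(dictionary_text):
    """Map each data block code to the tag names listed under _name.

    Simplified on purpose: only single-quoted names are recognised,
    either directly after _name or as values of a loop_ over _name.
    """
    blocks = {}
    current = None
    collecting = False
    for raw in dictionary_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.lower().startswith("data_"):
            current = line[5:]
            blocks[current] = []
            collecting = False
            continue
        if current is None:
            continue
        if line.startswith("_"):
            # A new attribute starts; collect values only for _name.
            collecting = line.split()[0] == "_name"
        if collecting:
            blocks[current].extend(re.findall(r"'(_[^']+)'", line))
    return blocks


def mismatched_blocks(dictionary_text):
    """Return block codes that are not a prefix of every name they define."""
    bad = []
    for code, names in defined_names(dictionary_text).items():
        if not all(n[1:].lower().startswith(code.lower()) for n in names):
            bad.append(code)
    return bad


EXAMPLE = """\
data_prop_elastic_stiffness_temperature
_name '_prop_elastic_stiffness_temperature'

data_prop_temperature
loop_
_name '_prop_elastic_stiffness_temperature'
      '_prop_piezoelectric_temperature'
"""

if __name__ == "__main__":
    print(mismatched_blocks(EXAMPLE))
```

Under this check the grouped data_prop_temperature block is reported,
while the one-tag-per-block form passes.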

David Brown wrote:

> Although temperature is a single property, the temperatures at which
>  different properties are measured are in principle different 
> temperatures.  They will appear in different loops, but they are not 
> the same property.  The description should be more specific in the 
> example above, e.g.,
> 
> _description ; Specifies temperature in Kelvin at which the 
> piezoelectric tensor was measured. ;

My worry here is that although, as you correctly state, different
properties can be measured at different temperatures, the tags describing
each of the measurement temperatures SHOULD have exactly the same
properties (type, esd, range, etc.), and this should somehow be enforced
or at least reflected by the dictionary.
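
If the definitions do end up in separate data blocks, the "same
properties" requirement could at least be checked mechanically. The
Python sketch below assumes the definitions have already been read into
plain dictionaries; the attribute values shown are made up for the
example, and differing_attributes is a hypothetical helper name.

```python
def differing_attributes(definitions, ignore=("_name", "_definition")):
    """Return {attribute: {tag: value}} for attributes that differ.

    definitions maps each tag to a dict of its dictionary attributes.
    Attributes listed in ignore are expected to differ between tags.
    """
    attributes = set()
    for attrs in definitions.values():
        attributes.update(attrs)
    report = {}
    for attribute in sorted(attributes - set(ignore)):
        values = {tag: attrs.get(attribute)
                  for tag, attrs in definitions.items()}
        if len(set(values.values())) > 1:
            report[attribute] = values
    return report


# Made-up attribute values, for illustration only.
EXAMPLE_DEFINITIONS = {
    "_prop_elastic_stiffness_temperature": {
        "_type": "numb",
        "_type_conditions": "esd",
        "_enumeration_range": "0.0:",
        "_units": "K",
    },
    "_prop_piezoelectric_temperature": {
        "_type": "numb",
        "_type_conditions": "esd",
        "_enumeration_range": "0.0:",
        "_units": "C",  # deliberate inconsistency
    },
}

if __name__ == "__main__":
    print(differing_attributes(EXAMPLE_DEFINITIONS))
```

Such a check only detects drift after the fact; it does not restore a
single point of truth.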

On 09/28/2011 07:05 PM, Bollinger, John C wrote:

> The specifications say that each item defined in a DDL1 dictionary 
> must be in its own data block, except that the members of an 
> "irreducible set" of items may be defined in the same block.  The 
> items you describe do not seem an irreducible set to me (unlike, for 
> example, all the elements of a single tensor)

ITG is very vague about what an "irreducible set" of data items is. It
says just "Some data items are closely related to other data items". I
would argue that the temperature values are indeed extremely closely
related, even though the relation is somewhat different from that of
cell constants, the example provided in the Tables.

The main question is: can there be an implementation of CIF software,
existing or conceivable in the future, that would be deemed to
be a "correct" implementation of a CIF validator and still fail to read
or validate against the dictionary designed as proposed above? I.e. a
dictionary having a data block like this:

data_prop_temperature
loop_
_name '_prop_elastic_stiffness_temperature'
      '_prop_piezoelectric_temperature'
      # Other names will follow and may be added in the future releases
      # of the dictionary
# .. other characteristics of all tags follow here ...

On 09/28/2011 07:07 PM, David Brown wrote:

>> b) one dictionary data block describes names that are potentially 
>> in different categories (but otherwise have the same 
>> characteristics)? For example, would the dictionary entry above be 
>> considered correct if we declare 
>> _prop_elastic_stiffness_temperature to be in 
>> 'prop_elastic_stiffness' category, and 
>> _prop_piezoelectric_temperature to be in 'prop_piezoelectric' 
>> category, and still have one dictionary datablock to specify their 
>> properties?
> 
> This should be avoided.  It has been used in DDL1, but is not
> allowed in DDLm.  Yes, it makes the dictionary larger, but it keeps 
> everything in the right place.  In DDLm the duplication is minimized 
> by the ability to insert the same common description of temperature 
> into many different definitions in the dictionary.

OK, I'll follow your advice and will not use this.

On 09/28/2011 07:07 PM, David Brown wrote:

>> c) data_... block name in the dictionary no longer matches tag 
>> name. I guess this should not be a problem... Is it?
> 
> It is not a problem in DDLm, I am not sure about DDL1, but it could 
> be confusing.  Best avoided.

Your comment sounds as if the ITG is to be interpreted as "The name of a
data block SHOULD match the defined data name or the initial portion
thereof". I.e. in special cases deviations from the rule are
permissible, and software should not rely on it. Is that true?

>> d) would it break anything if category name is not the prefix of 
>> the tag (e.g. declaring _prop_piezoelectric_temperature to have 
>> category _prop_temperature, to describe all temperature tags in
>> one data block)?
> 
> In DDL1 this sometimes happens.  In DDLm the name is constructed out 
> of the category and the item name which might make transformation to 
> DDLm problematic.  Best avoided.

I could avoid the situation described in d) above by formally putting
all tags into the "prop" category.

On 09/28/2011 07:05 PM, Bollinger, John C wrote:

> If supporting validation is a goal, however, ...

I would say "of course". For me, the main advantage of having a CIF
dictionary is to be able to do automated validation.

> If contracting item definitions is also an important goal then I 
> would advise you to take Herbert's advice and use DDL2.

Contracting definitions, I would say, is a must. However, I would be
very reluctant to go to DDL2. MPOD operates in the small-molecule
context, which exclusively uses DDL1 CIFs for descriptions, and DDL1
dictionaries for item definitions. DDL2 introduces a lot of extra
complexity which I would not like to support at the moment for just
contracting the definitions.

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
