Re: Please advise regarding a design of CIF dictionaries for material properties
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
- Subject: Re: Please advise regarding a design of CIF dictionaries for material properties
- From: Saulius Grazulis <grazulis@ibt.lt>
- Date: Thu, 29 Sep 2011 12:24:29 +0300
- In-Reply-To: <4E834642.1020203@mcmaster.ca>
- Organization: IBT
- References: <4E832F86.6080805@ibt.lt> <4E834642.1020203@mcmaster.ca>
Dear Herbert, John, David,

many thanks for the thorough explanations and suggestions!

From what I understand, the "cleanest" way to describe the information we have, as long as we want to stay within DDL1, would be to have a separate data block for each _prop_tensorXYZ_temperature, _prop_tensorXYZ_pressure tag and so forth, like this:

data_prop_elastic_stiffness_temperature
    _name '_prop_elastic_stiffness_temperature'
    # ... and so on for the _prop_elastic_stiffness_temperature

data_prop_piezoelectric_temperature
    _name '_prop_piezoelectric_temperature'
    # ... and so on for the _prop_piezoelectric_temperature

I could then assign all such tags either to the category _prop, or to different categories _prop_elastic_stiffness, _prop_piezoelectric and so on. Actually, I realized the answer to one of my questions -- yes, a category can be "split" over several loops; _atom_site does it, introducing separate loops for coordinates and for anisotropic ADPs.

Now this approach is very bad because it violates SPOT (Single Point of Truth) -- the same information will be repeated in multiple data blocks (50 or so times for just one tag! And there will be several tags like this...). Apart from the waste of human time when all of them need to be updated, this also invites mistakes -- fix 48 entries and forget two. Too bad.

Three approaches seem feasible:

a) Use save frames in DDL1 (a heresy?) and save frame references. A save_sframe may be "referenced as $sframe", although in this context it is not clear what "referenced" means. To me it sounds like a simple macro expansion would be appropriate here (and would actually solve my problem neatly ;). Unfortunately, the CIF syntax (actually, semantics) description (http://www.iucr.org/resources/cif/spec/version1.1/cifsyntax) does not mandate the use of save frame references at the moment. Would it be beneficial to extend the CIF grammar and interpretation to finally permit save frame references?
b) Use an external macro-generator to generate separate data blocks. Doable but clumsy.

c) I could group the temperature tags into one data block:

data_prop_temperature
    loop_ _name
        '_prop_elastic_stiffness_temperature'
        '_prop_piezoelectric_temperature'
        # Other names will follow and may be added in the future releases
        # of the dictionary
    # .. other characteristics of all tags follow here ...

This would force the tags to be in the _prop_ category, but as long as we can put them into separate loops this is fine. The only slight concern is that the data block name is no longer a tag name.

c) seems the cleanest solution, but this depends on my interpretation of the CIF grammar, its semantics and ITC vol. G. See the detailed discussion below.

On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> It also specifies (ITG 2.5.5) that item names be used as definition
> datablock names.

Thanks, John, for the good reference. Still, it leaves some of my questions open...

"Each item description is preceded by the item’s unique identifying tag as a datablock code" (ITG(2005) p.55).

This tells me just that the data block header must be unique in the file. It does not seem to require any particular connection to a data name described within. Actually, it seems to be bad practice to assign any semantics to data_... block names -- when CIF files are processed, especially concatenated, there may be a need to rename data blocks to assure uniqueness, and thus their information content is necessarily scrambled. We once tried to use data block names as the sole indication of the COD number for our structures, and were (justly) criticized by our users; having introduced _database_code_COD we now feel much safer about data traceability. Wouldn't the same logic and "best practices" apply to dictionaries? That is, software should not rely on data block names to find any data items; it should rely on tag values instead.

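To illustrate the point about relying on tag values, here is a minimal Python sketch. It assumes a much simplified DDL1 subset: no semicolon-delimited text fields, no '#' or quote characters inside values. The names `tokenize`, `index_definitions` and `SAMPLE` are hypothetical and do not come from any existing CIF library. The sketch finds definitions by the values of `_name`, so it does not matter whether a data block name matches the tags defined inside it.

```python
import re

# Quoted strings or runs of non-whitespace characters.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|\S+""")

# Hypothetical dictionary text following option c): one block, several names.
SAMPLE = """
data_prop_temperature
    loop_ _name
        '_prop_elastic_stiffness_temperature'
        '_prop_piezoelectric_temperature'
    _category   prop
    _type       numb   # shared characteristics of all names above

data_prop_elastic_stiffness
    _name       '_prop_elastic_stiffness'
    _category   prop
"""


def tokenize(text):
    """Yield tokens line by line; assumes '#' only ever starts a comment."""
    for line in text.splitlines():
        yield from TOKEN.findall(line.split("#", 1)[0])


def index_definitions(text):
    """Map every value of _name to the data block in which it is defined."""
    index = {}
    block = None
    loop_tags = None     # None outside a loop, list of header tags inside
    n_values = 0         # number of loop values seen so far
    pending_tag = None   # last non-looped tag still waiting for its value
    for tok in tokenize(text):
        if tok.lower().startswith("data_"):
            block, loop_tags, pending_tag = tok[5:], None, None
        elif tok.lower() == "loop_":
            loop_tags, n_values, pending_tag = [], 0, None
        elif tok.startswith("_"):
            if loop_tags is not None and n_values == 0:
                loop_tags.append(tok)               # still in the loop header
            else:
                loop_tags, pending_tag = None, tok  # a plain tag ends the loop
        else:
            value = tok.strip("'\"")
            if loop_tags:
                tag = loop_tags[n_values % len(loop_tags)]
                n_values += 1
            else:
                tag, pending_tag = pending_tag, None
            if tag == "_name":
                index[value] = block
    return index


if __name__ == "__main__":
    for name, block in index_definitions(SAMPLE).items():
        print(f"{name} is defined in data_{block}")
```

A validator built along these lines would look up `_prop_piezoelectric_temperature` through the index and never inspect the block name, so renaming `data_prop_temperature` would not change the result.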
A practical question: does any CIF validation software exist that relies on the connection between data block names and declared tag names in a dictionary? Would you write such software?

"The name of a data block matches the defined data name or the initial portion thereof." (ITG(2005) p.56)

Hmmm... when I think about it more closely, this is a very vague message. Does it simply state the (obvious) fact that in the example the data block name matches the prefix of the declared name? Is it just a recommendation for human readability (nice to have, but not a must-have)? Does it imply that the data block name MUST, or MAY, or SHOULD match the declared name (as in http://www.ietf.org/rfc/rfc2119.txt)?

If the interpretation is SHOULD, I would go for a data_prop_temperature data block that does not match the prefix of the data name, since IMHO the preservation of SPOT grossly outweighs a slightly unusual case for human readers. Software MUST NOT rely on such matching, anyway.

If the interpretation is MUST, then I am in trouble.

David Brown wrote:
> Although temperature is a single property, the temperatures at which
> different properties are measured are in principle different
> temperatures. They will appear in different loops, but they are not
> the same property. The description should be more specific in the
> example above, e.g.,
>
> _description
> ;   Specifies temperature in Kelvin at which the
>     piezoelectric tensor was measured.
> ;

My worry here is that although, as you correctly state, different properties can be measured at different temperatures, the tags describing each of the measurement temperatures SHOULD have exactly the same properties (type, esd, range, etc.), and this should somehow be enforced or at least reflected by the dictionary.

On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> The specifications say that each item defined in a DDL1 dictionary
> must be in its own data block, except that the members of an
> "irreducible set" of items may be defined in the same block.
> The items you describe do not seem an irreducible set to me (unlike,
> for example, all the elements of a single tensor)

ITG is very vague about what constitutes an "irreducible set" of data items. It says just "Some data items are closely related to other data items". I would argue that the temperature values are indeed extremely closely related, even though the relation is somewhat different from that of cell constants, the example provided in the Tables.

The main question is: can there be an implementation of CIF software, existing or conceivable in the future, that would be deemed a "correct" implementation of a CIF validator and still fail to read or validate against the dictionary designed as proposed above? I.e. a dictionary having a data block like this:

data_prop_temperature
    loop_ _name
        '_prop_elastic_stiffness_temperature'
        '_prop_piezoelectric_temperature'
        # Other names will follow and may be added in the future releases
        # of the dictionary
    # .. other characteristics of all tags follow here ...

On 09/28/2011 07:07 PM, David Brown wrote:
>> b) one dictionary data block describes names that are potentially
>> in different categories (but otherwise have the same
>> characteristics)? For example, would the dictionary entry above be
>> considered correct if we declare
>> _prop_elastic_stiffness_temperature to be in
>> 'prop_elastic_stiffness' category, and
>> _prop_piezoelectric_temperature to be in 'prop_piezoelectric'
>> category, and still have one dictionary datablock to specify their
>> properties?
>
> This should be avoided. It has been used in DDL1, but is not
> allowed in DDLm. Yes, it makes the dictionary larger, but it keeps
> everything in the right place. In DDLm the duplication is minimized
> by the ability to insert the same common description of temperature
> into many different definitions in the dictionary.

OK, I'll follow your advice and will not use this.

On 09/28/2011 07:07 PM, David Brown wrote:
>> c) data_...
>> block name in the dictionary no longer matches tag name. I guess
>> this should not be a problem... Is it?
>
> It is not a problem in DDLm, I am not sure about DDL1, but it could
> be confusing. Best avoided.

Your comment sounds as if the ITG is to be interpreted as "The name of a data block SHOULD match the defined data name or the initial portion thereof". I.e. in special cases deviations from the rule are permissible, and software should not rely on it. Is that true?

>> d) would it break anything if category name is not the prefix of
>> the tag (e.g. declaring _prop_piezoelectric_temperature to have
>> category _prop_temperature, to describe all temperature tags in
>> one data block)?
>
> In DDL1 this sometimes happens. In DDLm the name is constructed out
> of the category and the item name which might make transformation to
> DDLm problematic. Best avoided.

I could avoid the situation described in d) above by formally putting all tags into the "prop" category.

On 09/28/2011 07:05 PM, Bollinger, John C wrote:
> If supporting validation is a goal, however, ...

I would say "of course". For me, the main advantage of having a CIF dictionary is to be able to do automated validation.

> If contracting item definitions is also an important goal then I
> would advise you to take Herbert's advice and use DDL2.

Contracting definitions, I would say, is a must. However, I would be very reluctant to go to DDL2. MPOD operates in the small-molecule context, which exclusively uses DDL1 CIFs for descriptions and DDL1 dictionaries for item definitions. DDL2 introduces a lot of extra complexity, which I would not like to support at the moment just for contracting the definitions.

Regards,
Saulius

--
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366