This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.
Coping with the units extension on core dictionary data names.

Peter Keller (bsspak@bath.ac.uk)
Mon, 20 Nov 1995 15:07:45 +0000 (GMT)
Previous message: Peter Keller: "Re: CIF/PDB reflection file format (crosspost from PDB mailing list)"
Hi all,

Last week, Peter (Murray-Rust)'s post to the PDB mailing list reminded me of
a problem with core cifdic compatibility which still hasn't been dealt with
in mmCIF. This is the problem of how to cope with unit extensions to data 
names:

On Thu, 16 Nov 1995, Murray-Rust Dr P wrote:

> One of the issues I have been trying to resolve in CIF is the 
> distinction between syntax and semantics.  In simple terms syntax means 
> parsing the file without attempting to interpret its meaning or content.  
> In general CIF (note that the core CIF dictionary does not use the full 
> STAR language) has a well-defined syntax and can be expressed by a BNF.  
> There are - I think - still a few concerns such as how to escape certain 
> characters and what to do when 'including' files.  
> 
> Semantics determines what meaning you put on the contents.  For example,
> 
> _cell_length_a_pm  1490.3
> 
> is meaningless until semantics is applied to it.  CIF deals with this by 
> saying "Go and look up _cell_length_a_pm in a dictionary and take further 
> action depending on what you find".  The use of dictionaries in CIF is 
> IMO a major advance and one that the crystallographic community can be 
> proud of.
> 
> Unfortunately there are places where the syntax and semantics are 
> confused - in the above case the _pm suffix (which is not parsable) has 
> the implied message "Divide the following number by 100 because that is 
> what the crystallographic community uses"
> 
> In my view the syntax and semantics of CIF must be clearly separated, so 
> that the language can be rigorously parsed without having to add semantic 
> content (especially by implication).

There is nothing in mmDDL which corresponds to the core dictionary term
_units_extension, so at the moment, any attempt to use a data item such as
'_cell_length_a_pm' in an mmCIF-based application would fail. (The core ddl
does not seem to have kept pace with this either, but that is another
issue.) For the reasons which Peter explains very well, it would not be a
good idea to allow unit extensions to mmCIF data names in the same way as
they are in the core dictionary. Apart from this more fundamental problem,
it would cripple any application's attempt to speed up access to
dictionary terms by converting mmCIF to a form in which a fast lookup
algorithm can be used. As John pointed out at Montreal, the (increasing)
size of mmCIF makes this an important issue. 
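
To illustrate the lookup concern, here is a minimal Python sketch. It
assumes a hypothetical in-memory set of base data names and a hypothetical
list of permitted unit suffixes; neither is taken from any real dictionary
file, and the function name is made up for this example. The point is only
that, once suffixes are allowed implicitly, an exact-match lookup is no
longer enough and every miss needs a suffix-stripping fallback:

```python
# Hypothetical core-style dictionary: only base names are defined, and
# unit suffixes are permitted implicitly (both sets are illustrative).
BASE_NAMES = {"_cell_length_a", "_cell_length_b", "_cell_length_c"}
UNIT_SUFFIXES = ("_pm", "_nm")


def lookup_with_suffixes(name):
    """Return (base_name, suffix), or None if the name is unknown.

    The exact hash lookup is tried first; on a miss, each permitted
    suffix has to be stripped and the lookup repeated.
    """
    if name in BASE_NAMES:
        return name, None
    for suffix in UNIT_SUFFIXES:
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in BASE_NAMES:
                return base, suffix
    return None
```

With this arrangement the data name has to be taken apart before it can be
found, which is exactly the mixing of syntax and semantics described above.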

Here is my proposal of how this could be resolved. It would need a new DDL
item in the ITEM_ALIASES category - _item_aliases.units_code, and the
relevant part of the DDL would change to: 

save__item_units_list.code

    _item_description.description
;
     The code specifying the name of the unit of measure.
;
     loop_
    _item.name
    _item.category_id                                
    _item.mandatory_code              
         '_item_units_list.code'              item_units_list         yes
         '_item_units.code'                   item_units              yes
         '_item_units_conversion.from_code'   item_units_conversion   yes
         '_item_units_conversion.to_code'     item_units_conversion   yes
### Units of _item_aliases.alias_name are implicitly the same as 
### those of _item_aliases.name:
         '_item_aliases.units_code'           item_aliases       implicit

    _item_type.code                              code
     loop_
    _item_linked.child_name                   
    _item_linked.parent_name                 
          '_item_units.code'                    '_item_units_list.code'
          '_item_units_conversion.from_code'    '_item_units_list.code'
          '_item_units_conversion.to_code'      '_item_units_list.code'
          '_item_aliases.units_code'            '_item_units_list.code'
     save_


As for an example of how this would be used in mmCIF, take the
_cell.length_a data item: 

save__cell.length_a
    _item_description.description
;              Unit-cell length a corresponding to the structure reported.
;
    _item.name                  '_cell.length_a'
    _item.category_id             cell
    _item.mandatory_code          no
    _item_sub_category.id        'cell_length'

### _item_aliases.name is determined implicitly for each row:
loop_
    _item_aliases.alias_name
    _item_aliases.units_code
    _item_aliases.dictionary
    _item_aliases.version
          '_cell_length_a'     'angstroms'   'cifdic.c94' '2.0'
          '_cell_length_a_pm'  'picometres'  'cifdic.c94' '2.0'
          '_cell_length_a_nm'  'nanometres'  'cifdic.c94' '2.0'

   .....
    _item_units.code             'angstroms'

   save_

Now, on finding 

   _cell_length_a_pm  1490.3

in a CIF, an mmCIF-based application can find out from the dictionary that the
units of the data value _in_the_file_ are picometres. It can also determine
that the units of _item_aliases.name ('_cell.length_a') are angstroms, and
can find the appropriate conversion from the ITEM_UNITS_CONVERSION category: 

     loop_
    _item_units_conversion.from_code
    _item_units_conversion.to_code
    _item_units_conversion.operator
    _item_units_conversion.factor
    ...
     'picometres'               'angstroms'                '*'   1.0E-02
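
The steps just described can be sketched in Python as follows. This is
only an illustration under stated assumptions: the three tables are a
hypothetical in-memory form of the dictionary entries shown in this
message (no real CIF library is used), the function name `resolve` is made
up, and only the one conversion row and the one operator ('*') that appear
above are included:

```python
import operator

# Hypothetical in-memory form of the proposed ITEM_ALIASES rows:
# alias name -> (mmCIF item name, units code of the value in the file)
ALIASES = {
    "_cell_length_a": ("_cell.length_a", "angstroms"),
    "_cell_length_a_pm": ("_cell.length_a", "picometres"),
    "_cell_length_a_nm": ("_cell.length_a", "nanometres"),
}

# mmCIF item name -> _item_units.code
ITEM_UNITS = {"_cell.length_a": "angstroms"}

# (from_code, to_code) -> (operator, factor); only the row quoted above
CONVERSIONS = {("picometres", "angstroms"): ("*", 1.0e-02)}

# Only the operator that appears in this message is handled here.
OPERATORS = {"*": operator.mul}


def resolve(alias_name, raw_value):
    """Map an aliased data name and its value onto the mmCIF item.

    The data name is treated as an opaque dictionary key; the units come
    from the dictionary tables, never from the spelling of the name.
    """
    item_name, file_units = ALIASES[alias_name]
    target_units = ITEM_UNITS[item_name]
    value = float(raw_value)
    if file_units != target_units:
        op_code, factor = CONVERSIONS[(file_units, target_units)]
        value = OPERATORS[op_code](value, factor)
    return item_name, value, target_units


item, value, units = resolve("_cell_length_a_pm", "1490.3")
```

Here `item` is '_cell.length_a', `units` is 'angstroms', and `value` is
approximately 14.903. A pair of units with no row in the conversion table
(nanometres to angstroms in this sketch) raises a KeyError rather than
guessing a factor.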

I think that this answers the semantic vs. syntactic point, since the
'_pm' suffix does not need to be 'noticed' at the time the data name is
parsed - '_cell_length_a_pm' is just another data name at this point. The
assignment of units to the following data value is cleanly separated from
the syntax checking. 

If it seems a little odd at first to put something about units in the 
ITEM_ALIASES category, bear in mind that this category's raison d'etre is 
compatibility with the core CIF dictionary anyway.

Comments anyone?

Peter.

========================================================================
Peter Keller.            \ "We kill the cows to make jackets out of 
Dept. of Biology and      \  them, and then we kill each other for the
    Biochemistry,          \  jackets we made out of the cows."
University of Bath,         \                   --- Denis Leary
Bath, BA2 7AY, UK.           \ 
------------------------------\-----------------------------------------
Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
Fax. (+44/0)1225 826449        |   P.A.Keller%bath.ac.uk@UKACRL (BITNET)
========================================================================