This is an archive copy of the IUCr web site dating from 2008. For current content please visit https://www.iucr.org.

Re: mmCIF: A few little things, and one big problem.

Paula Fitzgerald (paula_fitzgerald@Merck.Com)
Thu, 10 Aug 95 16:58:51 EDT


Peter Keller wrote:

> Hi Paula,
> 
> I thought that I had just about finished with playing around with the 
> dictionary, but in tightening up my cifdic->symbol table code, I have 
> opened up a whole new can of worms.
> 
> Three small points:
> 
>    line 1795: '_struct_sheet_gen.label_seq_id' should be 
> '_struct_site_gen.label_seq_id'
> 
>    lines 13200 and 13231: item _phasing_MIR.entry_id is repeated on the 
> following lines.
> 
>    lines 21374 and 21391: '_refln_A_meas' is defined as an alias for both 
> _refln.A_meas and _refln.A_meas_au. By analogy with _refln.B_meas*, the 
> latter alias should be deleted.

These are all fixed.

> Now for the big point (and this is related to my recent mailing to mmddl 
> news, and John's reply). I agree very strongly with the convention you 
> have adopted, of providing a save frame definition for every single item. 
> I see this as equivalent to compulsory declarations of identifiers in a 
> programming language such as C, and without it, it would be impossible to 
> find spelling errors such as the one above. BUT, this does create the 
> possibility of defining characteristics of a child data item in two 
> places, and this has happened for about 260 data items. Take 
> _phasing_mad_clust.expt_id as an example. Its declaration is:
> 
> save__phasing_mad_clust.expt_id
>     _item_description.description
> ;              This data item is a pointer to _phasing_mad_expt.id in the
>                PHASING_MAD_EXPT category.
> ;
>     _item.name                  '_phasing_mad_clust.expt_id'
>     _item.category_id             phasing_mad_clust
>     _item.mandatory_code          yes
>     _item_type.code               char
>      save_
> 
> However, _item.category_id and _item.mandatory_code are also defined for
> this item in the save frame of the parent item: 
> 
> save__phasing_mad_expt.id
>      .....
>      loop_
>     _item.name
>     _item.category_id
>     _item.mandatory_code
>                '_phasing_mad_expt.id'        phasing_mad_expt   yes
>                '_phasing_mad_clust.expt_id'  phasing_mad_clust  yes
>                ......
>      ......
>     _item_type.code               char
>      save_
> 
> As far as I can see, the multiply-defined items don't conflict, but there
> is nothing to prevent them from doing so, and in the long run, it will
> make the dictionary harder to maintain. As I understand the DDL (and
> John's reply to my comments about _item.mandatory_code on mmddlnews), all
> the characteristics of a child item should be defined in the save frame of
> the parent item, and the placeholder save frame should only contain
> _item_description.description and _item.name. Put another way, it should 
> be possible to remove the save frame declaration of the child from the 
> dictionary entirely, without losing any information.
> 
> [He wrote: It was agreed that, in order to provide easier integration
> with older dictionaries, there should be a placeholder definition for every
> item in the mmCIF dictionary.  This really results in a large number of 
> essentially redundant definitions for data items that are children of 
> other items. In these cases only the definition of the data item and 
> perhaps the item name have been specified in the mmCIF dictionary.]
> 
> To conform to this view, the two save frames above would need to be 
> changed to:
> 
> save__phasing_mad_clust.expt_id
>     _item_description.description
> ;              This data item is a pointer to _phasing_mad_expt.id in the
>                PHASING_MAD_EXPT category.
> ;
>     _item.name                  '_phasing_mad_clust.expt_id'
>      save_
> 
> 
> and
> 
> save__phasing_mad_expt.id
>      .....
>      loop_
>     _item.name
>     _item.category_id
>     _item.mandatory_code
>                '_phasing_mad_expt.id'        phasing_mad_expt   yes
>                '_phasing_mad_clust.expt_id'  phasing_mad_clust  yes
>                ......
>      ......
>     loop_
>     _item_type.name 
>     _item_type.code
>                '_phasing_mad_expt.id'               char
>                '_phasing_mad_clust.expt_id'         char
>                ......
>      save_
> 
> 
> Note that I have moved _item_type.code out of the child's save frame, and
> into the parent's. Even aliases should be taken out of the placeholder
> save frames, and put in a 
> loop_    _item_aliases.name    _item_aliases.alias_name    construct. 
> 
> I don't know what tools you are using, but if the thought of doing all
> this is too much for you, I'd be quite happy to help (I could adapt my own
> code to do a lot of it quite easily). As a priority, the multiply defined
> items should be removed (virtually all of them are for
> _item.mandatory_code), and then I could think about moving the others. 
> Using the data model which I have put together, I don't believe that it
> would be hard for me to do. 
> 
> Please let me know what you think.
> Regards,
> Peter.
> 
> 
> ========================================================================
> Peter Keller.            \ 
> Dept. of Biology and      \ "Not even the greatest nonsense is beyond
>     Biochemistry,          \  the reach of human invention."
> University of Bath,         \ 
> Bath, BA2 7AY, UK.           \          --- Ryszard Kapuscinski
> ------------------------------\-----------------------------------------
> Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
> Fax. (+44/0)1225 826449        |   P.A.Keller%bath.ac.uk@UKACRL (BITNET)
> ========================================================================
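Peter's concern, that a child's attributes can be stated both in its own placeholder save frame and in its parent's looped _item rows, lends itself to a mechanical check. The following is a minimal Python sketch, assuming the dictionary has already been read into plain Python dicts (the Frame model and the function names multiply_defined and extra_placeholder_attrs are hypothetical, not part of any CIF library). It separates duplicates that agree from duplicates that conflict, and lists what a child frame carries beyond _item_description.description and _item.name.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one save frame as {DDL attribute name: value}.
Frame = Dict[str, str]

# Attributes a pure placeholder frame would carry, per Peter's proposal.
PLACEHOLDER_ATTRS = {"_item_description.description", "_item.name"}


def extra_placeholder_attrs(frame: Frame) -> List[str]:
    """Attributes in a child frame that a pure placeholder would not carry."""
    return sorted(attr for attr in frame if attr not in PLACEHOLDER_ATTRS)


def multiply_defined(
    child_frames: Dict[str, Frame], parent_rows: List[Frame]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str, str]]]:
    """Compare each row of a parent's looped _item category with the
    child's own save frame. Returns (redundant, conflicting) entries."""
    redundant, conflicting = [], []
    for row in parent_rows:
        name = row["_item.name"]
        frame = child_frames.get(name)
        if frame is None:
            continue  # no separate frame for this item in the input
        for attr, parent_value in row.items():
            # _item.name is expected in both places, so it is not a duplicate.
            if attr == "_item.name" or attr not in frame:
                continue
            if frame[attr] == parent_value:
                redundant.append((name, attr))
            else:
                conflicting.append((name, attr, frame[attr], parent_value))
    return redundant, conflicting


# Peter's example, transcribed by hand (description text omitted).
child_frames = {
    "_phasing_mad_clust.expt_id": {
        "_item.name": "_phasing_mad_clust.expt_id",
        "_item.category_id": "phasing_mad_clust",
        "_item.mandatory_code": "yes",
        "_item_type.code": "char",
    },
}
parent_rows = [
    {"_item.name": "_phasing_mad_expt.id",
     "_item.category_id": "phasing_mad_expt", "_item.mandatory_code": "yes"},
    {"_item.name": "_phasing_mad_clust.expt_id",
     "_item.category_id": "phasing_mad_clust", "_item.mandatory_code": "yes"},
]

redundant, conflicting = multiply_defined(child_frames, parent_rows)
print(redundant)    # _item.category_id and _item.mandatory_code for the child
print(conflicting)  # [] : the duplicates agree, as Peter observed
print(extra_placeholder_attrs(child_frames["_phasing_mad_clust.expt_id"]))
# ['_item.category_id', '_item.mandatory_code', '_item_type.code']
```

An empty "conflicting" list matches Peter's observation that the current duplicates agree. An empty result from extra_placeholder_attrs would correspond to his stricter test, that a child's save frame could be deleted without losing information.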
 
I'm not going to go into a discussion of why we decided to carry _item.name
and _item.mandatory_code in the stand-alone definitions for each of the data
items that are also defined as children in a parent tree.  In fact, although I
can remember the discussion about adding _item.name, I can't recall why it
was necessary to add _item.mandatory_code.

But we did, and the only thing I care about (at least on this list) at this
point is that we do what we do consistently.  The example that Peter points
out (_chem_link.type_comp_1) strikes me as simply a violation of consistency
and not a fundamental theoretical issue.  I have fixed that problem, and
a number of others just like it (most of which I added inadvertently in the
headlong rush to get things pulled together for Montreal), and have declared
yet another version.

[I don't want to stifle creativity on the more basic issues that Peter has
raised, but I suggest that continuing discussion along those lines is
more appropriate for the DDL list than for this one].

The audit trail for the new changes:

  0.7.23 1995-08-10
;
  Changes (PMDF):
    + Changed _struct_sheet_gen.label_seq_id to _struct_site_gen.label_seq_id
        in _atom_site.label_seq_id tree
    + Removed duplicate entry of _phasing_MIR.entry_id in _entry.id tree
    + Removed alias in definition of _refln.A_meas_au
    + Removed _item.category_id from
        _chem_link.type_comp_1
        _chem_link.type_comp_2
        _phasing_mad_clust.expt_id
        _phasing_mad_set.clust_id
        _phasing_mad_set.expt_id
        _phasing_mad_set.set_id
        _phasing_mad_ratio.expt_id
        _phasing_mad_ratio.clust_id
        _phasing_mad_ratio.wavelength_1
        _phasing_mad_ratio.wavelength_2
    + Removed _item_type.code from most of the above (it wasn't there in all
        of them).
    + Added _item.mandatory_code to _phasing_mir_der.der_set_id
    + Corrected _item.name for _phasing_mad_ratio.wavelength_2
;
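The first entry in this audit trail, a misspelled name inside a parent's _item.name loop, is the kind of error that Peter's "compulsory declaration" argument catches: a name that is referenced but has no save frame of its own is suspect. The following is a rough Python sketch, assuming that item names appear single-quoted where they are used as values (as in the excerpts above) and that every save frame header sits on its own line. It uses simple line and regex scanning, not a real CIF parser. The function names and the cut-down fragment are hypothetical.

```python
import re
from typing import List, Set


def declared_items(text: str) -> Set[str]:
    """Names declared by save frame headers such as save__atom_site.label_seq_id."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # A frame opens with save_<name>; a bare save_ closes it.
        if fields and fields[0].startswith("save_") and fields[0] != "save_":
            names.add(fields[0][len("save_"):])
    return names


def referenced_items(text: str) -> Set[str]:
    """Single-quoted data names, e.g. values of _item.name (assumed quoting)."""
    return set(re.findall(r"'(_[^'\s]+)'", text))


def undeclared_items(text: str) -> List[str]:
    """Referenced names that lack a save frame of their own."""
    return sorted(referenced_items(text) - declared_items(text))


# Hypothetical, cut-down fragment reproducing the reported misspelling.
fragment = """
save__atom_site.label_seq_id
    loop_
    _item.name
               '_atom_site.label_seq_id'
               '_struct_sheet_gen.label_seq_id'
    save_

save__struct_site_gen.label_seq_id
    _item.name                  '_struct_site_gen.label_seq_id'
    save_
"""

print(undeclared_items(fragment))  # ['_struct_sheet_gen.label_seq_id']
```

Like an undeclared identifier in C, the misspelled name is reported because no save frame declares it. This check is only useful as long as every item keeps its own placeholder frame.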

Bye for now - Paula

********************************************************************************
 Dr. Paula M. D. Fitzgerald  ______________ voice and FAX: (908) 594-5510
   Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
     P.O. Box 2000, Ry50-105     ______________ or bean@merck.com           
       Rahway, NJ 07065  USA 
         (for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)  
********************************************************************************