Peter Keller wrote:
> Hi Paula,
>
> I thought that I had just about finished with playing around with the
> dictionary, but in tightening up my cifdic->symbol table code, I have
> opened up a whole new can of worms.
>
> Three small points:
>
> line 1795: '_struct_sheet_gen.label_seq_id' should be
> '_struct_site_gen.label_seq_id'
>
> lines 13200 and 13231: item _phasing_MIR.entry_id is repeated on the
> following lines.
>
> lines 21374 and 21391: '_refln_A_meas' is defined as an alias for both
> _refln.A_meas and _refln.A_meas_au. By analogy with _refln.B_meas*, the
> latter alias should be deleted.

These are all fixed.

> Now for the big point (and this is related to my recent mailing to mmddl
> news, and John's reply). I agree very strongly with the convention you
> have adopted, of providing a save frame definition for every single item.
> I see this as equivalent to compulsory declarations of identifiers in a
> programming language such as C, and without it, it would be impossible to
> find spelling errors such as the one above. BUT, this does create the
> possibility of defining characteristics of a child data item in two
> places, and this has happened for about 260 data items. Take
> _phasing_mad_clust.expt_id as an example. Its declaration is:
>
> save__phasing_mad_clust.expt_id
>     _item_description.description
> ;   This data item is a pointer to _phasing_mad_expt.id in the
>     PHASING_MAD_EXPT category.
> ;
>     _item.name            '_phasing_mad_clust.expt_id'
>     _item.category_id     phasing_mad_clust
>     _item.mandatory_code  yes
>     _item_type.code       char
> save_
>
> However, _item.category_id and _item.mandatory_code are also defined for
> this item in the save frame of the parent item:
>
> save__phasing_mad_expt.id
>     .....
>     loop_
>     _item.name
>     _item.category_id
>     _item.mandatory_code
>     '_phasing_mad_expt.id'        phasing_mad_expt   yes
>     '_phasing_mad_clust.expt_id'  phasing_mad_clust  yes
>     ......
>     ......
>     _item_type.code       char
> save_
>
> As far as I can see, the multiply-defined items don't conflict, but there
> is nothing to prevent them from doing so, and in the long run, it will
> make the dictionary harder to maintain. As I understand the DDL (and
> John's reply to my comments about _item.mandatory_code on mmddlnews), all
> the characteristics of a child item should be defined in the save frame of
> the parent item, and the placeholder save frame should only contain
> _item_description.description and _item.name. Put another way, it should
> be possible to remove the save frame declaration of the child from the
> dictionary entirely, without losing any information.
>
> [He wrote: It was agreed that in order to provide an easier integration
> with older dictionaries that there be a placeholder definition for every
> item in the mmCIF dictionary. This really results in a large number of
> essentially redundant definitions for data items that are children of
> other items. In these cases only the definition of the data item and
> perhaps the item name have been specified in the mmcif dictionary.]
>
> To conform to this view, the two save frames above would need to be
> changed to:
>
> save__phasing_mad_clust.expt_id
>     _item_description.description
> ;   This data item is a pointer to _phasing_mad_expt.id in the
>     PHASING_MAD_EXPT category.
> ;
>     _item.name            '_phasing_mad_clust.expt_id'
> save_
>
>
> and
>
> save__phasing_mad_expt.id
>     .....
>     loop_
>     _item.name
>     _item.category_id
>     _item.mandatory_code
>     '_phasing_mad_expt.id'        phasing_mad_expt   yes
>     '_phasing_mad_clust.expt_id'  phasing_mad_clust  yes
>     ......
>     ......
>     loop_
>     _item_type.name
>     _item_type.code
>     '_phasing_mad_expt.id'        char
>     '_phasing_mad_clust.expt_id'  char
>     ......
> save_
>
>
> Note that I have moved _item_type.code out of the child's save frame, and
> into the parent's.
> Even aliases should be taken out of the placeholder
> save frames, and put in a
> loop_ _item_aliases.name _item_aliases.alias_name construct.
>
> I don't know what tools you are using, but if the thought of doing all
> this is too much for you, I'd be quite happy to help (I could adapt my own
> code to do a lot of it quite easily). As a priority, the multiply defined
> items should be removed (virtually all of them are for
> _item.mandatory_code), and then I could think about moving the others.
> Using the data model which I have put together, I don't believe that it
> would be hard for me to do.
>
> Please let me know what you think.
> Regards,
> Peter.
>
>
> ========================================================================
> Peter Keller.                 \
> Dept. of Biology and          \ "Not even the greatest nonsense is beyond
> Biochemistry,                 \  the reach of human invention."
> University of Bath,           \
> Bath, BA2 7AY, UK.            \ --- Ryszard Kapuscinski
> ------------------------------\-----------------------------------------
> Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
> Fax. (+44/0)1225 826449        |        P.A.Keller%bath.ac.uk@UKACRL (BITNET)
> ========================================================================

I'm not going to go into a discussion of why we decided to carry
_item.name and _item.mandatory_code in the stand-alone definitions for
each of the data items that are also defined as a child in a parent tree.
In fact, although I can remember the discussion about adding _item.name,
I can't recall why it was necessary to add _item.mandatory_code. But we
did, and the only thing I care about (at least on this list) at this
point is that we do what we do consistently. The example that Peter
points out (_chem_link.type_comp_1) strikes me as just being a violation
of consistency and not a fundamental theoretical issue.
I have fixed that problem, and a number of others just like it (most of
which I added inadvertently in the headlong rush to get things pulled
together for Montreal), and have declared yet another version. [I don't
want to stifle creativity on the more basic issues that Peter has raised,
but I suggest that a continuing discussion along those lines is more
appropriate for the DDL list than for this one].

The audit trail for the new changes:

   0.7.23   1995-08-10
;
   Changes (PMDF):
   + Changed _struct_sheet_gen.label_seq_id to
     _struct_site_gen.label_seq_id in _atom_site.label_seq_id tree
   + Removed duplicate entry of _phasing_MIR.entry_id in _entry.id tree
   + Removed alias in definition of _refln.A_meas_au
   + Removed _item.category_id from
       _chem_link.type_comp_1
       _chem_link.type_comp_2
       _phasing_mad_clust.expt_id
       _phasing_mad_set.clust_id
       _phasing_mad_set.expt_id
       _phasing_mad_set.set_id
       _phasing_mad_ratio.expt_id
       _phasing_mad_ratio.clust_id
       _phasing_mad_ratio.wavelength_1
       _phasing_mad_ratio.wavelength_2
   + Removed _item_type.code from most of the above (it wasn't there in
     all of them).
   + Added _item.mandatory_code to _phasing_mir_der.der_set_id
   + Corrected _item.name for _phasing_mad_ratio.wavelength_2
;

Bye for now - Paula

********************************************************************************
Dr. Paula M. D. Fitzgerald  ______________ voice and FAX: (908) 594-5510
Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
P.O. Box 2000, Ry50-105     ______________    or  bean@merck.com
Rahway, NJ 07065 USA
(for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)
********************************************************************************
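
The consistency concern discussed in this thread, that _item.mandatory_code
can be given for the same item in more than one save frame, lends itself to
a mechanical check. The following is only a minimal Python sketch of such a
check, not the tooling used by either correspondent. The function names
(tokenize, mandatory_code_sites, multiply_defined) are hypothetical, and the
parser assumes a simplified subset of the dictionary syntax: no comments, no
quote characters embedded in values, and text fields delimited by a
semicolon in column one. The sample text is the pair of save frames quoted
above with the elided parts left out.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"'([^']*)'|(\S+)")

def tokenize(text):
    """Return (is_quoted, value) pairs for a simplified STAR-like text."""
    tokens, lines, i = [], text.splitlines(), 0
    while i < len(lines):
        if lines[i].startswith(";"):          # multi-line text field
            body = [lines[i][1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            tokens.append((True, "\n".join(body).strip()))
            i += 1                            # skip the closing ';'
            continue
        for m in TOKEN.finditer(lines[i]):
            if m.group(1) is not None:
                tokens.append((True, m.group(1)))
            else:
                tokens.append((False, m.group(2)))
        i += 1
    return tokens

def is_tag(tok):
    return not tok[0] and tok[1].startswith("_")

def is_keyword(tok):
    return not tok[0] and (tok[1] == "loop_" or tok[1].startswith("save_"))

def mandatory_code_sites(text):
    """Map item name -> save frames that give it _item.mandatory_code."""
    sites = defaultdict(list)
    tokens = tokenize(text)
    frame, single = None, {}

    def record(row):
        if "_item.name" in row and "_item.mandatory_code" in row:
            sites[row["_item.name"]].append(frame)

    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if is_keyword(tok) and tok[1].startswith("save_"):
            record(single)                    # flush non-looped pairs
            frame, single = tok[1][5:] or None, {}
            i += 1
        elif is_keyword(tok):                 # loop_: tags, then rows of values
            i += 1
            tags, values = [], []
            while i < len(tokens) and is_tag(tokens[i]):
                tags.append(tokens[i][1])
                i += 1
            while i < len(tokens) and not (is_tag(tokens[i]) or is_keyword(tokens[i])):
                values.append(tokens[i][1])
                i += 1
            if tags:
                for r in range(0, len(values) - len(tags) + 1, len(tags)):
                    record(dict(zip(tags, values[r:r + len(tags)])))
        elif is_tag(tok) and i + 1 < len(tokens):
            single[tok[1]] = tokens[i + 1][1]  # plain tag/value pair
            i += 2
        else:
            i += 1
    return sites

def multiply_defined(text):
    """Items whose mandatory code is given in more than one place."""
    return {name: frames for name, frames in mandatory_code_sites(text).items()
            if len(frames) > 1}

SAMPLE = """\
save__phasing_mad_clust.expt_id
    _item_description.description
;   This data item is a pointer to _phasing_mad_expt.id in the
    PHASING_MAD_EXPT category.
;
    _item.name            '_phasing_mad_clust.expt_id'
    _item.category_id     phasing_mad_clust
    _item.mandatory_code  yes
    _item_type.code       char
save_

save__phasing_mad_expt.id
    loop_
    _item.name
    _item.category_id
    _item.mandatory_code
    '_phasing_mad_expt.id'        phasing_mad_expt   yes
    '_phasing_mad_clust.expt_id'  phasing_mad_clust  yes
    _item_type.code       char
save_
"""

if __name__ == "__main__":
    for name, frames in multiply_defined(SAMPLE).items():
        print(name, "is given a mandatory code in save frames:", frames)
```

Run on the sample, the sketch reports _phasing_mad_clust.expt_id as being
given a mandatory code both in its own save frame and in that of
_phasing_mad_expt.id. The same approach could be pointed at
_item.category_id or _item_type.code by changing the tag tested in record.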