Discussion List Archives

(6) procedural matters, restraints, dictionary introductions

  • To: COMCIFS@uk.ac.iucr
  • Subject: (6) procedural matters, restraints, dictionary introductions
  • From: bm@uk.ac.iucr (Brian McMahon)
  • Date: Mon, 11 Oct 93 12:14:03 BST

Dear Colleagues

Now that our discussions are off the ground, I suggest that these reports
go out less frequently - say once a week, if there is sufficient
volume of correspondence to justify this - but with the option of more
frequent digests on matters of grave urgency or controversy.

I shall be away from the office 15-31 October inclusive, so urgent 
communications should be routed via David during that time (ideally with 
copies to me if appropriate). I shall try to be back on air early November
with an account of the macromolecular CIFtools workshop that's happening
next weekend.

D5.1 Procedural matters
-----------------------
Following David's suggestion of indexing discussions by reference number,
Paula has a query:

P> With respect to (5)D5.1, I think this indexing is a good idea for keeping the
P> discussion ordered, but I am a tad confused about just how to use this.  For
P> instance, if I want to talk about David's comments in your message number
P> 5 about item 4.2 (_intro sections) and not about the 4.2 discussion itself,
P> does a syntax like the one I used above (5)D4.2 point the reader in the
P> correct direction?  This could of course get easily out of hand, and I
P> am *not* trying to make it more complex than it already is.

This seems a logical and reasonable extension when appropriate. Where the
thread of a discussion flows smoothly, it won't be necessary to refer back
to the location of every relevant comment; but where a specific point several
circulars back needs a reference, this sounds suitable. Comments?
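
To show how mechanical the proposed convention is, here is a minimal Python
sketch that splits a reference such as (5)D4.2 into the circular number and
the item label. The pattern and the function name parse_ref are assumptions
made for this illustration only; no syntax has been agreed.

```python
import re

# Hypothetical pattern: an optional "(n)" circular prefix followed by an
# item label of the form "Dx.y", e.g. "(5)D4.2" or plain "D4.2".
REF = re.compile(r"(?:\((\d+)\))?(D\d+\.\d+)")

def parse_ref(text):
    """Return (circular, item) for a reference, or None if it does not match.

    circular is None when no "(n)" prefix is given.
    """
    m = REF.fullmatch(text.strip())
    if m is None:
        return None
    circular = int(m.group(1)) if m.group(1) else None
    return circular, m.group(2)
```

With this reading, "(5)D4.2" means the comments on item D4.2 that appeared in
circular 5, while a bare "D4.2" refers to the item itself.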

D4.1 Restraints/constraints
---------------------------
George has protested that his initial query about the proposed treatment of
refinement restraints in the macromolecular dictionary was a simple request
for information. Paula has since sent him the meanings of the codes currently 
listed under _refine_ls_restr_type (and I can make those available to anyone
else who is interested). As a background to the treatment of this problem in
the macromolecular dictionary, Paula also adds:

P> I take Syd's point that the CIF definitions will last much longer than the
P> computer programs that we now use to do our refinements - hence his continual
P> reminder that we must not allow ourselves to get locked into data items that
P> may not be around five years from now.
P> 
P> However, we also have a task to allow the complete archiving of a macromole-
P> cular structure determination, and using today's technology we can't know
P> what was done without a statement about the restraints that were used. 
P> It is common practice to publish such a table when reporting a structure
P> refined using Protin/Prolsq.  One of the reasons that people remain somewhat
P> uncomfortable when assessing structures refined with X-plor is that it is
P> neither common practice there, nor is it easy even to know which
P> restraints were used (even if you were the one running the
P> calculation).
P> 
P> As a starting point for discussion, what we decided to do in the mm extension
P> dictionary was a compromise.  Sensitive to Syd's point about not being locked
P> into today's technology, we decided not to make the routinely used
P> Protin/Prolsq restraints separate data names - we didn't even feel that it
P> was safe to make them enumeration values.  But we did want to have people
P> use the same names (and spellings) when they archived Protin/Prolsq
P> refinements, so we put them into examples.  This has many disadvantages,
P> most especially lack of the ability to validate.  But as I said, this was
P> meant to be a starting point for discussion - it just seems that now is the
P> time we are getting to the discussion.

D4.2 Dictionary introductions
-----------------------------
P> I think the idea of using _[], and _[mm] etc for the
P> introductory sections is a great one - it keeps these items at the top of
P> their sections in the dictionary and it is short (a very important
P> consideration for those of us struggling all the time with 32 characters.)
P> 
P> However, I have to agree with David that I don't quite understand the point
P> about the _name's being the same and having a trailing underscore.  This is
P> not to say that I disagree - I just simply don't understand the point of
P> this. Can someone please explain.
   
S> My only significant comment is about your suggestion
S> of category summaries to appear as ..._[] for core and ..._[xx] for
S> extension dictionary xx. So far so good. It is the _name construction
S> that worries me. Let me explain why. Programs like CIFtbx are not at
S> all concerned about duplicate data block names because they do not
S> attempt to either validate or "access" the dictionary files; they
S> simply extract the "_name"s (and attributes) and place them in a
S> common list. At the moment these programs (CIFtbx and Cyclops) would
S> fail because of the identical names in ..._[] and ..._[xx]. BUT this
S> is because they do not currently look at "_type null" and omit these
S> names from the check list. The thing to keep in mind is that really
S> programs like Cyclops should be devoid of this type of higher logic
S> because they are essentially spelling checkers and it is handy to
S> have Cyclops check a dictionary against itself (.._[] and all)! So
S> my query is why not put the data block name into _name as well -- I
S> really do not see the benefit of doing otherwise?? Am I missing something
S> obvious? 
S> 
S> OK, what I think is better is:
S> 
S> data_atom_sites_[]             
S>     _name      '_atom_sites_[]' 
S>     _category    dictionary_definition
S>     _type        null
S> 
S> data_atom_sites_[mm]             
S>     _name      '_atom_sites_[mm]' 
S>     _category    dictionary_definition
S>     _type        null

The philosophy behind my suggesting that _name should be '_atom_sites_' (or
whatever) in both cases is this: the new scheme uses a convention of datablock
*naming* to organise information in the dictionary. The _name is, in general,
a valid dataname that may be used in the CIF (this is coming back to David's
point about authors getting confused about the legitimacy of including these
sections in CIFs as valid data items). "_atom_sites_[]" is NOT a valid data
name. Of course, neither is "_atom_sites_". However, I felt it a more natural
extension to allow "_atom_sites_", which is a valid name *root*, against
"_atom_sites_[]" which is just a label of convenience. The argument would have
more weight, perhaps, if categories could be uniquely assigned from their
lexicographic roots, but they can't.
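
The practical difference between the two proposals can be sketched in Python.
This is NOT the CIFtbx or Cyclops code; it is a simplified illustration of a
program that pools every _name into one list and looks for duplicates. The
functions read_blocks and duplicate_names are hypothetical, and the parser
assumes one attribute per line, as in the fragments quoted above.

```python
from collections import Counter

def read_blocks(text):
    """Parse a simplified dictionary: one 'data_' line starts a block,
    and each following '_attribute value' line is stored in that block."""
    blocks = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        if parts[0].startswith("data_"):
            blocks.append({"block": parts[0]})
        elif blocks and parts[0].startswith("_") and len(parts) == 2:
            blocks[-1][parts[0]] = parts[1].strip().strip("'\"")
    return blocks

def duplicate_names(blocks, skip_null=False):
    """Return the _name values that occur in more than one block.
    With skip_null=True, blocks carrying '_type null' are left out."""
    names = [b["_name"] for b in blocks
             if "_name" in b and not (skip_null and b.get("_type") == "null")]
    return sorted(n for n, count in Counter(names).items() if count > 1)

# Same _name in the core and mm summary blocks (the root-name proposal).
SAME = """
data_atom_sites_[]
    _name      '_atom_sites_'
    _type        null
data_atom_sites_[mm]
    _name      '_atom_sites_'
    _type        null
"""

# _name repeats the data block label (the alternative proposal).
DISTINCT = """
data_atom_sites_[]
    _name      '_atom_sites_[]'
    _type        null
data_atom_sites_[mm]
    _name      '_atom_sites_[mm]'
    _type        null
"""
```

In this sketch the shared root name is reported as a duplicate unless the
checker is told to skip "_type null" entries, whereas the block-label form
never collides and needs no such extra logic.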

Consider also the powder dictionary, where everything has an identifying
prefix already. Which is preferable aesthetically:
     data_pd_calc_[pd]       _name '_pd_calc_'
or
     data_pd_calc_[pd]       _name '_pd_calc_[pd]'             ?

This isn't something I feel very strongly about, but it might be useful to
discuss this point with the people at next week's workshop, who are already
engaged in building tools to read and parse DDL dictionaries. I'll report
back on any relevant remarks there. I wouldn't want to make Syd have to
change his programs gratuitously (but I have to confess that I have just
discovered that CYCLOPS falls over anyway with the *_[] convention!).
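
For tool builders, whichever _name is chosen, the data block label under the
*_[] convention already carries both the name root and the dictionary code.
A minimal Python sketch of recovering them follows; the pattern and the
function split_block_name are assumptions for illustration and are not taken
from any existing program.

```python
import re

# Hypothetical pattern for labels such as "data_atom_sites_[mm]":
# "data", then the name root (leading and trailing underscore included),
# then the dictionary code in square brackets (empty for the core).
BLOCK = re.compile(r"data(_.+_)\[([^\]]*)\]")

def split_block_name(block_name):
    """Return (root, code) for a summary block label, or None otherwise."""
    m = BLOCK.fullmatch(block_name)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Here "data_pd_calc_[pd]" yields the root "_pd_calc_" with code "pd", and
"data_atom_sites_[]" yields "_atom_sites_" with an empty code for the core.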

--------------

On a slightly different issue,

P> As to whether these data names should be in a separate dictionary, I really
P> think that is a bad idea (even though I seem to have enumerated it as a poss-
P> ibility).  The point of having explanatory information in the dictionary in
P> the first place was to make the dictionary more understandable WITHOUT REFER-
P> ENCE TO ANOTHER DOCUMENT.  It will be a pain to keep this all consistent
P> between the core and extension dictionaries, but we can always build a
P> private tool (along the lines of David's suggestion) to help us with this.

I'm broadly in agreement with this. It's useful to have the dictionaries as
single, essentially self-contained documents. With appropriate organisation,
either approach could be taken, but the maintenance of examples within the
current dictionary file does seem easier to keep track of.

P> Feeling that consensus is emerging on this issue, can I ask for some specific
P> implementation advice (back to where this whole discussion began). I propose
P> to do the following in the mm extension dictionary 
P> 
P>  1)  replace _appendix with _[mm] throughout
P>  2)  continue to use this data item for
P>      a) an overall definition of the category
P>      b) one or more examples of a mmCIF entry in this category
P>      c) a listing of data items in the same category that are present in the
P>         core (and therefore available to users of the extension dictionary)
P>   
P> On these two items I feel comfortable.  I am looking for advice on 3)
P>   
P>  3)  remove the newly renamed _[mm] data items for those categories for which
P>      there are not mm extension definitions and where an example would not be
P>      relevant.  
P>   
P>      I am thinking here specifically of the _chemical data items.  In the
P>      mm dictionary we have the much more elaborate _entity category, and we
P>      do not use _chemical at all.  We are currently only providing a
P>      (newly renamed) _[mm] data name in the _chemical category in order to
P>      explicitly say that mm CIFs will not in general use _chemical.
P>
P>      There may be other _categories where this is relevant, but this is the
P>      only one that comes to mind without rereading the dictionary right now.
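
Step 1 of this proposal is a purely textual change and could be done with a
few lines of Python. The sketch below assumes that the existing names simply
end in "_appendix" (for example "_chemical_appendix"); that form, and the
function rename_appendix, are assumptions for illustration and should be
checked against the actual dictionary file before use.

```python
import re

def rename_appendix(text, code="mm"):
    """Replace a trailing '_appendix' in data names and block labels with
    '_[code]'. The word boundary leaves longer names such as
    '_appendix_note' untouched."""
    return re.sub(r"_appendix\b", f"_[{code}]", text)
```

Applied to the line "data_chemical_appendix" this gives "data_chemical_[mm]";
steps 2 and 3 involve editorial judgement and are not covered by the sketch.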

Any volunteers for help on this? (If none are forthcoming, volunteers will be
sought off the main discussion list!)

Regards
Brian