(68) pdCIF categories still lack proper keys

To: [email protected]
Subject: (68) pdCIF categories still lack proper keys
From: bm
Date: Fri, 23 May 1997 16:21:46 +0100
Dear Colleagues

Thanks to those of you who have already registered your votes for the two
dictionaries under review. If you have not yet voted but have already made up
your mind, please consider responding early on either or both dictionaries. I
remind you that you may approve each dictionary independently of the other.

D61.1. pdCIF categories
-----------------------
Paula's critique of the handling of categories in pdCIF has already brought
some recognition of a need to clean up certain minor anomalies, and this
will be attended to in the final working of the pdCIF dictionary.  Two
issues remain which bother her, and which she has posted below.

P> The discussion of the powder dictionary in its current form has focused on
P> two issues - the naming conventions, in which data names do not necessarily
P> begin with the category name, and the loose category construction of the
P> dictionary.
P> 
P> Let me begin by saying that while I personally don't like the practice of
P> data names not beginning with the category name, and while I feel it does
P> a great disservice to the user community (who are never going to understand
P> why _pd_meas_intensity_total cannot appear in the same loop as 
P> _pd_meas_rocking_axis - they can't because they are in different categories)
P> at the end of the day I consider this a question of style, and not one on 
P> which I am willing to be resolutely dogmatic.

On this I think we would agree that the '.' notation in DDL2 does clarify
the taxonomy of categories, but the shortcoming Paula identifies is also
apparent in the core dictionary, where the data name alone doesn't give
unambiguous information about whether or where the data should be located.

P> The category issue is something else again.  I will preface my remarks with
P> quotes from Brian Toby and Brian McMahon.
P> 
P>BT> I have no quibble with categories. They could serve a valuable purpose.
P>BT> I object to the restriction on mixing categories in a loop since it either
P>BT> makes categories useless, as Paula so correctly points out for pdCIF,
P>BT> or requires very complex dictionaries with lots of inter-loop pointers and
P>BT> professional computer programmers to create software.
P> 
P>BM>                                                  It seems to me that Brian
P>BM> T. is adopting the role of the man with the pencil; the mmCIF requirements
P>BM> are those of a multinational fruit wholesaler. Nothing I've seen so far
P>BM> convinces me that it would be death to the CIF effort to proceed with a
P>BM> DDL1.4 pdCIF dictionary - essentially the one we now have - and migrate
P>BM> later to a DDL2 formulation if it's demonstrated to be necessary.
P> 
P> My bottom line here is that despite Brian McMahon's folksy greengrocer
P> analogy, and despite Brian Toby's objection to the restriction on mixing
P> categories within a loop, we are left with the fact that this is a restriction
P> of DDL1.4, and not a requirement of DDL2.  I quote the definition of category
P> in DDL1.4:
P> 
P> data_category
P>    _definition
P> ;           Character string which identifies the natural grouping of data
P>             items to which the specified data item belongs. If the data
P>             item belongs in a looped list then it must be grouped only with
P>             items from the same category, but there may be more than one
P>             looped list of the same category provided that each loop has its
P>             own independent reference item (see _list_reference).
P> ;
P>     _name                      '_category'
P>     _category                    category
P>     _type                        char
P> 
P> In the absence of _list_reference specifiers for each of the categories that
P> are in dispute, I simply don't think that the powder dictionary as it now
P> stands is DDL1.4 compliant.  If Brian Toby can provide consistent list
P> identifiers for each of the looped data items in the PD_DATA and PD_INSTR
P> categories, then I will be willing to approve the dictionary.  Failing that,
P> I will have to vote no.

The issue here is that the pd_data category permits the collection of data
points in an experiment (raw and processed) to be tabulated in one or
multiple tables, depending really on whether there is a direct relation
between each raw and processed point. If such a relationship exists for all
(or almost all) points, then a tabulation like this is appropriate (I
borrow Brian T.'s example from circular 66):

loop_   _pd_meas_angle_2theta
        _pd_meas_counts_total
        _pd_proc_intensity_net
        _pd_calc_intensity_net
                 10     131   101   100
                 10.05  127    97   100
                 10.1   153   128   150

whereas, if there is a more sparse relationship, separate tables are
appropriate:

loop_   _pd_meas_angle_2theta
        _pd_meas_counts_total
                   10     131
                   10.05  127
                   10.1   153


loop_   _pd_proc_2theta_corrected
        _pd_proc_intensity_net
        _pd_calc_intensity_net
                   10     101   100
                   20      21    32

In this listing, the human reader infers that the two rows with
_pd_meas_angle_2theta equal to 10 and _pd_proc_2theta_corrected equal to 10
relate to the same data point; but there is nothing explicit in the
machine-readable dictionary formalism to allow software to make this
identification. The two loops should each have their own _list_reference
identifiers. In this example that could be _pd_meas_angle_2theta and
_pd_proc_2theta_corrected respectively, and the two should be related as
parent/child (compare how the core dictionary handles the _atom_site_
and _atom_site_aniso_ components of the ATOM_SITE category). Since Brian
may wish to use either of the 2theta angle identifiers in an amalgamated
table, or since there might be other identifiers for the data point
collected, it might be necessary to introduce additional arbitrary codes
(say _pd_data_id, _pd_meas_id, _pd_calc_id) whose sole function is to link
and identify points that match across multiple tables.
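
As a rough sketch of what such linking codes might look like, the two
separate tables above could be written as follows. The code values p1 to p4
are invented for illustration, and attaching _pd_meas_id to the first loop
and _pd_calc_id to the second is only one possible arrangement, not a
settled part of the dictionary.

```
loop_   _pd_meas_id
        _pd_meas_angle_2theta
        _pd_meas_counts_total
                   p1   10     131
                   p2   10.05  127
                   p3   10.1   153

loop_   _pd_calc_id
        _pd_proc_2theta_corrected
        _pd_proc_intensity_net
        _pd_calc_intensity_net
                   p1   10     101   100
                   p4   20      21    32
```

Here software can tell that the rows coded p1 describe the same data point
without comparing 2theta values. Under a strict parent/child link, every
code in the second loop (including p4) would also have to appear in the
first.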

Note that in DDL1.4 a category may be described without a reference item
(see REFLNS_SHELL in the core for an example); but if the category may be
split across multiple tables, the _list_reference identifiers *must* be
provided to relate the entries in the separate tables to each other, as
Paula points out by drawing the definition of _category to our attention
above.
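
A sketch of how such a declaration might look in DDL1.4, modelled on the
way the core dictionary ties _atom_site_aniso_label to _atom_site_label.
The items _pd_meas_id and _pd_calc_id are the hypothetical codes suggested
above; these definitions are illustrative only (the _definition text is
omitted) and are not taken from any dictionary draft.

```
data_pd_meas_id
    _name                      '_pd_meas_id'
    _category                    pd_data
    _type                        char
    _list                        yes
    _list_mandatory              yes

data_pd_calc_id
    _name                      '_pd_calc_id'
    _category                    pd_data
    _type                        char
    _list                        yes
    _list_link_parent          '_pd_meas_id'

data_pd_calc_intensity_net
    _name                      '_pd_calc_intensity_net'
    _category                    pd_data
    _type                        numb
    _list                        yes
    _list_reference            '_pd_calc_id'
```

With declarations of this kind, each loop has its own reference item, and
the parent link states explicitly how rows in the second table relate to
rows in the first.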

Regards
Brian