(66) Further thoughts on pdCIF categories

To: [email protected]
Subject: (66) Further thoughts on pdCIF categories
From: bm
Date: Mon, 19 May 1997 17:53:58 +0100
Dear Colleagues

D61.1. pdCIF categories
-----------------------
(The unattributed comments in this first part are Brian Toby's reaction to
David Brown's critique.)

D>      As far as I can make out, Brian T does not believe in
D> categories, never has and sees no reason to change now.

I have no quibble with categories. They could serve a valuable purpose.
I object to the restriction on mixing categories in a loop since it either
makes categories useless, as Paula so correctly points out for pdCIF,
or requires very complex dictionaries with lots of inter-loop pointers and
professional computer programmers to create software.

D> different times.  But what about the case where Brian and his
D> students work on the file on a number of different occasions and
D> then I subsequently make my own contribution?  His flexible
D> structure does not allow for this possibility.  Either all authors

Yes, I had thought about this. Then the only solution is to assign a date to
every person in the loop with the implicit connection made that people who did
work at the same time must have worked together. Not elegant, but functional.
The alternative is to define a new set of loops to differentiate between
collaboration and sequential processing. This will probably require multiple
loops and pointers between loops.  I would prefer that we wait until there is a
demonstrated need for this level of structure.

D> DATA NAMES
D>      I see no particular reason why all the datanames in the pd
D> dictionary need to start with _pd_.  We have not adopted this
D> convention in any other dictionary and I do not see that it offers

I don't like the _pd_ prefix and if we are going to make a revision of the pdCIF
names to match categories, I would like to discuss dropping the prefix. On the
other hand, I would prefer to see the dictionary approved in its current form.
Syd has convinced me to live with the prefix.

D> _PD_DATA
D>      My vote is to divide this into two categories, _pd_meas and
D> _pd_proc.  There should be no need to have pointers between them
D> since there may not be a one-to-one correspondence between them.
D> _*_2theta should serve to connect the information in the two
D> categories.  If it makes sense to list these two sets of numbers
D> together this can surely be done by the software.  We should not
D> confuse the cif with the output produced by the software, just as
D> we should not necessarily confuse the structure of the cif with the
D> structure of the database into which the cif is to be copied (as
D> Brian points out).  Cifs are for the transfer and archiving of
D> information, not for providing a convenient layout for research or
D> publication.

I would have no problem with this as long as it is considered "proper CIF"
to create a file that looks like this:

loop_   _pd_meas_angle_2theta
        _pd_meas_counts_total
        _pd_proc_intensity_net
        _pd_calc_intensity_net
10     131   101   100
10.05  127    97   100

where it is not necessary to include _pd_proc_2theta_corrected if it is
identical to _pd_meas_angle_2theta. Alas, _pd_meas and _pd_proc cannot appear
in a single loop. Requiring the following structure only makes CIF less
transparent, except perhaps to the database folks:

loop_   _pd_meas_angle_2theta
        _pd_meas_counts_total
10     131
10.05  127
...

loop_   _pd_proc_2theta_corrected
        _pd_proc_intensity_net
        _pd_calc_intensity_net
10     101   100
10.05   97   100
...
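
To make the comparison concrete, here is a minimal Python sketch that turns
the single mixed loop above into the two per-category loops. The CATEGORY
table and the split_loop helper are hypothetical, written only to mirror the
two-loop layout shown here (including listing _pd_calc_intensity_net with the
_pd_proc items); a real converter would read the category of each data name
from the dictionary. When _pd_proc_2theta_corrected is absent it is assumed
to be identical to _pd_meas_angle_2theta and is copied across.

```python
# Hypothetical category table, mirroring the two-loop layout in the message.
CATEGORY = {
    "_pd_meas_angle_2theta": "pd_meas",
    "_pd_meas_counts_total": "pd_meas",
    "_pd_proc_2theta_corrected": "pd_proc",
    "_pd_proc_intensity_net": "pd_proc",
    "_pd_calc_intensity_net": "pd_proc",
}


def split_loop(names, rows):
    """Split one mixed-category loop into one loop per category."""
    names = list(names)
    rows = [list(row) for row in rows]

    # Assumption: an omitted corrected angle equals the measured angle,
    # so copy that column in under the _pd_proc name.
    if "_pd_proc_2theta_corrected" not in names:
        angle = names.index("_pd_meas_angle_2theta")
        names.insert(angle + 1, "_pd_proc_2theta_corrected")
        for row in rows:
            row.insert(angle + 1, row[angle])

    loops = {}
    for col, name in enumerate(names):
        cat = CATEGORY[name]
        loop_names, loop_rows = loops.setdefault(cat, ([], [[] for _ in rows]))
        loop_names.append(name)
        for out_row, row in zip(loop_rows, rows):
            out_row.append(row[col])
    return loops


names = [
    "_pd_meas_angle_2theta",
    "_pd_meas_counts_total",
    "_pd_proc_intensity_net",
    "_pd_calc_intensity_net",
]
rows = [["10", "131", "101", "100"], ["10.05", "127", "97", "100"]]
loops = split_loop(names, rows)

for cat, (loop_names, loop_rows) in loops.items():
    print("loop_  " + "\n       ".join(loop_names))
    for row in loop_rows:
        print("  ".join(row))
    print()
```

Both layouts carry the same numbers; the split only changes which columns
sit in which loop.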

D>      Why is _pd_proc_2theta_range_*  needed in the _pd_data category
D> since it is not looped with the profile points?  I am also puzzled
D> to know how _*_range_  fields are used to describe fixed-angle
D> profiles.  Does one use _*_min or _*_max to give this angle or is
D> it necessary to set both equal to each other and set _*_inc to 0.0?
D> This seems rather complicated.  Why not a _pd_proc_2theta_fixed
D> field as is done with _pd_meas?

It is OK with me to move _pd_proc_2theta_range_ to
a different category, but I still do not see what the advantage is in placing
_pd_proc_2theta_range_ in a different category from _pd_proc_2theta_corrected
when they specify the same information.

For stationary detectors one uses _pd_meas_2theta_fixed and there is no need for
a _pd_proc_2theta value (perhaps a zero correction in _pd_calib_2theta_offset).
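
For illustration only, the following Python sketch shows how the
_pd_proc_2theta_range_min, _max and _inc values could be expanded into the
list of 2theta points they describe. The function name expand_range, the
rounding to six decimals and the handling of a zero increment are all
assumptions made for this sketch. The zero-increment branch is the reading
raised in the question above (min equal to max, inc set to 0.0), not the
recommended practice, which is to use _pd_meas_2theta_fixed.

```python
def expand_range(tth_min, tth_max, tth_inc):
    """Return the 2theta values implied by range min, max and inc."""
    if tth_inc == 0:
        # Fixed-angle reading from the question: min == max and inc == 0.0.
        if tth_min != tth_max:
            raise ValueError("a zero increment requires min == max")
        return [tth_min]
    # Round the step count so floating-point noise does not drop a point.
    steps = round((tth_max - tth_min) / tth_inc)
    return [round(tth_min + i * tth_inc, 6) for i in range(steps + 1)]


points = expand_range(10.0, 10.05, 0.05)
fixed = expand_range(25.0, 25.0, 0.0)
print(points)
print(fixed)
```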

D> _pd_calib_std_external_id
D>      Shouldn't this be called _pd_calib_std_ext_block_id since it
D> contains the name of a datablock, not the name of a link to another
D> part of the same datablock?


This is an excellent suggestion.

Brian T.

-----------------------------------
Herbert Bernstein has sent to me the following comments:

H>                                                   ... I agree very much 
H> that the nub of the issue is whether categories are to be taken seriously 
H> or not.  I hope there will be an effort at promoting use of categories.
H> It is not just a database issue.  It is also very much an issue of 
H> creating a well-organized interchange format that can be used effectively 
H> by both people and software.  Consistency and good style reduce the chances
H> of error by either one.  That is what I like about mmCIF and the new core.

I am becoming increasingly confused over where the present discussions are
taking us (if anywhere). Surely the fundamental difference is over the
degree to which the mmCIF and pdCIF data models are 'normalised', in the
sense in which the term is used in relational databases to describe the
homogeneity of entries in a table. The mmCIF (DDL2) model is well
normalised: there is a table for apples, a table for pears and a table for
oranges. The pdCIF model is less clean; it has a table for "fruit", in which
properties of apples, oranges and pears may be deposited willy-nilly. But
there is nothing in relational database theory to prescribe one view as
fundamentally more "correct" than another. The professional fruiterer may
find the distinct apple/orange/pear tables essential for his business; the
general stores manager may be able to make do with a "fruit" table and a
scrap of paper and pencil tucked behind his ear. It seems to me that Brian
T. is adopting the role of the man with the pencil; the mmCIF requirements
are those of a multinational fruit wholesaler. Nothing I've seen so far
convinces me that it would be death to the CIF effort to proceed with a
DDL1.4 pdCIF dictionary - essentially the one we now have - and migrate
later to a DDL2 formulation if it's demonstrated to be necessary.

The powder community may not be able to validate their pd data files against
the dictionary using John W.'s software, but they will be able to write
their own validator or use CIFtbx or Paul Edgington's new HICCUP software.
mmCIF tools will not be able to validate a pdCIF data file against the pdCIF
dictionary; but they can validate the embedded core data names (through
aliases) against the DDL2 version of the core dictionary, and accept the
_pd_* tokens as effectively local datanames.
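
A minimal Python sketch of that behaviour, under stated assumptions: the
ALIASES table is a tiny illustrative subset mapping DDL1 core names to their
DDL2 equivalents, and the classify function is hypothetical. A real tool
would load the alias list from the DDL2 core dictionary.

```python
# Illustrative subset only: DDL1 core name -> DDL2 name it is an alias of.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_cell_length_b": "_cell.length_b",
    "_cell_length_c": "_cell.length_c",
}


def classify(name):
    """Classify a data name the way a DDL2-based tool might treat it."""
    if name in ALIASES:
        # Core item: can be validated against the DDL2 core dictionary.
        return ("core", ALIASES[name])
    if name.startswith("_pd_"):
        # Not known to the DDL2 dictionary: accept as a local data name.
        return ("local", name)
    return ("unknown", name)


for name in ["_cell_length_a", "_pd_meas_counts_total", "_foo_bar"]:
    print(name, classify(name))
```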

My instinct is still to approve the dictionary now, so that people can begin
archiving data sets and sending powder papers to Acta today. Recall that we
have some 7000 data sets at the Acta offices in DDL0 format; ingenuity and
hard work have ensured that they are accessible to the DDL2 software; and I
am sure that a little more ingenuity could build a rules-driven normalisation
converter from DDL1 format pdCIF data to DDL2 format if there were
sufficient need to do so.

I would prefer to see us commit to this approach and deliver a
well-documented DDL1.4 dictionary (by which I mean that the documentation
will include specific directives such as "category assignment of datanames
must be derived from the dictionary _category value, and not inferred from
the structure of dataname tokens") by this summer's ACA and ECM meetings,
rather than have to report that the powder project has gone back into limbo.
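
As a small illustration of why such a directive matters, the Python sketch
below contrasts a dictionary lookup with a guess based on the leading tokens
of the data name. The DICTIONARY_CATEGORY table is a hypothetical one-entry
stand-in for the dictionary's _category values (it reflects the placement of
the _pd_proc_2theta_range_ items in the pd_data category discussed above),
and both function names are invented for this sketch.

```python
# Hypothetical stand-in for the dictionary's _category entries.
DICTIONARY_CATEGORY = {
    "_pd_proc_2theta_range_min": "pd_data",
}


def category_from_dictionary(name):
    """The approach the directive requires: look the category up."""
    return DICTIONARY_CATEGORY[name]


def category_from_tokens(name):
    """The approach the directive forbids: guess from the name's tokens."""
    parts = name.strip("_").split("_")
    return "_".join(parts[:2])


name = "_pd_proc_2theta_range_min"
print(category_from_dictionary(name))  # pd_data
print(category_from_tokens(name))      # pd_proc
```

The two answers differ, so software that infers the category from the name
would disagree with the dictionary.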

Regards
Brian