Discussion List Archives

(65) more on pdCIF categories

  • To: COMCIFS@iucr.ac.uk
  • Subject: (65) more on pdCIF categories
  • From: bm
  • Date: Fri, 16 May 1997 16:43:32 +0100
Dear Colleagues

I include today David Brown's latest contribution.

D61.1. pdCIF categories
-----------------------

D>      As far as I can make out, Brian T does not believe in
D> categories, never has and sees no reason to change now.  Paula on
D> the other hand is convinced that categories properly used are the
D> only way one can produce a precise and flexible arrangement of the
D> file, even if it does require extra datanames and pointers.  We
D> clearly have to decide whether we want to take categories seriously
D> or to regard them as an unfortunate historical intrusion into an
D> otherwise simple flat file structure.  I sense that ghosts are
D> being raised and we are revisiting discussions that were in some
D> sense resolved before I joined COMCIFS.

Well, I think the debate about this issue has been going on for the last
several years, but much of it has taken place informally at meetings rather
than through the COMCIFS mailings.

D>      Brian T raises an example which serves well to illustrate the
D> differences between Brian's and Paula's philosophies.  In the
D> category _pd_proc_info_*  Brian shows that on some occasions _authors
D> names may need to be looped with _datetime to indicate that
D> different people worked on the file at different times.  On other
D> occasions _datetime would be looped separately to indicate that the
D> same people worked together on the file but at a number of
D> different times.  But what about the case where Brian and his
D> students work on the file on a number of different occasions and
D> then I subsequently make my own contribution?  His flexible
D> structure does not allow for this possibility.  Either all authors
D> and datetime are looped together, in which case Brian and his
D> students are only allowed one _datetime, or authors and datetimes
D> are listed in separate loops, in which case it is impossible to
D> tell who worked on the file at which time.  Thus Brian's structure
D> is unable to provide the very flexibility that he is arguing for. 
D> The only way that flexibility can be restored is either by
D> abandoning the rule that datanames cannot be repeated in the same
D> datablock, or by imposing a more strict structure on the cif. 
D> Given that we have already accepted a certain level of organisation
D> for cifs (which Brian T may think was a mistake from the beginning)
D> the only route open is to accept the inevitability of a more
D> structured approach, or to backtrack to a completely flat file and
D> do away with categories entirely.  With the rules we have already
D> in place, the flexibility that Brian so values turns out to be best
D> achieved by adopting a strict syntax.  If _datetime and _author
D> were kept in different categories with pointers, it would be
D> possible to identify precisely which authors worked at which times
D> and therefore which authors worked together and which separately.
D> 
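
The pointer arrangement described above can be sketched in Python. This is
only an illustration under stated assumptions: the category layout, the id
values, the "Student" entry and the datetimes are all invented for the
example and are not taken from the pdCIF dictionary. Authors and datetimes
live in separate tables, and a third table of pointers records who worked in
which session.

```python
from collections import defaultdict

# Hypothetical author category: one row per author, keyed by an id.
authors = [
    {"id": "a1", "name": "Brian"},
    {"id": "a2", "name": "Student"},
    {"id": "a3", "name": "David"},
]

# Hypothetical session category: one row per datetime, keyed by an id.
# The datetimes are made up for illustration.
sessions = [
    {"id": "s1", "datetime": "1997-05-01T10:00"},
    {"id": "s2", "datetime": "1997-05-08T14:30"},
    {"id": "s3", "datetime": "1997-05-16T09:15"},
]

# Pointer rows: each pair links one author id to one session id.
links = [
    ("a1", "s1"), ("a2", "s1"),
    ("a1", "s2"), ("a2", "s2"),
    ("a3", "s3"),
]


def authors_by_session(authors, sessions, links):
    """Resolve the pointers and group author names by session datetime."""
    names = {a["id"]: a["name"] for a in authors}
    times = {s["id"]: s["datetime"] for s in sessions}
    grouped = defaultdict(list)
    for author_id, session_id in links:
        grouped[times[session_id]].append(names[author_id])
    return dict(grouped)


result = authors_by_session(authors, sessions, links)
```

With this layout the awkward case (one group working on several occasions,
followed by a separate later contribution) is representable, because neither
table has to repeat a dataname and the links carry the who-worked-when
information.
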
D>      Here are a few other comments.
D> 
D> DATA NAMES
D>      I see no particular reason why all the datanames in the pd
D> dictionary need to start with _pd_.  We have not adopted this
D> convention in any other dictionary and I do not see that it offers
D> any advantages.  It would make more sense, for example, for those
D> datanames that belong to the refln_ category to match the names in
D> the core dictionary.  It may even make sense to add them to the
D> core. 

I have in front of me a draft symcif dictionary prepared by I.D. Brown,
proposing a couple of dozen data names all beginning "_symmetry_"
and in categories all beginning "SYMMETRY_" :-)

In the DDL1 world, it's OK (if inelegant) to have a mismatch between the
category names and the dataname tokens; this is proposed in pdCIF and
implemented in a few datanames that are effectively defunct in the core,
such as _diffrn_radiation_detector. It wouldn't at all be OK in a DDL2
dictionary, but the problems in porting pdCIF to DDL2 go deeper than the
name prefix. There are some pragmatic advantages to having a unique prefix;
our plan at Acta was to switch the request list applied to a paper to a
powder-specific one if any datanames beginning "_pd_" occurred in the file.
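
A minimal sketch of that prefix test, assuming the file is available as a
plain string. The function names and the "powder"/"default" labels are
hypothetical, and the scan is deliberately naive: it skips whole-line
comments but does not parse quoted strings or semicolon-delimited text
fields, so a real checker would need a proper CIF tokenizer.

```python
def uses_powder_names(cif_text, prefix="_pd_"):
    """Return True if any whitespace-separated token starts with the prefix."""
    for line in cif_text.splitlines():
        # Ignore lines that are entirely comments.
        if line.lstrip().startswith("#"):
            continue
        for token in line.split():
            # Assumption: compare data names without regard to case.
            if token.lower().startswith(prefix):
                return True
    return False


def choose_request_list(cif_text):
    """Pick a (hypothetical) request-list label from the data names present."""
    return "powder" if uses_powder_names(cif_text) else "default"


core_only = "data_x\n_cell_length_a 5.0\n"
with_powder = "data_x\n_pd_calib_std_external_id block_2\n"
comment_only = "data_x\n# _pd_ names are discussed elsewhere\n"
```

Here `choose_request_list(with_powder)` selects the powder list, while the
other two samples fall through to the default.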

D> _PD_DATA
D>      My vote is to divide this into two categories, _pd_meas and
D> _pd_proc.  There should be no need to have pointers between them
D> since there may not be a one-to-one correspondence between them. 
D> _*_2theta should serve to connect the information in the two
D> categories.  If it makes sense to list these two sets of numbers
D> together this can surely be done by the software.  We should not
D> confuse the cif with the output produced by the software, just as
D> we should not necessarily confuse the structure of the cif with the
D> structure of the database into which the cif is to be copied (as
D> Brian points out).  Cifs are for the transfer and archiving of
D> information, not for providing a convenient layout for research or
D> publication.
D> 
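
The suggestion that software, rather than the file layout, should bring the
two sets of numbers together can be sketched as a join on the 2theta value.
This is an illustration only: the tuple layout, the tolerance and the sample
numbers are assumptions made for the example, and measured points with no
processed counterpart are simply paired with None.

```python
def merge_on_2theta(meas, proc, tol=1e-6):
    """Pair measured and processed points whose 2theta values agree.

    meas and proc are lists of (two_theta, value) tuples.  Returns one
    (two_theta, measured_value, processed_value_or_None) row per measured
    point, in order of increasing 2theta.
    """
    proc_sorted = sorted(proc)
    rows = []
    for angle, counts in sorted(meas):
        match = None
        # Linear search keeps the sketch short; fine for small examples.
        for p_angle, p_value in proc_sorted:
            if abs(p_angle - angle) <= tol:
                match = p_value
                break
        rows.append((angle, counts, match))
    return rows


# Made-up sample data: three measured points, two processed points.
meas = [(10.0, 120), (10.02, 135), (10.04, 150)]
proc = [(10.0, 118.5), (10.04, 149.0)]
rows = merge_on_2theta(meas, proc)
```

Because the pairing is computed on demand, the two categories can have
different numbers of points without any pointers between them.
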
D>      Why is _pd_proc_2theta_range_*  needed in the _pd_data category
D> since it is not looped with the profile points?  I am also puzzled
D> to know how _*_range_  fields are used to describe fixed-angle
D> profiles.  Does one use _*_min or _*_max to give this angle or is
D> it necessary to set both equal to each other and set _*_inc to 0.0? 
D> This seems rather complicated.  Why not a _pd_proc_2theta_fixed
D> field as is done with _pd_meas?
D> 
D> _pd_calib_std_external_id
D>      Shouldn't this be called _pd_calib_std_ext_block_id since it
D> contains the name of a datablock, not the name of a link to another
D> part of the same datablock?



D65.1. Acronyms
---------------
D> BTW
D>      What does this mean (WDTM)?  Brian Toby Winces?  Better Try
D> Windows? Please can people define the acronyms they use!  Or
D> perhaps it was just a glitch in my email that inserted these
D> characters into the middle of Brian T's message (:->)

By The Way. Try http://users.ox.ac.uk/~univ0155/tla.html for one list of
essential Internet acronyms (IIRCAIWITPATT).

Enjoy the weekend.
Brian