(23) Review; diffraction standards

To: COMCIFS@uk.ac.iucr
Subject: (23) Review; diffraction standards
From: bm@uk.ac.iucr (Brian McMahon)
Date: Tue, 21 Jun 94 13:47:24 BST

Dear Colleagues

Note that there is one new topic for discussion, D23.2, at the tail of this
posting (D23.1 is assigned to a running thread that I've promoted to its
own topic heading). The rest is an appetiser for next week's meeting, and
a reminder of where we were! David has produced an excellent summary of the
current state of our affairs, which I append below. By the way, Brian Toby
has also been busy on refinements to the powder dictionary during the past
few months. Brian has just become a father, I understand (many
congratulations!). This may slow down his work on the dictionary. On the
other hand, most of the first version of ciftex was written in the early
hours of many a morning with my first-born on my shoulder, so perhaps this
may be the precursor of a burst of (exhausted) productivity.

> 	I have finally managed to find time to review where we are at in 
> comcifs and I enclose a summary.  Of course we have two dictionaries out 
> for comment, and the powder dictionary is presumably nearly ready to be 
> put out on probation as well.
> 
> 	Other items from our previous discussions are as follows:
> 
> Items in which we should declare formal acceptance of a decision
> ----------------------------------------------------------------
> 4.2 	Introductions

This is formally closed, with the decision outlined in circular 17 of 
21 December 1993.
 
> Items that can be closed without any decisions
> ----------------------------------------------
> 10.2 	privileged constants
> 17.1 	copyrights
> 21.1/2 	draft dictionaries

Note regarding 17.1: the Executive Committee has given its approval for
publication of the draft statement of policy circulated in message 17. This
(slightly modified) will go into the next available issue of IUCr Newsletter.
 
> Items that belong to STAR and therefore are beyond our jurisdiction.  
> These should also be closed
> ----------------------------
> 10.1 			Changes to STAR
> 10.3 			global_

Note, though, that global_ is permitted in STAR Dictionaries, and currently
exists in the draft dictionaries. We may choose to drop this usage from
the CIF Dictionary set, and should discuss this before publication.

> 10.4, 15.1, 18.1	data_types
> 10.5 			categories
> 13.1, 17.2, 21.7 	use of 'include', but have we decided how a file name
> is to be defined when given in 'include' in a cif (or is this dataname
> only allowed in dictionaries)?

No, _include_file may (uniquely) appear in dictionaries and in data files.
The issue of identifying the included file is, however, still an open one, I
believe. For files to be included in CIF dictionaries, Syd's 'barebones'
naming scheme may well suffice; but for arbitrary files, it may be worth
looking at the conventions used in WWW (which allow applications to locate
files referenced from within documents, whether they reside on local or
remote file systems).

> Item that should remain open pending future developments
> --------------------------------------------------------
> 16.1			esd versus su
> 
> External relations (do we need to do anything about these?)
> ------------------
> 20.1 	FAQ
> 20.2 	MIME

I don't think we need take any action on these as a Committee, but I will
keep people posted if there are any developments. I should be quite
interested personally in becoming involved with these topics.

> Other items that remain open
> ----------------------------
> 4.1	restraints, or is this solved with the introduction of 'include'?

I should say that there is something of a consensus that Paula's method of
handling recognised restraint types (through enumerations listed in an include
file) will get us over the short-term hurdle of storing some information
about the way in which specific programs have tried to impose restraints and
constraints. We may be able to close off this particular discussion thread.

> 11.1/2 naming of datablocks and files - this is incorporated in cifdic.pd
> 15.1	assigning of standard prefixes
> 20.2	the assigning of 'foreign_' prefixes
> 21.3	different types of enumeration ranges
> 21.4	occupancies (can this be incorporated into the current core 
>       dictionary?  Some discussion of this point is welcome)
> 21.5	non-crystallographic symmetry (for incorporation in current mm 
> dictionary?)
> 21.6	The use of 'other' in enumerations to allow for otherwise 
> non-legal procedures, etc.

I shall try to reorganise discussion on these open issues after the ACA
meeting. In the meantime, David had posted a few comments on some of the
later items, which I record here:


D23.1 Labelling of symmetry operators
-------------------------------------
This arose from D21.2, but should probably be promoted to a specific item
for debate and resolution. The proposal is to add to the _sym_equiv_
category a _sym_equiv_posn_id field, so that symmetry codes of the form
2_454 can be parsed effectively; the '2' in this example would match a value
of '2' in _sym_equiv_posn_id, rather than (as now) being calculated as the
second element in the list of _sym_equiv_posn_as_xyz.

D> D21.2 Having been exposed now directly to Phil Bourne and his computer 
D> colleagues I have a much better appreciation of his concerns.

(Oh dear. The zeal of the newly converted :-).)

D> My suggestion (made independently by Paula) that we need to label the 
D> symmetry operators would require a scheme that is now well embedded into 
D> the CIF definitions for cross referencing loops, namely the use of labels 
D> whose parent is defined in the _symmetry_as_xyz loop and whose children 
D> appear in bond lengths.  It would probably also require splitting the 
D> *_site_symmetry_ labels to make them more directly parsable (with some 
D> considerable increase in the number of data items in the geom loops) but 
D> this would be in the direction that we should be moving and will not 
D> present a problem to the user since all this stuff will be created by the 
D> program anyway and it will only marginally make the CIF less humanly 
D> readable.  The problem is that the _site_symmetry_ labels are in 
D> principle parsable at the moment, but there is nothing in the dictionary 
D> that allows a computer to do this.  This is clearly contrary to our 
D> philosophy.  The dictionary should contain all the knowledge of 
D> crystallography that a computer needs to know in order to interpret the CIF.

My problem with introducing a new _sym_equiv_posn_id field is that many
existing CIFs lack it. One can ensure compatibility by stating a rule that 
the '2_' matches the *_posn_id if it exists, but needs to be calculated as
before if it doesn't. But if you need to have the facility for calculating it
anyway, what is to be gained from adding the *_posn_id dataname?
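
As an illustration only, the compatibility rule described above might be
sketched in Python as follows. The function names parse_symcode and
resolve_operator are hypothetical, and the sketch assumes the usual reading
of the 'klm' digits as cell translations offset from 5 (so that 555 means no
translation), which is not spelled out in this message.

```python
import re

# Assumed form of a symmetry code: 'n' or 'n_klm', e.g. '2' or '2_454'.
SYMCODE = re.compile(r"^(\d+)(?:_(\d)(\d)(\d))?$")


def parse_symcode(code):
    """Split a code such as '2_454' into ('2', (-1, 0, -1)).

    Assumption: each of k, l, m is a single digit offset from 5,
    so '555' (or a missing suffix) means no cell translation.
    """
    match = SYMCODE.match(code)
    if match is None:
        raise ValueError(f"not a symmetry code: {code!r}")
    label = match.group(1)
    if match.group(2) is None:
        return label, (0, 0, 0)
    shifts = tuple(int(digit) - 5 for digit in match.group(2, 3, 4))
    return label, shifts


def resolve_operator(label, xyz_list, id_list=None):
    """Return the operator string that the label refers to.

    If a list of *_posn_id values is supplied, match the label against it;
    otherwise fall back to the existing rule and treat the label as a
    1-based position in the list of *_posn_as_xyz values.
    """
    if id_list is not None:
        return xyz_list[id_list.index(label)]
    return xyz_list[int(label) - 1]
```

A caller would pass id_list only when the file being read actually carries
the identifier loop, so the positional fallback still has to be present in
any tool that reads existing files.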

One could build a new set of datanames which include the separate components
of the symmetry labels, and have the *_posn_id (or equivalent) as a loop
identifier for future use; but the future parsing tool still needs to have
knowledge of how to handle the old _sym_equiv_ datanames built into it.

On the question of parsing the existing construct, we might have a type 
(or _type_extension) of 'symcode'. I should suppose that new _type_conditions
can be formally described - perhaps within the Dictionary, perhaps alongside
it - and such descriptions will form part of the Standard, accessible to, and
used by, all software developers.


D21.3 Enumeration ranges
------------------------
D> I agree with your classification of numerical enumeration 
D> ranges.  I now understand why we should have a type of 'integer' and 
D> would campaign for this, except this is beyond CIF's jurisdiction and the 
D> enumeration of types I understand has now been decided for all time (or 
D> has it?).  I am not sure how one indicates fuzzy enumeration ranges, but 
D> I am in favour of the concept.

D21.6 Use of 'other' in enumeration lists
-----------------------------------------
D> This again raises the question of enumeration ranges but for 
D> character strings, and raises the question, not fully resolved earlier 
D> (see 13.1) of whether one can extend these enumeration ranges without the 
D> consent of COMCIFS.  If not, then they become part of the dictionary, but 
D> if so, then why do we have enumerations for these data-items?  Perhaps 
D> the answer is to have an enumeration list that includes 'other - please 
D> specify'

D21.7 _include_file in dictionaries
-----------------------------------
D> I am glad to see that dictionaries now can use _include_file.  
D> Perhaps we need to introduce this also into CIF.  I have a couple of 
D> minor problems.  The first: what do we do with further extensions -
D> presumably cif_core_ext_2.dic etc.?  The second arises from the fact that
D> these names 
D> cannot be used as the actual file names.  My little PC only accepts names 
D> with 8 characters, so I already have a problem in keeping all my 
D> dictionaries in order.  Somewhere there has to be a thesaurus of 
D> equivalent local names.  That I guess is the programmer's problem. I see 
D> no difficulty in establishing a star_core.dic
D> 
D> 	One question that occurred at Monterey is: how does a parser know 
D> which dictionaries it should be consulting in order to read a given star 
D> file?  At the moment we do not include a pointer to cif_core.dic in a 
D> cif, but presumably we should.  This would in turn point to all the other 
D> relevant dictionaries.  But the question was raised of a combined star file 
D> that might use both cif_mm.dic and nmrif_core.dic.  The NMR databank and 
D> PDB both seemed to be anxious to use the same concepts to describe the 
D> chemistry of proteins so there will be a large amount of overlap and one 
D> can envisage files that draw on different dictionaries.
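
A minimal sketch of the 'thesaurus of equivalent local names' that David
mentions might look like the following Python. The local file names in the
table and the function name local_name are invented for illustration; only
the dictionary names on the left-hand side are taken from the discussion.

```python
# Hypothetical thesaurus: dictionary name as cited in a file
# -> local file name that fits an 8-character name, 3-character extension.
LOCAL_NAMES = {
    "cif_core.dic": "CIFCORE.DIC",
    "cif_mm.dic": "CIFMM.DIC",
    "cif_core_ext_2.dic": "CORE_X2.DIC",
}


def local_name(dictionary_name, thesaurus=LOCAL_NAMES):
    """Translate a cited dictionary name into the local file name."""
    if dictionary_name not in thesaurus:
        raise LookupError(f"no local file registered for {dictionary_name!r}")
    local = thesaurus[dictionary_name]
    stem, _, extension = local.partition(".")
    # Guard against table entries that the local file system would reject.
    if len(stem) > 8 or len(extension) > 3:
        raise ValueError(f"{local!r} does not fit an 8.3 file name")
    return local
```

The table is deliberately kept outside the parsing logic, so that each site
can maintain its own mapping without touching the names used in the files
themselves.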


D23.2 Description of diffraction standards
------------------------------------------
D> 	I have just had a call from Alan Pinkerton seeking clarification 
D> on what should be placed in the _diffrn_standards_decay_% field.  The name 
D> suggests to me that if you plot all the intensities of the standard 
D> reflections against time, and draw a (straight) best fit line through 
D> these points, this field should contain the % by which the right hand end 
D> of the line differs from the left hand end, i.e. that this field will 
D> contain 0 if the intensities scatter around a mean that does not change 
D> with time, but might be say -10 for an experiment in which the crystal 
D> shows some deterioration.  Conceivably it might read +5 if the x-ray beam 
D> tended to increase with time.
D> 
D> 	However, if you read the description, this simple interpretation
D> becomes muddied.  It says the field contains the 'percentage variation in
D> the mean intensity for all standard reflections'. Now the mean intensity is
D> presumably a single value and so shows no variation unless the mean is not
D> a global mean but a mean of some sub groups of intensities.  What is this
D> subgroup?  A single intensity from each standard measured during the same
D> time interval?  The intensities of a single reflection measured during a 
D> longer time interval, but an interval small with respect to the total time
D> needed to record the diffraction pattern?  Or does 'variation' mean
D> variance or perhaps the esd of the whole set of measurements?  Even if we
D> take the interpretation presented in the previous paragraph, it is not
D> clear how the decay is to be measured.  I have described one
D> interpretation, but Alan suggested that
D> the one he prefers is related to the actual corrections applied by the
D> program (is this related to _diffrn_standards_scale_sigma?) He proposes to
D> take the largest correction, subtract the smallest correction, divide the
D> difference by the largest correction and express the quotient as a
D> percentage.  The result is not the same as one would get from my first
D> interpretation.  Anyway, what is an 'individual mean standard scale'
D> described in *_scale_esd?  The wording needs clarifying.  Alan proposes
D> that we introduce _diffrn_standards_correction_max and *_min whose meaning
D> should be self-evident from the name and which would express exactly the
D> concept that we are trying to convey.  Is this a change we should
D> incorporate into the new core version? 
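
To make the difference between the two readings concrete, here is a rough
Python sketch of both. It is only one possible reading of each proposal: the
function names are hypothetical, the first function assumes the percentage is
taken relative to the fitted value at the start of data collection, and the
second assumes the 'corrections' are multiplicative scale factors applied by
the data-reduction program.

```python
def decay_percent_from_fit(times, intensities):
    """Percentage change along a least-squares straight line fitted to the
    standard intensities, from the earliest to the latest time.
    Negative values indicate decay; positive values indicate an increase."""
    n = len(times)
    if n < 2 or n != len(intensities):
        raise ValueError("need at least two (time, intensity) pairs")
    t_mean = sum(times) / n
    i_mean = sum(intensities) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all measurements have the same time")
    sxy = sum((t - t_mean) * (i - i_mean) for t, i in zip(times, intensities))
    slope = sxy / sxx
    intercept = i_mean - slope * t_mean
    start = intercept + slope * min(times)  # left-hand end of the line
    end = intercept + slope * max(times)    # right-hand end of the line
    return 100.0 * (end - start) / start


def decay_percent_from_corrections(corrections):
    """Alan's proposal as described above: (largest - smallest) / largest,
    expressed as a percentage. Always zero or positive."""
    largest = max(corrections)
    smallest = min(corrections)
    return 100.0 * (largest - smallest) / largest
```

The first function carries a sign and depends on every measurement, whereas
the second depends only on the two extreme corrections, which is one reason
the two numbers need not agree.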

Regards
Brian