(35) Mostly units and R factors

To: [email protected]
Subject: (35) Mostly units and R factors
From: bm
Date: Fri, 2 Jun 1995 12:38:26 +0100
Dear Colleagues

Apologies to anyone who received an earlier draft of this mailing - please
throw it away. Until last week, it was easy for us to trap erroneous
mailings before they left the premises. Now, with our faster, slicker 64kbps
line our mistakes slip out faster!

George Sheldrick has pointed out to me that the copy of syncif.c which
I made available for ftp has an error in it. The file appears to need an
include file "pass1.h" which is not supplied. In fact, the contents of that
file have already been included, and the offending line, which reads
    #include "pass1.h"
can simply be deleted. A slightly more up-to-date version of this program
(masquerading under its other name of vcif) can also be obtained from our
WWW server. This allows STAR constructs like global_ and save_frames (useful
for checking dictionaries) but not nested loops. A DOS executable is
available. If you already have syncif, there is probably no need to upgrade
to this version of vcif, unless you have some free time to kill.



Agreement
=========

A34.1  Simplification of units descriptions in DDL1.4
-----------------------------------------------------
Unanimous agreement on the proposal to drop the _units_extension,
_units_conversion and _units_description terms from the published DDL1.4
specification, and to replace them with _units and _units_detail. 

However, there is some disagreement on how to handle the existing CIFs that
use datanames with appended units codes, as permitted in the original Core
dictionary. If you recall, my suggestion was

BM> Just add _cell_length_a_pm etc to the existing core (as extra,
BM> full, datanames with new definitions specifying the units - or,
BM> rather, with new _units). This preserves the integrity of any
BM> existing files (outside of the archive) which conform to core version 1991. 

Syd is more inclined to modify all the existing files in the archive to use
the unadorned datanames, and Paula is in agreement:

P> In answer to the direct request for a vote, I'm all for dropping
P> _units_extension.  I'm completely with John Westbrook on this one - the new
P> DDL provides a very elegant and comprehensive way of handling the need to
P> be able to convert to units other than the ones specified in the CIF
P> definitions.  The weakness that I see with Syd's proposal for _units and
P> _units_detail is that the only way to get to *other* units is to keep
P> defining new data names.  Otherwise it is fine for being specific about the
P> accepted CIF units appropriate to the data item.
P> 
P> I don't think that defining new data names is the way to go with fixing the
P> archive.  Certainly it would work, but it smacks of retrogression.  You
P> cite a very small handful of cases where authors have used something like
P> _cell_length_a_pm - I would suggest that the forward thinking thing to do
P> is to change the files, annotate the change, and move to a future where CIF
P> evolves in the direction of becoming more, not less, elegant.
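
To make the "change the files, annotate the change" option concrete, here is
a minimal sketch in Python of how one suffixed dataname might be rewritten.
It is an illustration only, not an agreed procedure: the SUFFIXED table, the
function name to_unadorned and the wording of the annotation are all
hypothetical. It relies only on 1 angstrom = 100 pm, and on the fact that a
standard uncertainty in parentheses refers to the last quoted digits, so a
pure decimal shift leaves those digits unchanged.

```python
import re
from decimal import Decimal

# Hypothetical table: suffixed name -> (unadorned name, power-of-ten shift).
# 1 angstrom = 100 pm, so a value in pm shifts two decimal places.
SUFFIXED = {
    "_cell_length_a_pm": ("_cell_length_a", -2),
    "_cell_length_b_pm": ("_cell_length_b", -2),
    "_cell_length_c_pm": ("_cell_length_c", -2),
}

# A plain decimal number with an optional s.u. in parentheses, e.g. 1047.3(2)
_NUMBER = re.compile(r"^([-+]?\d+(?:\.\d*)?)(\(\d+\))?$")


def to_unadorned(name, value):
    """Return (new_name, new_value, note); other names pass through unchanged."""
    if name not in SUFFIXED:
        return name, value, None
    new_name, shift = SUFFIXED[name]
    match = _NUMBER.match(value)
    if match is None:
        raise ValueError(f"cannot convert value {value!r} for {name}")
    mantissa, su = match.group(1), match.group(2) or ""
    # scaleb moves the decimal point exactly, with no binary rounding
    converted = format(Decimal(mantissa).scaleb(shift), "f")
    note = f"# {name} {value} rewritten as {new_name} in angstroms"
    return new_name, converted + su, note
```

The returned note is meant to be written into the file as a comment, so that
the change stays visible to anyone comparing against the original deposit.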

Any other views?


Continuing discussions
======================

(34)D28.2, D28.3  R Factors
---------------------------
D>      What is needed is some system in the names that will allow us
D> to create names for R factors whose meaning will be self-evident. 
D> They might also be parsable, but that is hardly important for DDL1
D> item names.  
D> 
D>      Each of the item names has a first part which may be
D> different, e.g. _refine_ls or _pd_proc.  The systematic part would
D> follow this.  Ideally, the next part of the name would be _R_factor
D> or _wR_factor, since this then specifies that we are looking at an
D> agreement index.  This would then be followed by the nature of the
D> measurement to which the index was being applied, e.g. _F, _F2, _I,
D> _prof.  If this were omitted, _F would be assumed.  The next part
D> would be _all or _obs where _all is taken as the default and should
D> never appear explicitly in any name, and _obs is not needed for
D> _wR.  Finally, we need a name to indicate that the R factor has
D> only been applied to measurements that cover a finite range of
D> sin(theta).  This is what Paula is interested in.  She defines a
D> lower and upper resolution limit, but is it not possible that
D> someone may wish to record the R factors over different ranges of
D> resolution, e.g. for low resolution only, for high angle
D> reflections only or for all reflections?  I am not sure how to
D> handle this.  Maybe Paula is right in thinking that we should not
D> define a special R factor for this purpose, but just assume that
D> when resolution limits are given, they apply to all the R factors
D> listed. 
D> 
D>      According to this scheme the possible combinations are:
D> 
D>           proposed                 current
D> 
D>           _R_factor                _R_factor_all
D>           _R_factor_obs            {same as proposed}
D>           _R_factor_F2
D>           _R_factor_F2_obs
D>           _R_factor_I
D>           _R_factor_I_obs
D>           _R_factor_prof           [_prof_R_factor]
D>           _wR_factor               _wR_factor_all and _obs
D>           _wR_factor_F2
D>           _wR_factor_I
D>           _wR_factor_prof          [_prof_wR_factor]
D>           _wR_factor_prof_expected [_prof_wR_expected]
D> 
D> [] = proposed for pdCIF
D> 
D>      Not all these definitions would be used, but at least we would
D> have a systematic way of creating a new name if the need arose. 
D> Brian T prefers to put the I, F2, prof before _R_factor, but the
D> order I propose seems more logical.  The scheme agrees with current
D> names apart from dropping _all (_wR_factor_obs makes no sense in
D> any case).  Brian's proposed names would be changed, but if he
D> feels strongly, the modifiers _I, _F2 and _prof could be placed
D> before _R.  If length of the name is a factor, the string '_factor'
D> could easily be omitted.
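
The naming rule described above can be written down as a short function.
This is only a sketch of the scheme as stated (prefix, then _R_factor or
_wR_factor, then the measurement unless it is F, then _obs but never _all);
the function name r_factor_name and its arguments are hypothetical, and the
_prof_expected variant in the table is not covered.

```python
def r_factor_name(prefix, weighted=False, quantity="F", observed_only=False):
    """Compose an R-factor data name following the proposed ordering.

    prefix        -- leading part of the name, e.g. "_refine_ls" or "_pd_proc"
    weighted      -- True for _wR_factor, False for _R_factor
    quantity      -- "F", "F2", "I" or "prof"; "F" is the default and is omitted
    observed_only -- True appends _obs; "all" is the default and never appears
    """
    if quantity not in ("F", "F2", "I", "prof"):
        raise ValueError(f"unknown quantity: {quantity!r}")
    if weighted and observed_only:
        # The proposal states that _obs is not needed for _wR
        raise ValueError("_obs is not used with _wR_factor in this scheme")
    name = prefix + ("_wR_factor" if weighted else "_R_factor")
    if quantity != "F":
        name += "_" + quantity
    if observed_only:
        name += "_obs"
    return name
```

For example, r_factor_name("_refine_ls", quantity="F2", observed_only=True)
gives _refine_ls_R_factor_F2_obs, the fourth entry in the table.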


P> I was going to start my response to this by saying that I agreed completely
P> with George, but on reflection I realize my thinking may differ in some
P> details. My thinking on all of this has evolved since writing this part of
P> the mmCIF dictionary, and what I would now propose is the following set
P> of tightly defined R-factors:
P> 
P> ...R_factor_F_all
P>    on all reflections in the reflection list that:
P>       satisfy the resolution criteria
P>       were not manually flagged as being "unreliable" observations
P> ...R_factor_F_obs
P>    on all reflections in the reflection list that:
P>       satisfy the resolution criteria
P>       satisfy the observation criterion
P>       were not manually flagged as being "unreliable" observations.
P> ...R_factor_F_work
P>    on all reflections in the reflection list that:
P>       satisfy the resolution criteria
P>       satisfy the observation criterion
P>       were not manually flagged as being "unreliable" observations
P>       were not excluded from the refinement for the purposes of calculating 
P>         an F-free
P> ...R_factor_F_free
P>    on all reflections in the reflection list that:
P>       satisfy the resolution criteria
P>       satisfy the observation criterion
P>       were not manually flagged as being "unreliable" observations
P>       were excluded from the refinement for the purposes of calculating 
P>         an F-free
P>   
P> This essentially adds one extra definition (...R_factor_F_work) to the set
P> that we already have, and in so doing addresses George's point about not
P> rewriting history to deal with the fact that people now want to quote
P> R-frees.
P> 
P> A parallel set of data names would be present for F^2^, since George (and
P> others) are using F^2^ relatively routinely now.  Note that these definitions
P> would also exploit the resolution and observation criteria.
P> 
P> Note that this is one of the lovely features of the aliases - we can change
P> these data names so that we have a consistent set that reflects the current
P> thinking about how to report these things, but the aliases will allow older
P> files to point to the correctly renamed new data items (i.e., ...R_factor_all
P> will point to R_factor_F_all).
P> 
P> I'm pretty comfortable with the above as a formal proposal for the rigidly
P> defined cases.  Handling George's point (b) is a bit trickier.  One idea,
P> put forward just to stimulate discussion, is that we could use the existing
P> wR data names, which would then be used to deal with the cases of technology
P> development, to wit:
P> 
P> ...wR_factor
P>    on all reflections in the reflection list that:
P>       satisfy the criteria outlined in _refine_ls.wR_criteria
P> ...wR_factor_work
P>    on all reflections in the reflection list that:
P>       satisfy the criteria outlined in _refine_ls.wR_criteria
P>       were not excluded from the refinement for the purposes of calculating
P>         an F-free
P> ...wR_factor_free
P>    on all reflections in the reflection list that:
P>       satisfy the criteria outlined in _refine_ls.wR_criteria
P>       were excluded from the refinement for the purposes of calculating
P>         an F-free
P> 
P> Note that no distinction is made between _obs and _all here, as the
P> free-text criteria _refine_ls.wR_criteria could be almost anything.  I see
P> some problems with this last part of the proposal, as we are orphaning a
P> data item (one of ...wR_factor_all, ...wR_factor_obs), but as I said, this
P> is just a proposal for discussion.
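
The four tightly defined sets in Paula's proposal (_all, _obs, _work, _free)
differ only in which reflections are admitted, which the following Python
sketch tries to show. Several things here are assumptions and not part of
the proposal: the Reflection record and its field names, the observation
criterion taken as F > n_sigma * sigma(F), and resolution limits expressed
as d-spacings. The R factor itself is the conventional unweighted sum. The
wR variants would follow the same pattern, but _refine_ls.wR_criteria is
free text and cannot be coded in general.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    d: float                  # resolution (d-spacing) in angstroms
    f_obs: float              # measured |F|
    f_calc: float             # calculated |F|
    sigma: float              # s.u. of f_obs
    unreliable: bool = False  # manually flagged as an unreliable observation
    free: bool = False        # excluded from refinement for the free set


def select(refls, kind, d_min, d_max, n_sigma=2.0):
    """Return the reflections that would enter ...R_factor_F_<kind>."""
    if kind not in ("all", "obs", "work", "free"):
        raise ValueError(f"unknown set: {kind!r}")
    chosen = []
    for r in refls:
        # Every set needs the resolution criteria and no manual flag
        if not (d_min <= r.d <= d_max) or r.unreliable:
            continue
        # Assumed observation criterion; applies to obs, work and free
        if kind != "all" and not (r.f_obs > n_sigma * r.sigma):
            continue
        if kind == "work" and r.free:
            continue
        if kind == "free" and not r.free:
            continue
        chosen.append(r)
    return chosen


def r_factor(refls):
    """Conventional R = sum||Fo| - |Fc|| / sum|Fo|; None for an empty set."""
    denom = sum(abs(r.f_obs) for r in refls)
    if denom == 0:
        return None
    return sum(abs(abs(r.f_obs) - abs(r.f_calc)) for r in refls) / denom
```

With these definitions the _work and _free sets partition the _obs set, so
adding _work leaves the existing _all and _obs definitions unchanged.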


D30.2 The New DDL
-----------------
D>      In the long term it probably matters little whether DDL1 names
D> alias DDL2 names or vice versa.  It makes sense to use DDL1 names
D> in the current DDL1 files and DDL2 names in DDL2 files so that both
D> Paula and Brian use the names they feel most meaningful.  The
D> aliasing does, however, provide a way of slowly converting a DDL1
D> dictionary to DDL2.  The new names can be introduced before the
D> other features of DDL2 are in place.  Obviously there are still
D> unresolved problems with DDL2 so it makes sense to convert slowly
D> as dictionary handling tools become available.
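
As a rough illustration of why the direction of aliasing matters little, a
single table of name pairs can be read either way. The sketch below is in
Python; the pairs listed are hypothetical examples of a DDL1-style and a
DDL2-style spelling of the same item, not entries taken from any dictionary.

```python
# Hypothetical (DDL1 name, DDL2 name) pairs for the same data items
PAIRS = [
    ("_refine_ls_R_factor_all", "_refine.ls_R_factor_all"),
    ("_refine_ls_R_factor_obs", "_refine.ls_R_factor_obs"),
]

# The same pairs read in both directions
TO_DDL2 = dict(PAIRS)
TO_DDL1 = {ddl2: ddl1 for ddl1, ddl2 in PAIRS}


def translate(items, table):
    """Rename the keys found in table; names without an alias are kept."""
    return {table.get(name, name): value for name, value in items.items()}
```

A file can then be read with whichever set of names the software prefers,
and names with no alias yet simply pass through, which is what allows the
conversion to proceed item by item.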


D33.5 Units
-----------
Paula has been working with the PDB on accommodating the CIF formalism to
certain requirements of the PDB, and specifically to handling existing
structure factor depositions in arbitrary units. 

D>      There is a way out of Paula's difficulty.  It is to define a
D> _refln.F_calc_scale.  This number would be applied to _refln.F_calc
D> in order to achieve the required scaling in electrons.  If *_scale
D> was not present, the default of 1.0 would be assumed, thus assuring
D> that all existing files were compatible.  If the scale factor was
D> not known, as in the case of the sets in the PDB, then *_scale
D> could be set to ? which would indicate that a scaling (other than
D> 1.0) was needed but its value was not known.
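
The three cases in this suggestion (scale absent, scale given, scale given
as ?) can be sketched in Python as follows. The function name is
hypothetical, and the sketch assumes that "applied to" means multiplication,
which the text above does not actually specify.

```python
def scaled_f_calc(f_calc, scale=None):
    """Return F_calc in electrons, or None when the scale is unknown.

    scale is the raw _refln.F_calc_scale value from the file:
      None  -- item absent, so the default of 1.0 applies
      "?"   -- a scaling is needed but its value is not known
      other -- a number (or numeric string) to apply to f_calc
    """
    if scale is None:
        return f_calc * 1.0
    if scale == "?":
        # Cannot be put on an absolute scale
        return None
    # Assumption: the scale is a multiplier
    return f_calc * float(scale)
```

Existing files, which carry no *_scale item, fall into the first case and
so come through unchanged.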

P> On an ancillary topic, I have come to agree (reluctantly) that the only way
P> to handle my PDB problem of structure factors in arbitrary units is to define
P> a set of data names of the form _refln.F_meas_au, where au stands for
P> arbitrary units.  This is different from the above discussion, [34.1]
P> as there is no formal means of conversion between arbitrary units and
P> electrons. I am not very happy with this solution, but I don't see any
P> other way around it that is formally consistent.  Since this is a case
P> where the archive is large, and where the means of conversion is unknown,
P> I think we must define data names to manage the archive.


D33.6 Family names
------------------
D>      I can see no problem with your suggestion of incorporating the
D> dynastic modifier into the family name.  We are much more likely to
D> want to search on a person's title or given name (which are not
D> separate data items) than on the dynastic modifier per se, so if we
D> can combine titles and given name, we are not going to lose much
D> by combining family name and dynastic modifier.
D> 
D>      On the other hand is this not a good example of where we might
D> use the _type_construct:
D> 
D> '(_family_name)( _dynastic_modifier)?( _title)?( _given_name)'
D> 
D> or is this getting too complicated?
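
To judge how complicated this would be in practice, here is a small Python
sketch that treats the construct as a regular expression in which each data
name stands for a sub-pattern. That reading of _type_construct, and every
sub-pattern in PARTS, is an assumption made for illustration; a dictionary
would have to supply the real ones.

```python
import re

# Hypothetical sub-patterns for each component data name
PARTS = {
    "_family_name": r"[A-Z][A-Za-z'-]+",
    "_dynastic_modifier": r"(?:Jr\.?|Sr\.?|II|III|IV)",
    "_title": r"(?:Dr|Prof|Mr|Ms)\.?",
    "_given_name": r"[A-Z][A-Za-z.' -]*",
}

CONSTRUCT = "(_family_name)( _dynastic_modifier)?( _title)?( _given_name)"


def compile_construct(construct, parts):
    """Substitute each data name by its sub-pattern and compile the result."""
    pattern = re.sub(r"_[A-Za-z_]+", lambda m: parts[m.group(0)], construct)
    return re.compile(pattern)


NAME_RE = compile_construct(CONSTRUCT, PARTS)
```

With these assumed patterns, NAME_RE.fullmatch("Smith Jr. Dr. John")
succeeds with four groups, while "Smith John" matches with the two optional
groups empty.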




Regards
Brian