(28) Definitions and redefinitions; unweighted R(F^2^)

To: [email protected]
Subject: (28) Definitions and redefinitions; unweighted R(F^2^)
From: [email protected] (Brian McMahon)
Date: Thu, 13 Oct 94 11:57:46 BST
Dear Colleagues

I should like to introduce two new Consultants to our roster. They are:
     Eldon Ulrich, University of Wisconsin-Madison ([email protected]),
          who is working at BioMagResBank on the development of an NMR
          information file (NMRif);
     Gotzon Madariaga, U. Pais Vasco, Bilbao ([email protected]), who
          has produced a draft CIF Dictionary for the description of
          incommensurate modulated crystal structures, based in large part
          on the checklist edited by the Commission on Aperiodic Crystals.
Welcome!

I need once again to apologise for the delay in producing this communication,
and thank you for your patience in respect of the speed (or lack of) with
which many important matters are being handled. Many of you will already know
about the developments proposed for the handling of data dictionaries at last
weekend's workshop in Brussels, and I hope to begin discussion on these
issues next week. For the present, this circular will cover the responses to
earlier mailings.

Continuing discussions
======================

D25.3 _pd_instr_radiation_probe to Core?
----------------------------------------
S> D25.3: Brian T's later explanation is more logical, and allows the item
S> _diffrn_radiation_type to perform its primary function at the moment
S> (as an Acta C input requirement). I therefore agree to the addition
S> of _diffrn_radiation_probe with the enumerations "x-ray neutron electron
S> gamma". Despite the sanctity of prior definitions, I really do not see 
S> an advantage in introducing new data item _diffrn_radiation_xray_symbol 
S> for a purpose that is already served by _diffrn_radiation_type. Remember
S> that this item is output by all major software packages, and the fewer
S> of these sorts of changes we make the better.

D25.6 _type_construct
---------------------
S> D25.6: We seem to have converged on this matter, and the overall
S> concepts are now imbedded in the core DDL.

D26.1 _diffrn_standards_decay_%
-------------------------------
S> D26.1: OK, David is probably right, I will try again.
S> 
S> data_diffrn_standards_decay_%
S>     _name                      '_diffrn_standards_decay_%'
S>     _category                    diffrn
S>     _type                        numb
S>     _enumeration_range           0.0:
S>     _definition
S> ;              The percentage decrease in the mean of the intensities
S>                for the set of standard reflections from the start of the
S>                measurement process to the finish. This value is intended
S>                as a measure of the overall decay in crystal quality
S>                during the diffraction measurement process.
S> ;
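
A minimal sketch of the quantity defined above, assuming the decay is the
relative drop in the mean standard intensity between the start and the finish
of data collection. The function name and the sample intensities are
hypothetical and do not come from the dictionary.

```python
def standards_decay_percent(start_intensities, finish_intensities):
    """Percentage decrease in the mean intensity of the standard reflections."""
    mean_start = sum(start_intensities) / len(start_intensities)
    mean_finish = sum(finish_intensities) / len(finish_intensities)
    return 100.0 * (mean_start - mean_finish) / mean_start


# Hypothetical intensities for three standard reflections.
decay = standards_decay_percent([1000.0, 2000.0, 3000.0],
                                [950.0, 1900.0, 2850.0])
print(round(decay, 2))  # 5.0
```

With _enumeration_range 0.0: as proposed, a set of standards whose mean
intensity increased would give a negative value and fall outside the range.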

D26.2 Suggested new definition of _diffrn_reflns_number
-------------------------------------------------------

G> The old definition is clearly not specific enough.  I share the opinion
G> of your co-editor that this number should NOT include systematic absences
G> (for the reasons he gives) and so the value written to the CIF file by
G> SHELXL does not include systematic absences.

D26.4 Concatenation in _atom_site_refinement_flags
--------------------------------------------------

G> Since nested loops are forbidden I see no alternative, nor do I see any
G> great practical difficulty in interpreting this item.  This feature is
G> used extensively in virtually every CIF file produced by SHELXL since
G> there is at present no other way of conveying some of this information.
G> 
G> There are now almost exactly 1000 registered users (usually one person
G> per research group) of SHELXL, and I have no plans to distribute a
G> new version in the near future (and probably most users wouldn't thank me
G> if I did).  I did make it very clear before I sent out the program how
G> difficult it would be to change existing definitions (or how these
G> definitions were interpreted).  It would always be possible for all CIF
G> reading programs to check '_audit_creation_method' in order to prevent
G> any misunderstandings, but this is hardly elegant.

This is a clear reminder of the need to preserve the published definitions.
It may well be that this condition allowing concatenation of flags can be
expressed in a machine-parsable form in the evolving dictionary; but it is
a property of existing data files that needs to be preserved.
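
One way a reading program might handle the concatenated form is sketched
below. The single-letter codes are assumed to be those enumerated for
_atom_site_refinement_flags in the core dictionary, the descriptions are
paraphrases, and the function name is hypothetical.

```python
# Codes assumed from the core dictionary enumeration list (paraphrased).
KNOWN_FLAGS = {
    "S": "special-position constraint",
    "G": "rigid-group refinement",
    "R": "riding atom",
    "D": "distance or angle restraint",
    "T": "thermal displacement constraint",
    "U": "Uiso or Uij restraint",
    "P": "partial occupancy constraint",
}


def split_refinement_flags(value):
    """Split a concatenated flag string such as 'SD' into single codes."""
    if value in (".", "?"):  # no constraints, or value unknown
        return []
    unknown = [c for c in value if c not in KNOWN_FLAGS]
    if unknown:
        raise ValueError("unrecognised flag(s): " + "".join(unknown))
    return list(value)


print(split_refinement_flags("SD"))  # ['S', 'D']
```

Because every code is a single character, the split is unambiguous and needs
no delimiter, which is consistent with the remark that interpretation poses no
great practical difficulty.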


New topics
==========

D28.1  Revision to _chemical_melting_point
------------------------------------------

S> OK, I want to close with one more proposed clarification of an existing
S> CIF item. The current definition of melting point is fine if you know
S> what the definition of melting point is!
S> 
S> data_chemical_melting_point
S>     _name                      '_chemical_melting_point'
S>     _category                    chemical
S>     _type                        numb
S>     _enumeration_range           0.0:
S>     loop_ _units_extension
S>           _units_description
S>           _units_conversion      ' '     'Kelvin'    +0
S>                                  '_C'    'Celsius'   +273.0
S>     _definition
S> ;              The melting point of the crystal.
S> ;
S> 
S> I propose we change the definition to:
S> 
S>     _definition
S> ;              The temperature at which a crystalline solid changes to 
S>                a liquid.
S> ;
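
A minimal sketch of how the units extension in the definition above might be
applied by a reading program. The offsets are copied from the
_units_conversion column. The function name and the use of an empty string
for the default (Kelvin) extension are assumptions.

```python
# Offsets taken from the _units_conversion column of the definition.
UNIT_OFFSETS = {"": 0.0, "_C": 273.0}


def melting_point_kelvin(value, extension=""):
    """Convert a melting point given with a units extension to Kelvin."""
    kelvin = value + UNIT_OFFSETS[extension]
    if kelvin < 0.0:
        raise ValueError("value lies outside _enumeration_range 0.0:")
    return kelvin


print(melting_point_kelvin(80.0, "_C"))  # 353.0
```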

D28.2  Unweighted R factor on F^2^
----------------------------------

I append below an almost verbatim record of a correspondence between Brian
Toby and David Brown, because it seems a useful summary of the current
practice in the core dictionary for handling R factors for single-crystal
X-ray work. Two things emerge at the end of this discussion: one is
Brian's revised definitions for residual factors appropriate to powder work
(especially in Rietveld refinements); I ask that we approve these
definitions, and shall suppose that silence on this matter can be taken as
agreement with the existing draft. The other point is whether an unweighted R
factor based on F^2^ (which is needed in powder work) should be defined in
the core, because it is also relevant to neutron single-crystal work. Like
David, I have an open mind on the subject. There is nothing to stop a neutron
single-crystal experimental description from pointing to the powder extension
dictionary, if that is necessary to extract some definition(s) relevant to
it; and adding an unweighted R on F^2^ to the core does create the
possibility of confusion among core definitions. Yet the core does seem to be
the best place (from the viewpoint of classification) for terms of
cross-disciplinary relevance.

So let me choose one of the alternatives, and call for a vote on it: a
definition for *unweighted* R factor calculated on F^2^ should be added
to the core dictionary. Responses to this quickly, please!
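
For reference, the two unweighted residuals at issue can be sketched as
follows, assuming the conventional forms R(F) = sum ||Fo| - |Fc|| / sum |Fo|
and R(F^2^) = sum |Fo^2^ - Fc^2^| / sum Fo^2^. The function names and the
structure-factor values are hypothetical.

```python
def r_factor_f(f_obs, f_calc):
    """Unweighted R on F: sum | |Fo| - |Fc| | / sum |Fo|."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


def r_factor_f2(f_obs, f_calc):
    """Unweighted R on F^2: sum |Fo^2 - Fc^2| / sum Fo^2."""
    num = sum(abs(fo ** 2 - fc ** 2) for fo, fc in zip(f_obs, f_calc))
    return num / sum(fo ** 2 for fo in f_obs)


# Hypothetical structure-factor magnitudes.
f_obs = [10.0, 20.0, 30.0]
f_calc = [11.0, 19.0, 30.0]
print(r_factor_f(f_obs, f_calc), r_factor_f2(f_obs, f_calc))
```

The two numbers differ for the same data, which is why a reader needs to know
which of them a given data name reports.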

B>   I just got a call from Allen Larson who is furthering my
B> implementation for writing CIF files from GSAS. We realized that there
B> is no definition for R-factors based on F**2 in my current copy of the
B> core dictionary ... we will need the (unweighted) R-factor(F**2) for
B> powder work and probably both the weighted and unweighted for
B> neutron single-xtal (and other purists). I move that we add these
B> def's to the core.

D>         The core dictionary provides for the unweighted R factor based on 
D> F and a weighted R factor based on whatever form of structure factor is 
D> used in refinement.  For a Rietveld refinement the first of these might 
D> have some meaning; it is intended to provide a way of comparing different 
D> structure refinements performed in different ways.  Of course, it is 
D> necessary to have observed structure factors in order to calculate either 
D> of these quantities.  These are not always available in a Rietveld 
D> refinement.  The pd dictionary defines the R factors that are derived 
D> from a Rietveld refinement.
D> 
D> If I understand you correctly, you are asking for both an unweighted and 
D> a weighted R factor based on F**2.  The former could be added if it were 
D> to be useful, though if it were in core I suspect it would be misused by 
D> single crystal people who were refining on F**2 and did not realise that 
D> they should give the unweighted R based on F.  The latter is a little 
D> trickier because, unfortunately, _refine_ls_wR_factor_ only has a meaning 
D> when the refinement is against structure factors and this meaning changes 
D> depending on whether the refinement is performed against F, F**2 or I.  
D> In this case it would be safer to define this factor (when not used as 
D> the basis of refinement) in the pd dictionary.  If it is used as the 
D> basis of refinement, then there is no problem.  How essential is it to 
D> have these quantities when they are not used as the basis of refinement?

B> I regret that I did not look carefully at the definition for the weighted
B> R so that I did not realize that it can be on F or F**2 depending on the
B> refinement strategy. 
B>
B>  In any case, there is a strong need for the standard unweighted R-factor
B> on F**2 for powder work. Hugo Rietveld's original code included a mechanism
B> for estimating reflection intensities based on integrated areas and
B> proportioning overlapped reflections based on calculated intensities. It
B> works pretty well, though it has no real statistical validity since the
B> model will affect the results. Be that as it may, Rietveld used these
B> extracted intensities to compute an R-factor on F**2 which he called the
B> Bragg R-factor. Suffice it to say that none of the above has changed.
B> 
B>    The definition for the "Bragg R-factor" could be added to the powder
B> dictionary, but since it is a commonly used (and general) crystallographic
B> expression I would prefer to see it in the core.

D>         I suppose there is a good reason why the unweighted R factor 
D> based on F**2 is called the Bragg R factor, but it is not much used in 
D> single crystal work (if at all) and has no statistical significance in 
D> the sense that it represents the quantity minimised in the refinement.  
D> If the powder diffractionists need it, it should go in the pd 
D> dictionary.  What about unweighted R based on I?  Perhaps the names 
D> should reflect the systematics, something like *_R_F_, *_R_F**2_ and 
D> *_R_I_ rather than *_R_Bragg_ where one would have to look at the 
D> definition to find what it means.  *_wR_factor_ does now seem to have a 
D> specific meaning and *_R_factor_ is the same as *_R_F_ so we should not 
D> duplicate this definition.  

B>    I am not sure how widespread this practice is, but Allen Larson's
B> code (GSAS) computes R and wR on F and on F**2 for all single-xtal
B> refinements regardless of the quantity being fit. This may be a 
B> common practice for neutron single-crystal crystallography, but I am
B> not sure. I expect to hear back from Allen eventually.

[See also below.]

B>    Working on the assumption that no one else in the world reports an
B> unweighted R on F**2, I would prefer to see the name be _proc_ls_I_R_factor
B> (so that R_factor is not broken up).
B> 
B>    Below is my current list of changes for the Powder CIF.
B> ----------------------------------------------------------------------
B> Proposed changes to the circulating Powder CIF          BHT Aug-25-1994
B> 
B> 1)Remove the _pd_refln_index_ definitions, change references to _refln_index_
B> 
B> 2)Add:
B> data_proc_ls_I_R_factor
B>     _name                       '_refine_proc_ls_I_R_factor'
B>     _category                    refine
B>     _type                        numb
B>     _enumeration_range           0.0:
B>     _definition
B> ;              Residual factors for estimated reflection intensities,
B>                  R~I~ = sum~hkl~ |I~obs~(hkl) - I~calc~(hkl)|
B>                         / sum~hkl~ I~obs~(hkl)
B>                where I~obs~(hkl) and I~calc~(hkl) are the squares of the
B>                observed and calculated structure factors. This is often
B>                referred to as R~B~ or R~Bragg~ in Rietveld refinements.
B>                See also _pd_proc_ls_prof_ for profile R-factor definitions.
B> ;
B> 
B> 3)Revise:
B> data_pd_proc_ls_prof_
B>     loop_ _name                 '_pd_proc_ls_prof_R_factor'
B>                                 '_pd_proc_ls_prof_wR_factor'
B>                                 '_pd_proc_ls_prof_wR_expected'
B>     _category                    pd_proc_ls
B>     _type                        numb
B>     _definition
B> ;              Rietveld/Profile fit R-factors
B> 
B>                Note that the R-factor computed for Rietveld refinements
B>                using the extracted reflection intensity values (often
B>                called the Rietveld or Bragg R-factor) is not properly a 
B>                profile R-factor. This R-factor may be specified using 
B>                _proc_ls_I_R_factor.
B> 
B>               _pd_proc_ls_prof_R_factor, often called R~p~, is an 
B>                 unweighted fitness metric for the agreement between the 
B>                 observed and computed diffraction patterns
B>                    R~p~ = sum~i~ | I~obs~(i) - I~calc~(i) | 
B>                           / sum~i~ ( I~obs~(i) )
B> 
B>               _pd_proc_ls_prof_wR_factor, often called R~wp~, is a
B>                 weighted fitness metric for the agreement between the 
B>                 observed and computed diffraction patterns
B>                   R~wp~ = SQRT {
B>                            sum~i~ ( w(i) * [ I~obs~(i) - I~calc~(i) ] ^2^ )
B>                            / sum~i~ ( w(i) * [I~obs~(i)]^2^ ) }
B> 
B>               _pd_proc_ls_prof_wR_expected, sometimes called the 
B>                 theoretical R~wp~ or R~e~, is a weighted fitness metric for
B>                 the statistical precision of the dataset. For an idealized fit, 
B>                 where all deviations between the observed intensities and 
B>                 those computed from the model are due to statistical 
B>                 fluctuations, the observed R~wp~ should match the expected 
B>                 R-factor. In reality R~wp~ will always be higher than 
B>                 R~e~.
B>                   R~e~ = SQRT { 
B>                                  (n - p)  / sum~i~ ( w(i) * [I~obs~(i)]^2^ ) }
B> 
B>                 Note that in the above equations, 
B>                    w(i) is the weight for the ith data point (see
B>                         _pd_proc_ls_weight)
B>                    I~obs~(i) is the observed intensity for the ith data
B>                         point, sometimes referred to as y~i~(obs) or
B>                         y~oi~. (See _pd_meas_count_total, 
B>                         _pd_meas_intensity_total or _pd_proc_total).
B>                    I~calc~(i) is the computed intensity for the ith data
B>                         point with background and other corrections
B>                         applied to match the scale of the observed dataset, 
B>                         sometimes referred to as y~i~(calc) or
B>                         y~ci~. (See _pd_calc_intensity_total).
B>                    n is the total number of data points (see
B>                         _pd_proc_number_of_points) less the number of
B>                         data points excluded from the refinement.
B>                    p is the total number of refined parameters.
B> ;
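
The three profile residuals in the draft definition can be sketched as
follows. This assumes the conventional absolute value in the numerator of
R~p~ and, for the example data only, counting-statistics weights
w(i) = 1 / I~obs~(i). The function names and the five-point pattern are
hypothetical.

```python
import math


def r_p(i_obs, i_calc):
    """Unweighted profile R: sum |Iobs - Icalc| / sum Iobs."""
    return sum(abs(o - c) for o, c in zip(i_obs, i_calc)) / sum(i_obs)


def r_wp(i_obs, i_calc, w):
    """Weighted profile R: sqrt(sum w (Iobs - Icalc)^2 / sum w Iobs^2)."""
    num = sum(wi * (o - c) ** 2 for wi, o, c in zip(w, i_obs, i_calc))
    den = sum(wi * o ** 2 for wi, o in zip(w, i_obs))
    return math.sqrt(num / den)


def r_expected(i_obs, w, n_params):
    """Expected R: sqrt((n - p) / sum w Iobs^2), n = points used in the fit."""
    n = len(i_obs)
    den = sum(wi * o ** 2 for wi, o in zip(w, i_obs))
    return math.sqrt((n - n_params) / den)


# Hypothetical five-point pattern with one refined parameter.
i_obs = [100.0, 400.0, 900.0, 400.0, 100.0]
i_calc = [120.0, 380.0, 900.0, 430.0, 80.0]
w = [1.0 / y for y in i_obs]  # assumed weights, see _pd_proc_ls_weight

print(r_p(i_obs, i_calc), r_wp(i_obs, i_calc, w), r_expected(i_obs, w, 1))
```

In this example R~wp~ comes out larger than R~e~, as the definition says is
the case in practice.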


D>         Your proposed changes to the powder dictionary look fine to me.  
D> I only note that I is not the same as F**2.  You seem to have been 
D> concerned to define an R factor for F**2, but you have ended by defining 
D> one for I instead.  I wonder whether *_ls_R_factor_I might not be better 
D> than *_ls_I_R_factor?  I do not feel too strongly either way.

B> I had a conversation with Allen Larson yesterday eve. He feels strongly that
B> R(F**2) is widely used in neutron single-crystal refinements and should
B> thus be in the core. Would you like to do a poll?
B> 
B> I will bow out from the deliberations at this point, as you have pointed
B> out we are using R(I) which is not the same as R(F**2). There have been 
B> no calls as far as I know for R(I) in single xtal work.
B> 
B> I choose _pd_ls_I_R_factor to match the existing definitions 
B> _pd_ls_prof_*_factor (having the R_factor and wR_factor at the end allows
B> the definitions to be combined). The choice is open for discussion.

D>         I have no particular objection to adding unweighted R(F**2).  We 
D> should try it out on the rest of comcifs.


D28.3  Does SHELXL misinterpret wR(obs)?
----------------------------------------

A Coeditor commented that Acta should print the weighted R factor using all
reflections when a SHELXL refinement is done, and not just the weighted R
factor for observed reflections, which is the traditional Commission on
Journals requirement. It is true that SHELXL does calculate a wR factor
using *all* reflections, and another using all except the few that may
have been deliberately omitted because of systematic error. It seems
that the program outputs these two values as  _refine_ls_wR_factor_all and
_refine_ls_wR_factor_obs. For example, when we look at the example file in
the SHELXL manual, we see the entries
  _refine_ls_wR_factor_all   0.0598
  _refine_ls_wR_factor_obs   0.0547
and an explanation in _refine_special_details that says (in part)
"Refinement on F^2^ for ALL reflections except for 3 with ... systematic
errors. The observed criterion ... is used only for calculating
_R_factor_obs etc. and is not relevant to the choice of reflections for 
refinement."

This implies that the _wR_all is calculated on all 3704 independent
reflections (_reflns_number_total) and the _wR_obs on the 3701
(_refine_ls_number_reflns) that remain after discarding the 3 bad
reflections. This impression is strengthened by the discussion on page 22 of
the printed manual ('Why Does SHELXL-93 Refine Against F-Squared?'): "In
the final refinement ALL DATA should be used except for reflections known
to suffer from systematic error".

However, the CIF dictionary definition of _refine_ls_wR_factor_obs says
"Residual factors ... for reflection data classed as 'observed' (see
_reflns_observed_criterion)", so this would seem to be at variance with what
SHELXL has done, if I read the runes correctly. This has been noticed
before, by Mario Nardelli, who has suggested a new dictionary term for
refinement using all the independent reflections minus those known to be in
systematic error.

(a) Is my reading of this correct, i.e. should the value that SHELXL outputs
as _refine_ls_wR_factor_obs really be reported as _something_else?

(b) If so, suggestions are welcomed for an appropriate name and definition
for that '_something_else'.
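
The three reflection sets in question can be told apart with a short sketch.
It assumes the usual form wR = sqrt( sum w (Fo^2^ - Fc^2^)^2^ / sum w
(Fo^2^)^2^ ) for refinement on F^2^, and a hypothetical observed criterion of
Fo^2^ > 2 sigma(Fo^2^) applied to the refined set. The reflection values, the
dictionary keys and the 'omit' marker are all invented for illustration.

```python
import math


def wr2(reflections):
    """wR on F^2: sqrt(sum w (Fo^2 - Fc^2)^2 / sum w (Fo^2)^2)."""
    num = sum(r["w"] * (r["fo2"] - r["fc2"]) ** 2 for r in reflections)
    den = sum(r["w"] * r["fo2"] ** 2 for r in reflections)
    return math.sqrt(num / den)


# Hypothetical reflections; 'omit' marks data dropped for systematic error.
data = [
    {"fo2": 100.0, "fc2": 95.0, "sig": 5.0, "w": 0.04, "omit": False},
    {"fo2": 50.0, "fc2": 52.0, "sig": 4.0, "w": 0.0625, "omit": False},
    {"fo2": 3.0, "fc2": 1.0, "sig": 2.0, "w": 0.25, "omit": False},
    {"fo2": 80.0, "fc2": 40.0, "sig": 5.0, "w": 0.04, "omit": True},
]

all_data = data                                   # every independent reflection
refined = [r for r in data if not r["omit"]]      # all minus systematic errors
observed = [r for r in refined
            if r["fo2"] > 2.0 * r["sig"]]         # assumed observed criterion

for label, subset in (("all", all_data), ("refined", refined),
                      ("observed", observed)):
    print(label, len(subset), wr2(subset))
```

Each subset gives a different residual, so a data name that does not say
which subset was used cannot be compared safely between programs.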

============

Best wishes
Brian