(31) su; R factors; hydrogen bonds; matrix types

  • To: COMCIFS@iucr.ac.uk
  • Subject: (31) su; R factors; hydrogen bonds; matrix types
  • From: bm
  • Date: Fri, 24 Feb 1995 15:03:12 GMT

Dear Colleagues

Obviously, the material in the last mailing will take some time to assimilate
and assess objectively, but I should like to keep our discussions ticking
over, and so I'm distributing what I received in the last week. As usual,
'D>' is David Brown, 'JW>' John Westbrook.


D16.1  esd v su
---------------
D> It looks as if we should adopt s.u. from here on.  Clearly we
D> cannot change the data names that already exist, but we should use
D> _su in the future including in all the dictionaries now under
D> consideration.  Perhaps down the road when the idea of alias names
D> is well established we can standardize on _su throughout.
D> 
D>      It is no concern of CIF whether someone wants to use the rule
D> of 19 or not.  In principle one could write a number as 
D> 
D>                1.23456(123)
D> 
D> without violating any cif procedure.  Our concern is only to
D> standardize the formats and the definitions of the content, but not
D> how that content is expressed.  Already cif provides a fairly rigid
D> straightjacket and there is no need to introduce any constraints
D> that are not essential to the structure of the file.  Acta Cryst.
D> can truncate any of these numbers in any way they please.
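
As an illustration of the parenthesised s.u. notation discussed above, here is a minimal Python sketch that splits a string such as 1.23456(123) into its value and standard uncertainty. The function name parse_su and the regular expression are assumptions made for this example, not part of any CIF specification or library; the sketch handles plain decimal numbers only (no exponent notation) and, in keeping with David's point, accepts any number of s.u. digits rather than enforcing the rule of 19.

```python
import re
from decimal import Decimal

# Hypothetical pattern: an optionally signed decimal number followed by
# the s.u. digits in parentheses, e.g. "1.23456(123)" or "2.34(4)".
_SU_PATTERN = re.compile(r"^([+-]?\d*\.?\d+)\((\d+)\)$")


def parse_su(text):
    """Return (value, su) as Decimals for a string like '1.23456(123)'.

    The s.u. digits are read as applying to the last decimal places
    of the value, so '1.23456(123)' gives an s.u. of 0.00123.
    """
    match = _SU_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a value with parenthesised s.u.: {text!r}")
    value_text, su_digits = match.groups()
    value = Decimal(value_text)
    # The exponent of the value gives the position of its last digit
    # (for example -5 for 1.23456); the s.u. is scaled to match.
    exponent = value.as_tuple().exponent
    su = Decimal(su_digits).scaleb(exponent)
    return value, su


value, su = parse_su("1.23456(123)")
print(value, su)
```

Decimal is used rather than float so that the number of quoted digits is preserved exactly; any rounding or truncation policy is left to the consumer of the data.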

D28.2 and D28.3 R Factors
--------------------------
D> The use of the suffixes _all and _obs is quite unnecessary for any
D> R factor that includes weights and the weights (or how to calculate
D> them) should always be included in the cif.  If the authors have
D> used an F > 3*sigma rule, then the weights are set to zero for all
D> the weak reflections and *_wR_all should be the same as *_wR_obs. 
D> The less we use _all and _obs the better.
D> 
D> _refine_ls_R_factor is a traditional R factor that we should not
D> discourage too heavily and here it is appropriate to add _all and
D> _obs.
D> 
D> _refine_ls_wR_factor should certainly not have the suffixes.  This
D> statement is independent of software.  'Unobserved' reflections may
D> have weights of zero, so may other unreliable measurements.  Other
D> reflections may have low weights, others higher weights. 
D> Arbitrarily omitting some non-zero weighted reflections for the
D> purpose of calculating an R factor makes no sense.  The only suffix
D> that makes any sense is _all, and therefore it is not necessary
D> since it is assumed.  
D> 
D> _refine_ls_number_reflns is a little more problematic.  I would
D> recommend that it include all the reflections that were included in
D> the refinement with non-zero weight.  The alternative is for it to
D> refer to all the reflections that were measured including those
D> assigned zero weight in the refinement, but this does not agree
D> with the name (_refine_*) since the zero weighted reflections were
D> not used in the refinement.  There is no virtue in distinguishing
D> between different classes of zero weighted reflections.
D> 
D> _refine_ls_restrained_S is a weighted function and the same
D> argument applies.  No _all or _obs.
D> 
D> _refine_proc_ls_I_R_factor could in principle have the suffixes
D> _all and _obs, but I would not encourage it.  If the powder people
D> can manage without, let's not introduce them to bad habits.
D> 
D> The other R factors from the powder dictionary are all well defined
D> and do not need _all or _obs suffixes.
D> 
D> Howard's comments on the function minimised in the refinement are
D> well taken.  _wR_ should be the R factor that most closely
D> approaches the function minimised.  That is, it should be based on
D> F, F**2 or I according to the quantity that is used in the
D> refinement and the w of _wR_ should refer to the weights actually
D> used in the refinement.
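
The argument that zero-weighted reflections make *_wR_all and *_wR_obs identical can be checked numerically. The sketch below assumes the commonly quoted form wR = sqrt(sum w (Yo - Yc)^2 / sum w Yo^2), where Y stands for F, F**2 or I as appropriate; that formula, the function name weighted_r and the toy data are assumptions for illustration only and are not taken from the discussion above.

```python
import math


def weighted_r(y_obs, y_calc, weights):
    """Weighted R factor under the assumed definition
    sqrt(sum(w * (Yo - Yc)**2) / sum(w * Yo**2))."""
    numerator = sum(w * (o - c) ** 2
                    for o, c, w in zip(y_obs, y_calc, weights))
    denominator = sum(w * o ** 2 for o, w in zip(y_obs, weights))
    return math.sqrt(numerator / denominator)


# Invented data: the last two reflections are "weak" and have been
# given zero weight, as under an F > 3*sigma rule.
y_obs = [10.0, 8.0, 0.5, 0.2]
y_calc = [9.5, 8.4, 0.9, 0.0]
weights = [1.0, 1.0, 0.0, 0.0]

# wR over every reflection, zero-weighted ones included.
wr_all = weighted_r(y_obs, y_calc, weights)
# wR over the "observed" subset only.
wr_obs = weighted_r(y_obs[:2], y_calc[:2], weights[:2])

# Count of reflections used with non-zero weight, the reading
# recommended above for _refine_ls_number_reflns.
n_refined = sum(1 for w in weights if w > 0)

print(wr_all, wr_obs, n_refined)
```

Because the zero-weighted terms contribute nothing to either sum, the two values agree, which is the reason given above for dropping the suffixes on weighted quantities.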

D25.6 Type_constructs
---------------------
D>      I agree that we leave this for now, but let's make sure that
D> we do not do anything that will prejudice the implementation later.

D30.1 Hydrogen bonds
--------------------
D>      Obviously it is desirable to have this feature in the
D> dictionary, but I have problems with all three solutions you
D> propose.
D> 
D>      The problem with scheme 1 is that the long bond (H-A) is
D> designated as a contact which is exactly what some people think it
D> is, but I would insist that it is a bond.  The question is 'when is
D> a bond not a bond?'  As someone who has given some thought to the
D> topic I know that a discussion on this topic will last through
D> several drinks, and people will stop making sense long before they
D> come to an agreement.  It was, maybe, a mistake to distinguish
D> between bonds and contacts in the original core dictionary, since
D> there is no good definition of a bond.  We should have contented
D> ourselves with 'distance'.  (I can only accept full blame for the
D> current version of the core.  Frank and I did head off some
D> disasters but obviously not all.)
D> 
D>      The problem with scheme 2 is that it raises hydrogen bonds to
D> something special, which in some sense they are, but sets a
D> precedent for having a different syntax for different kinds of
D> bonds.  The scheme is, however, pleasantly neutral on the topic of
D> bonds and contacts.
D> 
D>      Scheme 3 is a compromise.  It treats hydrogen bonds like other
D> bonds but allows construction of a special table from the bonds
D> listed elsewhere.  Of the three, it is the most attractive, but I
D> would definitely promote H(C6) O(2) 2.34(4) in the example into the
D> bond category.  It is after all, this weak interaction that
D> constitutes the business end of the hydrogen BOND, and so should
D> always be included in the list of bonds.  This scheme does not
D> undermine the integrity of the bond and contact loops, but it does
D> allow the hydrogen bond information to be extracted and placed in
D> a table.  Presumably the publication flag for these bonds, contacts
D> and angles would be 'no' since they are not to be published in the
D> main table of bond lengths etc.

D30.2 The new DDL
-----------------
D>      In the interests of speed, I suggest that we stay with DDL1.4
D> for the core.  There is a growing number of people who are now
D> familiar with the core dictionary and we should prepare them for
D> the change to DDL2.  I suspect that the biocrystallography
D> community who have been pressing for DDL2 will be able to
D> adapt to it more readily.  This will give comcifs time to
D> become comfortable with it, which will make it less likely that the
D> core dictionary will contain solecisms when it eventually does
D> appear in DDL2.
D> 
D>      From a first reading of your description of the differences
D> between DDL1 and DDL2 it would seem that the correct way to use a
D> dictionary in DDL2 is by means of a browser that can locate the
D> desired information and present it on request.  The structure will
D> be more logical in one sense than the alphabetical ordering system
D> we have at the moment, since one has to know the name of an item in
D> order to find it in the dictionary - it is difficult to find the
D> name attached to a particular concept, which is what one usually
D> wants to do.  However, if the dictionary is issued with its own
D> browser, the internal structure does not matter.


D30.5 Matrix/vector structure types
-----------------------------------
I have mentioned before my difficulties with the proposed extension to
include structured data types, representing matrices, in CIFs. My quarrel
is not with the desirability of having such a thing, but the way in which it
might be implemented within the existing STAR syntax rules and CIF
conventions. These are technical issues which need not be debated in this
forum, but I think John's explanations of the underlying motivation and the
way to implement the concept are useful for our long-term considerations.

JW> Aside from the notational convenience of having a matrix object,  one of the
JW> major reasons for adopting such a structure was to support the use of matrix
JW> objects by embedded methods.  Although the current dictionary does not
JW> include any such methods, I suspect that these will be added in time
JW> by the archive that maintains the macromolecular data.  It is very
JW> clumsy to use the _11, _12, etc nomenclature when referring to a matrix
JW> within a method.
JW> 
JW> One thing that Paula has been very careful to do is to provide aliases for
JW> all existing matrix definitions which are consistent with the current
JW> definitions.  For these items, there will be a definition of a matrix
JW> along with each of the matrix elements.  In the new dictionary, these
JW> are identified as using square brackets.  Each element is aliased to
JW> its corresponding CIF definition using the underscore notation.  To avoid
JW> confusion, definitions of this type are designated "alternate exclusive"
JW> which means that one can refer to matrices using only one type of
JW> abstraction (never both).   Hence, it is always possible to construct
JW> a data file with the mmCIF dictionary that is backward compatible with CIF.
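
To make the aliasing idea concrete, here is a rough Python sketch of how a square-bracket element name could be mapped to its underscore alias, together with a check for the "alternate exclusive" rule. The data name _example_matrix, the function names and the regular expressions are all hypothetical and chosen only for illustration; the real aliases are whatever the dictionary itself declares.

```python
import re

# Hypothetical patterns for the two notations of a 3x3 matrix element:
#   bracket form     _example_matrix[1][2]
#   underscore form  _example_matrix_12
_BRACKET = re.compile(r"^(?P<base>.+)\[(?P<row>\d)\]\[(?P<col>\d)\]$")
_UNDERSCORE = re.compile(r"^(?P<base>.+)_(?P<row>\d)(?P<col>\d)$")


def to_underscore(name):
    """Map a bracket-style element name to its underscore alias."""
    match = _BRACKET.match(name)
    if match is None:
        raise ValueError(f"not a bracket-style element name: {name!r}")
    return f"{match['base']}_{match['row']}{match['col']}"


def uses_single_notation(names):
    """Return True if the names use only one of the two notations,
    which is how 'alternate exclusive' is read in this sketch."""
    has_bracket = any(_BRACKET.match(n) for n in names)
    has_underscore = any(_UNDERSCORE.match(n) for n in names)
    return not (has_bracket and has_underscore)


print(to_underscore("_example_matrix[1][2]"))
print(uses_single_notation(["_example_matrix[1][1]", "_example_matrix_12"]))
```

Note that the underscore pattern would also match any unrelated data name that happens to end in two digits, so a real check would have to consult the dictionary's alias declarations rather than rely on name shape alone.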

Best wishes
Brian