Discussion List Archives


Re: re. Maximum Likelihood

Dear David,

I was too optimistic in hoping to get some free time after lunch to talk 
about ML.

I would start from two questions:

1) Why ML at all? Why are we not happy to fit Fcalc(model) to Fobs?
2) Which LIKELIHOOD is discussed? The likelihood of WHICH statistical hypothesis?

a) If the experimental data are perfect and the model COMPOSITION (I do not 
mean the VALUES of the parameters but the SET of parameters) is good, then 
for the final ideal model Fcalc = Fobs, and we need to fit Fcalc to Fobs.

b) Macromolecular models are never complete; there are parts missing from 
the macromolecule itself, there are missing crystallographic waters, there 
is bulk solvent. Therefore it is WRONG to demand that Fcalc fit Fobs; even 
for the ideal positions of the atoms of the available model (which is 
always partial!) Fcalc is DIFFERENT from Fobs because of the (unknown) 
contribution of the missing atoms.

c) How can we estimate this contribution? We can suppose (which is far 
from being the best hypothesis) that all unknown atoms are distributed 
uniformly in the unit cell, which gives us the values of the corresponding 
structure factors as functions of RANDOM variables.

d) Now the structure factors from ANY partial model (= statistical model!) 
can be completed by such a RANDOM function, and the goal is to choose the 
partial model whose contribution, corrected by the random function, gives 
the experimental data with the HIGHEST probability (= maximal likelihood).
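
Point (c) can be illustrated with a small Monte Carlo sketch. It is only an 
illustration under my own assumptions (equal atoms with a constant 
scattering factor f_atom, space group P1, a hypothetical reflection index 
and atom count), not the calculation from the attached articles. For atoms 
placed uniformly at random, the structure factor of the missing part should 
average to zero, while its mean squared modulus should approach the sum of 
f_j^2 over the missing atoms.

```python
import cmath
import math
import random


def missing_atoms_structure_factor(hkl, n_atoms, f_atom, rng):
    """Structure factor of n_atoms equal atoms placed uniformly at random."""
    h, k, l = hkl
    total = 0j
    for _ in range(n_atoms):
        # Fractional coordinates drawn uniformly in the unit cell.
        x, y, z = rng.random(), rng.random(), rng.random()
        total += f_atom * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def simulate(hkl=(3, 1, 2), n_atoms=50, f_atom=6.0, n_trials=4000, seed=0):
    """Return the sample mean of F and of |F|^2 over many random placements."""
    rng = random.Random(seed)
    samples = [missing_atoms_structure_factor(hkl, n_atoms, f_atom, rng)
               for _ in range(n_trials)]
    mean_f = sum(samples) / n_trials
    mean_sq = sum(abs(s) ** 2 for s in samples) / n_trials
    return mean_f, mean_sq


if __name__ == "__main__":
    mean_f, mean_sq = simulate()
    beta = 50 * 6.0 ** 2  # sum of f_j^2 over the (hypothetical) missing atoms
    print("|<F_missing>|   =", abs(mean_f))
    print("<|F_missing|^2> =", mean_sq, "(sum of f_j^2 =", beta, ")")
```

The quantity called beta here is the single statistical parameter of this 
toy model; in practice it would have to be estimated and would depend on 
resolution.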


Therefore, after all formulae are written, we have this ML function. What 
is its main difference from LS? To understand this, one can construct a 
quadratic approximation of the ML function near its point of minimum. This 
shows NEW target values and corresponding WEIGHTS.

Briefly, the ML function suggests:
a) to use E values and not F values
b) to replace weak Fobs by 0 (the contribution of missing atoms can be 
stronger!)
c) to take 0 weights for reflections with Fobs close to the mean 
contribution from missing atoms.
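
The quadratic approximation can be tried numerically on a deliberately 
simplified model. The sketch below is my own illustration, not the formulae 
of the attached articles. It assumes an acentric reflection, no coordinate 
or measurement errors, and the Rice distribution 
P(Fobs | Fcalc) = (2 Fobs / beta) exp(-(Fobs^2 + Fcalc^2) / beta) 
I0(2 Fobs Fcalc / beta), where beta is the mean squared contribution of the 
missing atoms. Amplitudes are scaled by sqrt(beta); the names eo and ec for 
the scaled Fobs and Fcalc are hypothetical.

```python
import math


def bessel_i0_i1(t):
    """Modified Bessel functions I0(t), I1(t) by power series (moderate t only)."""
    q = t * t / 4.0
    term0, term1 = 1.0, 1.0
    i0, s1 = 1.0, 1.0
    k = 0
    while True:
        k += 1
        term0 *= q / (k * k)        # q^k / (k!)^2
        term1 *= q / (k * (k + 1))  # q^k / (k! (k+1)!)
        i0 += term0
        s1 += term1
        if term0 < 1e-17 * i0 and term1 < 1e-17 * s1:
            break
    return i0, 0.5 * t * s1


def neg_log_likelihood(ec, eo):
    """Per-reflection -log P as a function of ec, constant terms dropped."""
    i0, _ = bessel_i0_i1(2.0 * eo * ec)
    return ec * ec - math.log(i0)


def slope(ec, eo):
    """First derivative of neg_log_likelihood with respect to ec."""
    i0, i1 = bessel_i0_i1(2.0 * eo * ec)
    return 2.0 * ec - 2.0 * eo * i1 / i0


def ls_target_and_weight(eo):
    """Position of the minimum over ec >= 0 and the curvature at that point."""
    if eo <= 1.0:
        target = 0.0  # for weak reflections the minimum sits at ec = 0
    else:
        lo, hi = 1e-9, eo  # slope is negative near 0 and positive at ec = eo
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if slope(mid, eo) < 0.0:
                lo = mid
            else:
                hi = mid
        target = 0.5 * (lo + hi)
    t = 2.0 * eo * target
    if t == 0.0:
        ratio_derivative = 0.5  # limit of d(I1/I0)/dt as t -> 0
    else:
        i0, i1 = bessel_i0_i1(t)
        r = i1 / i0
        ratio_derivative = 1.0 - r / t - r * r
    weight = 2.0 - 4.0 * eo * eo * ratio_derivative
    return target, weight


if __name__ == "__main__":
    for eo in (0.5, 1.0, 1.5, 2.0, 3.0):
        target, weight = ls_target_and_weight(eo)
        print(f"eo = {eo:.1f}  target = {target:.3f}  weight = {weight:.3f}")
```

Near the minimum, -log P is approximated by 
const + (weight / 2) (ec - target)^2. In this toy model the target is 0 
whenever Fobs^2 <= beta, and the weight 2 (1 - eo^2) at that target 
vanishes when Fobs^2 = beta, which is the behaviour described in points (b) 
and (c).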


For sure, the estimation of the modified Fobs and the corresponding weights 
depends strongly on the choice of statistical parameters, as Randy noted. 
However, as you can find from our CCP4 paper, it seems that taking the 
parameters from the CURRENT model is not the best choice. One can instead 
try to estimate these parameters for the UNKNOWN answer, for which the 
error in atomic positions should be 0. Therefore Randy's argument on Rfree 
is not entirely applicable in this case.


For small molecules:

a) data are practically perfect
b) models are complete (what Eleanor probably meant in her phrase about R<10%)
c) the hypothesis of a uniform distribution of missing atoms (if there are 
any) does not hold at all.

With all the consequences. However, NOBODY has checked it yet. Who knows...

I attach here a couple of articles. While the second one WAS published, the 
other is only a preliminary publication in CCP4, and I did not want to 
discuss all this at length through our SIG-net. We should write all this up 
soon and send it to Acta Cryst A.

I hope all this gives you another point of view, with less "mysticism" 
about ML (there is quite a lot of it in the current literature).





International Union of Crystallography
