# Re: re. Maximum Likelihood

```
Dear David,

I was too optimistic in hoping to get some free time after lunch to talk.

I would start from two questions:

1) Why ML at all? Why are we not happy to fit Fcalc(model) to Fobs?
2) Which LIKELIHOOD is being discussed? The likelihood of WHICH statistical
hypothesis?

a) If the experimental data are perfect and the model COMPOSITION (I do not
mean the VALUES of the parameters but the SET of parameters) is good, then
for the final ideal model Fcalc = Fobs, and we need to fit Fcalc to Fobs.

b) Macromolecular models are never complete: parts of the macromolecule
itself are missing, crystallographic waters are missing, and there is bulk
solvent. Therefore it is WRONG to require that Fcalc fit Fobs; even with the
atoms of the available model (which is always partial!) at their ideal
positions, Fcalc is DIFFERENT from Fobs because of the (unknown)
contribution of the missing atoms.

c) How can we estimate this contribution? We can suppose (which is far from
being the best hypothesis) that all unknown atoms are distributed uniformly
in the unit cell, which gives the values of the corresponding structure
factors as functions of RANDOM variables.

d) Now the structure factors from ANY partial model (= statistical model!)
can be completed by such a RANDOM function, and the goal is to choose the
partial model whose contribution, corrected by the random function, gives
the experimental data with the HIGHEST probability (= maximum likelihood).

-------------------------

Therefore, after all the formulae are written out, we have this ML function.
What is its main difference from LS? To understand this, one can construct a
quadratic approximation of the ML function near its point of minimum. This
shows NEW target values and corresponding WEIGHTS.

Briefly, the ML function suggests:
a) to use E values and not F values
b) to replace weak Fobs by 0 (the contribution of missing atoms can be
stronger!)
c) to take 0 weights for reflections with Fobs close to the mean
contribution from missing atoms.

------------------------

For sure, the estimation of the modified Fobs and the corresponding weights
depends strongly on the choice of statistical parameters, as Randy noted.
However, as you can find from our CCP4 paper, it seems that the choice of
parameters from the CURRENT model is not the best one. One can try to
estimate such parameters from the UNKNOWN answer, for which the error in
atomic positions should be 0. Therefore, Randy's argument on Rfree is not
entirely applicable in this case.

--------------------------------------

For small molecules:

a) the data are practically perfect
b) the models are complete (what Eleanor probably meant in her phrase about
R<10%)
c) the hypothesis of a uniform distribution of missing atoms (if there are
any) does not hold at all.

With all the consequences. However, NOBODY has checked it yet. Who knows...

I attach here a couple of articles. While the second one WAS published, this
is only a preliminary publication in CCP4, and I did not want to discuss
all this at length through our SIG-net. We should write all this up soon and
send it to Acta Cryst A.

I hope all this gives you another point of view, with less "mysticism"
about ML (there is quite a lot in the current literature).

Regards,

Sasha
```
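
The quadratic-approximation argument in the message can be tried numerically. The sketch below is not taken from the attached papers. It assumes the standard Rice distribution for an acentric reflection with unit scale factor, where `beta` stands for the mean intensity contributed by the missing atoms. The function names (`log_bessel_i0`, `neg_log_likelihood`, `ls_equivalent`) are hypothetical. For one reflection it minimises minus the log-likelihood over Fcalc and reads off a least-squares-like target and weight from the curvature at the minimum.

```python
import math


def log_bessel_i0(x, n=400):
    """log I0(x), using I0(x) = (1/pi) * integral_0^pi exp(x cos t) dt (midpoint rule)."""
    x = abs(x)  # I0 is an even function
    h = math.pi / n
    # exp(x) is factored out so the integrand stays <= 1 and cannot overflow.
    total = sum(math.exp(x * (math.cos((k + 0.5) * h) - 1.0)) for k in range(n))
    return x + math.log(total / n)


def neg_log_likelihood(f_calc, f_obs, beta):
    """Minus log of the acentric Rice density of f_obs given f_calc.

    Assumption: unit scale factor; beta is the mean intensity of the missing atoms.
    """
    return (
        -math.log(2.0 * f_obs / beta)
        + (f_obs ** 2 + f_calc ** 2) / beta
        - log_bessel_i0(2.0 * f_obs * f_calc / beta)
    )


def ls_equivalent(f_obs, beta, h=1e-3):
    """Return (target, weight) such that, near the minimum,
    -log L is approximately const + weight * (f_calc - target)**2."""
    # Ternary search on [0, f_obs]; the function has a single minimum there.
    lo, hi = 0.0, f_obs
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if neg_log_likelihood(m1, f_obs, beta) <= neg_log_likelihood(m2, f_obs, beta):
            hi = m2
        else:
            lo = m1
    target = 0.5 * (lo + hi)
    # Central finite difference for the second derivative at the minimum.
    second = (
        neg_log_likelihood(target + h, f_obs, beta)
        - 2.0 * neg_log_likelihood(target, f_obs, beta)
        + neg_log_likelihood(target - h, f_obs, beta)
    ) / h ** 2
    return target, 0.5 * second


# Weak, borderline and strong reflections relative to beta = 1.
for f_obs in (0.5, 1.0, 3.0):
    target, weight = ls_equivalent(f_obs, beta=1.0)
    print(f"Fobs={f_obs:.1f}  target={target:.3f}  weight={weight:.3f}")
```

Under these assumptions the behaviour described in the message appears directly. For Fobs² ≤ beta the target is 0 and the weight is (1/beta)(1 - Fobs²/beta), which vanishes as Fobs² approaches beta. For strong reflections the target approaches Fobs and the weight approaches the plain least-squares value 1/beta.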

lc0045.pdf

CCP4_40_14.pdf
