E1303

BAYESIAN AB INITIO PHASING: THE ROLE OF STRUCTURE FACTOR STATISTICS WITH BUILT-IN STEREOCHEMISTRY. Gérard Bricogne, Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England; and LURE, Bâtiment 209D, 91405 Orsay, France.

So far the main efforts in formulating and implementing the Bayesian approach to structure determination have been directed towards the design of (1) a more powerful method (the saddlepoint approximation) for evaluating joint probability distributions of structure factors, capable of handling non-uniform distributions of random atomic positions (using the maximum-entropy method); and of (2) a systematic protocol for forming hypotheses (typically, but not exclusively, trial phase assignments), for sampling them efficiently (e.g. by "magic lattices" based on error-correcting codes) and for testing them against the available data (by examining the log-likelihood gain statistic).
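As a generic illustration of the saddlepoint idea named in (1), the sketch below applies the standard first-order saddlepoint formula to a toy case that is not taken from this work: the density of a sum of n independent unit-rate exponential variables, whose exact form (a Gamma density) is known. The function names and the choice of toy distribution are assumptions made purely for illustration; the structure factor case involves multivariate distributions and non-uniform atomic priors that this sketch does not attempt to reproduce.

```python
import math


def saddlepoint_density(x, n):
    """First-order saddlepoint approximation to the density of a sum of
    n independent Exp(1) variables, evaluated at x > 0.

    Cumulant generating function: K(t) = -n * log(1 - t), for t < 1.
    The saddlepoint t_hat solves K'(t) = n / (1 - t) = x.
    """
    t_hat = 1.0 - n / x                      # solution of K'(t) = x
    k = -n * math.log(1.0 - t_hat)           # K(t_hat)
    k2 = n / (1.0 - t_hat) ** 2              # K''(t_hat)
    return math.exp(k - t_hat * x) / math.sqrt(2.0 * math.pi * k2)


def exact_density(x, n):
    """Exact Gamma(n, 1) density, computed in log space for stability."""
    return math.exp((n - 1) * math.log(x) - x - math.lgamma(n))


n = 10
ratios = [saddlepoint_density(x, n) / exact_density(x, n)
          for x in (4.0, 10.0, 25.0)]
for x, r in zip((4.0, 10.0, 25.0), ratios):
    print(f"x = {x:5.1f}   saddlepoint / exact = {r:.5f}")
```

In this toy case the ratio is the same at every x, including far into the tails, and differs from 1 only by the error of Stirling's formula for Gamma(n). An approximation that is re-centred at each point of evaluation keeps its accuracy away from the mode in this way.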

In spite of these elaborations, the initial assumptions on which the Bayesian statistical machinery is set to work remain the same as those of standard direct methods: all atoms are assumed to be statistically independent, so that chemical bonding rules are ignored. Overcoming this embarrassing inadequacy, i.e. finding a way of incorporating a priori stereochemical knowledge into structure factor statistics, has proved one of the most elusive questions in theoretical crystallography.
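The independent-atom assumption can be made concrete with a small numerical sketch. The code below is a toy illustration, not part of the method described here: it places N equal point atoms independently and uniformly in a unit cell (space group P1, unit scattering factors, all assumed for simplicity) and checks that the average intensity of a non-origin reflection is close to N, which is what this assumption implies. The function names are hypothetical.

```python
import cmath
import math
import random


def structure_factor(h, positions):
    """Structure factor F(h) = sum_j exp(2*pi*i * h.x_j) for unit point atoms
    at fractional coordinates x_j."""
    total = 0j
    for x in positions:
        phase = 2.0 * math.pi * sum(hi * xi for hi, xi in zip(h, x))
        total += cmath.exp(1j * phase)
    return total


def mean_intensity(h, n_atoms, n_trials, rng):
    """Average |F(h)|^2 over random structures whose atoms are drawn
    independently and uniformly in the unit cell (no bonding constraints)."""
    acc = 0.0
    for _ in range(n_trials):
        positions = [(rng.random(), rng.random(), rng.random())
                     for _ in range(n_atoms)]
        acc += abs(structure_factor(h, positions)) ** 2
    return acc / n_trials


rng = random.Random(0)          # fixed seed so the run is reproducible
n_atoms = 20
ratio = mean_intensity((1, 2, 3), n_atoms, 2000, rng) / n_atoms
print(f"<|F|^2> / N = {ratio:.3f}")   # expected to be close to 1
```

Nothing in this sampling scheme prevents two atoms from overlapping or enforces any bond length or angle, which is precisely the inadequacy referred to above.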

It will be shown here that the key concepts of saddlepoint approximation and maximum-entropy distributions can be applied to this problem to yield joint probability distributions of structure factors with built-in stereochemistry, i.e. a priori statistical criteria of stereochemical validity [1]. This procedure can use a hierarchically organised knowledge base incorporating the known clusterings of short hexapeptide building blocks, of secondary and super-secondary motifs, and of domain folds. Sequential Bayesian inference can be conducted on the basis of these more stringent criteria in such a way as to consult all relevant structural information to compensate for the relative paucity of diffraction data which distinguishes the macromolecular from the "small moiety" situation. This procedure provides the natural foundation on which to build a genuine expert system for knowledge-based structure determination.

Reference

[1] BRICOGNE, G. (1995). In ECCC Computational Chemistry (Bernardi & Rivail, editors). Amer. Inst. of Phys. Conf. Proceedings, 330, 449-470.