research papers

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047

Modelling prior distributions of atoms for macromolecular refinement and completion

a MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England, b Global Phasing Ltd, Sheraton House, Castle Park, Cambridge CB3 0AX, England, and c LURE, Université Paris-Sud, Bâtiment 209D, 91405 Orsay, France
*Correspondence e-mail: gb10@mrc-lmb.cam.ac.uk

(Received 3 May 2000; accepted 14 June 2000)

Until modelling is complete, macromolecular structures are refined in the absence of a model for some of the atoms in the crystal. Techniques are described for defining positional probability distributions of atoms and for using them to model both the missing part of a macromolecular crystal structure and the bulk solvent. The starting information may consist of either a tentative structural model for the missing atoms or an electron-density map. During structure completion and refinement, the use of probability distributions allows low-resolution phase information to be retained while avoiding premature commitment to uncertain higher resolution features. Homographic exponential modelling is proposed as a flexible, compact and robust parametrization that proves superior to a traditional Fourier expansion in approximating a model protein envelope. The homographic exponential model also has potential applications to ab initio phasing of the Fourier amplitudes associated with macromolecular envelopes.

1. The case for low-resolution distributions in partial structure refinement and completion

Crystallographic partial structure refinement and completion are usually performed by omitting the questionable parts of the structure and refraining as much as possible from building into ill-defined density regions. If the starting phases are of poor quality, phase improvement by model building is therefore slow, because some of the low-resolution positional information that is already available is not incorporated until the positions of the missing atoms are unambiguously defined. To avoid locking in on an incorrect structure, even the most likely clues or inspired guesses about the positions of the missing atoms are set aside for fear of model bias.

One way of overcoming these difficulties is the iterative placement of atoms at the peaks of the uninterpretable regions of the electron-density map, leading to a `hybrid model' of the crystal structure that comprises the protein model and free atoms (Perrakis et al., 1999). A different strategy is described here, as implemented in the computer program BUSTER (Bricogne, 1993, 1997), which uses a Bayesian statistical model to consistently merge various sources of crystallographic phase information. At any stage of the phasing process, BUSTER uses low-resolution real-space distributions to provide a statistical description of the scattering from those parts of the structure that cannot be modelled reliably, either because they scatter weakly (missing or disordered residues) or because they are intrinsically disordered (bulk solvent).

The main advantages of this procedure are: (i) the scaling of the data to the model is robust and accurate; (ii) the danger of biasing the refinement towards the initial values given to the parameters of the already traced atoms is less serious, because the scattering from the missing atoms is accounted for in a statistical sense; and (iii) from the low-resolution distribution for the missing atoms a maximum-entropy distribution can be derived; suitably scaled and thermally smeared, this is a versatile alternative to conventional weighted difference Fourier maps.

Before we examine closely how the real-space distributions are computed (§4), we add a brief section defining the symbols used throughout (§2) and a section outlining the structural model as implemented in BUSTER (§3).

2. Symbols used in this paper

In this paper, five types of real-space distributions are dealt with, all of which are handled in BUSTER as CCP4-format maps sampled on a crystallographic grid with NX, NY and NZ points along the crystallographic axes. We list here the symbols for these distributions (omitting any subscripts), as an aid to the reader.

  • q(x), a generic distribution in the crystallographic unit cell.

  • χ(x), an indicator function, i.e. a binary mask whose values are 0 or 1 only; Vχ is the fractional volume of the mask χ(x); when the latter is sampled on a crystallographic grid NX NY NZ,

    [V_{\chi} = (1/ {\rm NX \, NY \, NZ}) \textstyle \sum \limits_{i = 1}^{NX}\sum \limits_{j = 1}^{NY}\sum \limits_{k = 1}^{NZ} \chi(i,j,k). \eqno (1)]

  • m(x), an envelope, i.e. a continuous function that is positive everywhere and usually has only low-resolution Fourier components; m(x) is normalized so that its average over the unit cell is unity,

    [({{1}/{V}})\textstyle \int \limits_{V} m({\bf x})d^{3}{\bf x} = 1, \eqno (2)]

    V being the volume of the unit cell; when sampling m(x) on a grid,

    [(1/{\rm NX\,NY\,NZ})\textstyle \sum \limits_{i = 1}^{NX}\sum \limits _{j = 1}^{NY}\textstyle \sum \limits_{k = 1}^{NZ} m(i,j,k) = 1. \eqno (3)]

  • p(x), a probability distribution, so that 0 ≤ p(x) ≤ 1 and ∫V p(x) d³x = 1.

  • ρ(x), an electron density, in e Å−3 units.

Vertical bars denote the absolute value, |f(x)| = abs[f(x)]; angled brackets denote the expectation value under a probability density, 〈f(x)〉 = ∫ P(x) f(x) dx; the asterisk stands for convolution, (f ∗ g)(x) = ∫ f(x − y) g(y) dy.

3. The structural model

The electron density at point x in the unit cell is written as the sum of three contributions,

[\rho_{\rm tot}({\bf x}) = \rho_{\rm frag}({\bf x})+ \rho_{ \rm rand}({\bf x})+ \rho_{\rm solv}({\bf x}), \eqno (4)]

where ρfrag(x) is the electron density of the known fragment of the structure, whose atomic positions are known with a good degree of confidence; ρrand(x) is the density of the atoms that are missing from the fragment, whose positions are described by a probability distribution and a random atom model (see §3.2); and ρsolv(x) is the bulk-solvent density. ρtot(x) is on an absolute scale.

The model for the structure factor is therefore

[F_{\rm tot}({\bf h}) = {\cal F} [{\rho_{\rm tot}({\bf x})}]({\bf h}) = F_{\rm frag}({\bf h})+ F_{\rm rand}({\bf h})+ F_{\rm solv}({\bf h}), \eqno (5)]

where the subscripts retain the meaning they have in (4).

Before describing how the real-space distributions are computed, we say more in the next three subsections about the individual components of the structural model.

3.1. The partial structure model

The atoms whose positions are known with a good degree of confidence are described by a set of conventional atomic model parameters. Their positions, isotropic displacement parameters (i.e. temperature factors) and occupancies can be refined by maximum likelihood, using an interface to the refinement package TNT (Tronrud et al., 1987; Tronrud, 1997), as previously described (Bricogne & Irwin, 1996). The standard stereochemical, geometrical and non-crystallographic symmetry (hard and soft) restraints are handled in TNT. During partial structure refinement, the probability distribution for the random atoms and the bulk-solvent distribution are kept fixed.

3.2. The missing structure model

The prior expectation about the positions of the missing atoms is cast in quantitative terms by means of an envelope mrand(x), which serves as a positional prior distribution for those atoms; the calculation of mrand(x) is described in §4. As the suffix `rand' suggests, all the missing atoms are assumed to be randomly distributed according to mrand(x).

Once the partial structure has been refined, a maximum-entropy distribution qrand(x) for the missing atoms is computed in the form

[q_{\rm rand}({\bf x}) = {{1}\over{Z}}m_{\rm rand}({\bf x})\exp\left[\textstyle \sum \limits_{\bf h} \lambda_{\bf h}\Xi_{\bf h}({\bf x})\right], \eqno (6)]

where Z is a normalization factor such that ∫V qrand(x) d³x = 1, λh are Lagrange multipliers and Ξh is the trigonometric structure factor, i.e. the structure factor for a point scatterer at rest,

[\Xi_{\bf h}({\bf x}) = {{1}\over{|G|}} \textstyle \sum \limits_{g\in G}\exp[2\pi i{\bf h} S_{g}{\bf x}]. \eqno (7)]

|G| is the number of elements of the space group G and Sgx = Rgx + tg is the generic symmetry operation in G.
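As a concrete illustration of (7), the sketch below evaluates Ξh(x) for a point scatterer. The space group is an assumption chosen only for the example: P21 (unique axis b), with operations (x, y, z) and (−x, y + ½, −z); the constant P21_OPS and the function name are hypothetical. For 0k0 reflections with odd k the two terms cancel, reproducing the familiar systematic absence.

```python
import numpy as np

# Hypothetical example: the two symmetry operations of P2_1 (unique axis b),
# stored as (R_g, t_g) pairs acting on fractional coordinates.
P21_OPS = [
    (np.eye(3), np.zeros(3)),
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),
]


def trig_structure_factor(hkl, x_frac, ops=P21_OPS):
    """Eq (7): Xi_h(x) = (1/|G|) sum_g exp[2 pi i h.(R_g x + t_g)]."""
    h = np.asarray(hkl, dtype=float)
    x = np.asarray(x_frac, dtype=float)
    total = sum(np.exp(2j * np.pi * (h @ (r @ x + t))) for r, t in ops)
    return total / len(ops)


x = [0.13, 0.27, 0.41]
print(abs(trig_structure_factor([0, 3, 0], x)))  # odd 0k0: systematically absent
```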

qrand(x) is computed by varying the λh under the constraint of maximum entropy, as outlined in Roversi et al. (2000).

qrand(x) can be normalized and turned into a positional posterior probability distribution, which shows the extent to which the prior expectation mrand(x) is confirmed or contradicted by the observations. In the absence of noise, and if the observations contained no information regarding the region of interest, the final probability distribution would coincide with the normalized prior (1/Z)mrand(x), because λh = 0 ∀ h. In practice, both noise and signal in the data cause the λh to differ from zero and build features into qrand(x). The contribution of the missing atoms to the structure factor is computed from qrand(x) and the sum of the scattering factors of those atoms,

[{\bf F}_{\rm rand}({\bf h}) = \Sigma_{\rm rand}(\bf h) \times {\cal F}[{q_{\rm rand}({\bf x)}}]({\bf h}), \eqno (8)]

where Σrand(h) is the sum of the scattering factors for the missing atoms,

[\Sigma_{\rm rand}({\bf h}) = {\textstyle \sum \limits^{N_{\rm rand}}_{j = 1}}f_{j}({\bf h}) \exp\left [-\langle B\rangle_{j}{{d^{*2}_{\bf h}}\over{4}}\right ]. \eqno (9)]
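Equations (8) and (9) combine into a simple grid calculation. The sketch below is a minimal illustration, not BUSTER's implementation: it assumes an orthorhombic cell, samples qrand(x) on an FFT grid with ∫ qrand d³x = 1 (so its grid average is 1/V), and, as a hypothetical simplification, treats each missing atom as a point scatterer whose fj(h) equals its electron count; real calculations use tabulated scattering factors. The helper names are illustrative.

```python
import numpy as np


def dstar_squared(shape, cell):
    """d*^2 = (h/a)^2 + (k/b)^2 + (l/c)^2 on an FFT grid (orthorhombic cell assumed)."""
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in shape]  # integer indices in FFT order
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2


def f_rand(q_rand, cell, electrons, b_mean):
    """Eqs (8)-(9): F_rand(h) = Sigma_rand(h) * F[q_rand](h) on the FFT grid.

    Hypothetical simplification: f_j(h) = electrons[j] (point scatterers).
    """
    volume = float(np.prod(cell))
    # F[q](h) = integral of q(x) exp(+2 pi i h.x) d^3x; ifftn carries the + sign and a 1/N factor
    fq = volume * np.fft.ifftn(q_rand)
    s2 = dstar_squared(q_rand.shape, cell)
    sigma = sum(z * np.exp(-bj * s2 / 4.0) for z, bj in zip(electrons, b_mean))  # eq (9)
    return sigma * fq  # eq (8)


cell = (30.0, 40.0, 50.0)
q = np.full((8, 8, 8), 1.0 / np.prod(cell))  # uniform distribution over the cell
F = f_rand(q, cell, electrons=[6, 7, 8], b_mean=[20.0, 20.0, 20.0])
```

For a uniform qrand(x) only F(000) survives, and it equals the total number of missing electrons.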

3.3. The bulk-solvent model

The bulk-solvent structure factor Fsolv(h) on the absolute scale is computed from the Fourier components of the bulk-solvent density ρsolv(x), smeared by the solvent temperature factor,

[{\bf F}_{\rm solv}({\bf h}) = {\cal F}[{\rho_{\rm solv}({\bf x})}]({\bf h}) \times\exp\left[-B_{\rm solv}{{d_{\bf h}^{*2}}\over{4}}\right]. \eqno (10)]

The bulk-solvent density is taken to be proportional to the bulk-solvent envelope msolv(x),

[\rho_{\rm solv}({\bf x}) = \overline{\rho}_{\rm solv}\, V_{\rm solv} \times m_{\rm solv}({\bf x}), \eqno (11)]

where [\overline{\rho}_{\rm solv}] and Vsolv are the electron density and volume of the bulk solvent.

In BUSTER, the bulk-solvent envelope msolv(x) is never handled as such; the macromolecular envelope mmacrom(x) is used instead. mmacrom(x) is either computed from the atomic model of the whole molecule [see §4.2, Vmacrom being the volume of the whole binary mask χmacrom(x)] or computed from the density using the known solvent-volume fraction (see §4.3).

Once mmacrom(x) is obtained, the Babinet principle,1 relating the low-resolution Fourier components of two complementary distributions msolv(x) and mmacrom(x), is used,

[V_{\rm solv}{\cal F}[m_{\rm solv}({\bf x})]({\bf h}) = -V_{\rm macrom}{\cal F}[m_{\rm macrom}({\bf x})]({\bf h}), \eqno (12)]

so that

[\eqalignno {{\bf F}_{\rm solv}({\bf h})& = -\overline{\rho}_{\rm solv}V_{\rm macrom}\times {\cal F}[m_{\rm macrom}({\bf x})] ({\bf h}) \cr &\ \quad \times \ \exp \left[-B_{\rm solv}{{d^{*2}_{\bf h}}\over {4}}\right]. & (13)}]
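Equation (12) follows because, with m = χ/Vχ, one has Vχ 𝓕[m] = 𝓕[χ], and the transform of χsolv = 1 − χmacrom equals −𝓕[χmacrom] at every h ≠ 0. The short check below is a numerical sketch of that identity on a random binary mask (a stand-in for a real macromolecular mask) using NumPy's FFT; it illustrates the principle rather than BUSTER's code, and the function names are illustrative.

```python
import numpy as np


def envelope_from_mask(chi):
    """Return m = chi / V_chi (unit average over the cell, eqs 1-3) and the fractional volume V_chi."""
    v_chi = chi.mean()  # eq (1)
    return chi / v_chi, v_chi


def babinet_sides(chi_macrom):
    """Both sides of eq (12) for a binary macromolecular mask and its complement (the solvent)."""
    chi = chi_macrom.astype(float)
    m_macrom, v_macrom = envelope_from_mask(chi)
    m_solv, v_solv = envelope_from_mask(1.0 - chi)
    lhs = v_solv * np.fft.fftn(m_solv)
    rhs = -v_macrom * np.fft.fftn(m_macrom)
    return lhs, rhs


rng = np.random.default_rng(0)
chi = rng.random((10, 12, 14)) < 0.45
lhs, rhs = babinet_sides(chi)
# Eq (12) holds for every h != 0; the F(000) terms have opposite signs and are excluded.
print(np.allclose(lhs.flat[1:], rhs.flat[1:]))  # True
```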

4. Computing mrand(x)

We can now examine more closely how the real-space envelopes are computed; in particular, we discuss here the calculation of the envelope for the missing atoms, mrand(x). Similar techniques can be used to compute the envelopes for the whole macromolecule or for the bulk solvent.

As soon as an initial model is available, the prior distribution mrand(x) for the positions of the missing atoms can be computed in three ways: (i) by excluding the missing atoms from the regions already containing the partial structure (uniform prior, §4.1), (ii) by using a trial atomic model for the missing atoms (model-based non-uniform prior, §4.2) or (iii) simply from the local fluctuation of the electron density (map-based non-uniform prior, §4.3).

4.1. Uniform prior

The simplest choice of prior probability distribution for the missing atoms is to exclude them from the regions that already contain a reliable atomic model: this brings into the statistical model the notion that a number of atoms are missing and that they are equally likely to be anywhere except where other atoms have already been placed.

The uniform prior distribution is defined in three steps as follows.

  • (i) A binary mask [\chi_{\rm frag}^{\rm a.u.}(\bf x)] is drawn around the known partial structure using the program NCSMASK (Collaborative Computational Project, Number 4, 1994). The masking radius Rfrag can be varied; its default value is 2.05 Å.

  • (ii) [\chi_{\rm frag}^{\rm a.u.}(\bf x)] is symmetry expanded to cover the whole cell; this symmetry-expanded binary mask χfrag(x) is negated to obtain a binary mask χrand(x) for the random atoms,

    [\chi_{\rm rand}({\bf x}) = 1-\chi_{\rm frag}({\bf x}). \eqno (14)]

  • (iii) χrand(x) is blurred by convolution with an isotropic Gaussian G(x; Brand) and normalized,

    [m_{\rm rand}({\bf x}) = {{1}\over{V_{\chi_{\rm rand}}}}\times [\chi_{\rm rand} \ast G(B_{\rm rand})] (\bf x), \eqno (15)]

    where the parameter Brand controls the width of the Gaussian and therefore the slope of mrand(x) around the model used in generating [\chi_{\rm frag}^{\rm a.u.}(\bf x)].
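The three steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the BUSTER code: the cell is taken as orthorhombic, the fragment mask is assumed to be already symmetry expanded, and the Gaussian convolution is done with a plain FFT on the grid (multiplying the coefficients by exp(−B d*²/4)) instead of the aliased structure factors described below. Function names and the B value in the usage lines are illustrative.

```python
import numpy as np


def gaussian_blur(density, cell, b_factor):
    """Convolve with an isotropic Gaussian G(B) via exp(-B d*^2 / 4) in reciprocal space.

    Assumes an orthorhombic cell (a, b, c) in angstroms. G integrates to one,
    so the cell average of the density is unchanged.
    """
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in density.shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    dstar2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return np.real(np.fft.ifftn(np.fft.fftn(density) * np.exp(-b_factor * dstar2 / 4.0)))


def uniform_prior(chi_frag, cell, b_rand):
    """Eqs (14)-(15): complement the symmetry-expanded fragment mask, blur and normalize."""
    chi_rand = 1.0 - chi_frag.astype(float)                # eq (14)
    v_rand = chi_rand.mean()                               # fractional volume, eq (1)
    return gaussian_blur(chi_rand, cell, b_rand) / v_rand  # eq (15)


cell = (40.0, 40.0, 40.0)
chi_frag = np.zeros((32, 32, 32), dtype=bool)
chi_frag[8:20, 8:20, 8:20] = True  # toy "fragment" region
m_rand = uniform_prior(chi_frag, cell, b_rand=200.0)
```

The result averages to one over the cell, is close to zero inside the fragment and slightly above one elsewhere.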

The convolution in (15) is effected in reciprocal space, using a set of periodized (`aliased') structure factors for mrand(x). The use of aliased structure factors to sample thermally smeared model densities on arbitrarily coarse crystallographic grids has been described in the Appendix of Roversi et al. (1998) and will not be detailed here.2

We stress that this distribution is uniform outside the regions occupied by the model, hence the name `uniform prior', but it is not uniform over the whole cell; only in the absence of any partial model is it truly uniform throughout the unit cell.

We also note that if the bulk-solvent envelope is chosen to fill all the space left empty by the macromolecular model, the missing atoms envelope and the bulk-solvent envelope overlap. They can still differ in the value of the parameter B used in the blurring step (15).

4.2. Model-based non-uniform prior

Sometimes a rough guess is available as to the placement of a subset of atoms, such as a protein loop or domain or a bound ligand, but the model tentatively built for the same atoms is questionable. An envelope mrand(x) can then be built around these ill-defined atoms and the same atoms omitted from the partial structure. The real-space picture of the crystal in this case then comprises the bulk-solvent envelope, the atomic model for the trusted traced atoms and the missing atoms envelope. The latter is localized around the tentatively placed atoms; it represents our prior expectation about their position but does not retain any of the high-resolution details that are being assessed.

The prior distribution is computed in four steps as follows.

  • (i) A binary mask [\chi_{\rm macrom}^{\rm a.u.}(\bf x)] is drawn around the complete atomic model, including the parts that will be omitted; the radius for this masking can vary between 2 and 4 Å, depending on the degree of confidence one wants to retain regarding the omitted model (a tighter radius resulting in a distribution highly localized around the omitted atoms).

  • (ii) A binary mask [\chi_{\rm frag}^{\rm a.u.}(\bf x)] is drawn around the part of structure that is going to be retained and a binary mask for the random atoms [\chi_{\rm rand}^{\rm a.u.}(\bf x)] is obtained from

    [\chi_{\rm rand}^{\rm a.u.}({\bf x}) = \chi_{\rm macrom}^{\rm a.u.}({\bf x}) \times \left [1 - \chi_{\rm frag}^{\rm a.u.}({\bf x}) \right]. \eqno (16)]

  • (iii) The [\chi_{\rm rand}^{\rm a.u.}(\bf x)] mask is symmetry expanded to the unit cell to give χrand(x).

  • (iv) χrand(x) is blurred by means of a convolution with an isotropic Gaussian G(x; Brand) and normalized as in (15).
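A corresponding sketch of the model-based prior, again under simplifying assumptions: space group P1 (so step (iii), the symmetry expansion, reduces to the identity), an orthorhombic cell, FFT-based Gaussian blurring, and a hypothetical `sphere_mask` helper standing in for a real masking program. Atom positions, radii and the B value in the usage lines are illustrative.

```python
import numpy as np


def sphere_mask(shape, cell, atoms_frac, radius):
    """Grid points within `radius` (angstrom) of any atom (fractional coordinates).

    Hypothetical stand-in for a masking program; periodic minimum image, orthorhombic cell.
    """
    grids = np.meshgrid(*[np.arange(n) / n for n in shape], indexing="ij")
    mask = np.zeros(shape, dtype=bool)
    for atom in atoms_frac:
        d2 = np.zeros(shape)
        for g, x0, edge in zip(grids, atom, cell):
            dx = g - x0
            dx -= np.round(dx)  # minimum-image convention in fractional units
            d2 += (dx * edge) ** 2
        mask |= d2 <= radius ** 2
    return mask


def gaussian_blur(density, cell, b_factor):
    """Convolution with G(B) via exp(-B d*^2 / 4) in reciprocal space (cell average preserved)."""
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in density.shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    dstar2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return np.real(np.fft.ifftn(np.fft.fftn(density) * np.exp(-b_factor * dstar2 / 4.0)))


def model_based_prior(shape, cell, all_atoms, kept_atoms, r_macrom, r_frag, b_rand):
    """Steps (i)-(iv) in P1: whole-model mask minus retained-fragment mask, blurred and normalized."""
    chi_macrom = sphere_mask(shape, cell, all_atoms, r_macrom)      # step (i)
    chi_frag = sphere_mask(shape, cell, kept_atoms, r_frag)         # step (ii)
    chi_rand = (chi_macrom & ~chi_frag).astype(float)               # eq (16)
    return gaussian_blur(chi_rand, cell, b_rand) / chi_rand.mean()  # step (iv), as in eq (15)


kept = [(0.25, 0.25, 0.25)]
omitted = [(0.75, 0.75, 0.75)]
m = model_based_prior((32, 32, 32), (32.0, 32.0, 32.0), kept + omitted, kept,
                      r_macrom=3.0, r_frag=3.5, b_rand=100.0)
```

The resulting envelope peaks around the omitted atom and vanishes around the retained one.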

4.3. Map-based non-uniform prior

Even when no atomic model is available, some rough idea about the placement of the missing atoms can be retrieved from the presence of high values of the local r.m.s.d. in noisy electron-density maps.

The local average of the electron density (Wang, 1985; Leslie, 1987) or its local fluctuation around the mean (Abrahams & Leslie, 1996; Abrahams, 1997) has been used to perform phase improvement by density-modification techniques.

The BUSTER envelope is also computed by local variance filtering of a noisy density map. Local averaging is performed by convolution with a Gaussian G(B), parametrized by a Debye–Waller factor B, and a solid sphere mask S(R), parametrized by a radius R. These convolutions are used in two filtering operations that select high and low frequencies in a distribution ρ(x),

[\eqalignno {\rho^{\rm lo}(B,R)({\bf x})& = [\rho\ast G(B)\ast S(R)]({\bf x})& (17)\cr \rho^{\rm hi}(B,R)({\bf x})& = (\rho-\rho^{\rm lo})({\bf x)}.& (18)}]

All the convolutions are carried out in reciprocal space by calculating a set of aliased structure factors (Roversi et al., 1998), which are then Fourier-transformed to sample the density on the required grid.
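The two filters (17) and (18) can be sketched as products of Fourier coefficients: G(B) contributes exp(−B d*²/4) and a solid sphere of unit integral contributes the standard factor 3(sin u − u cos u)/u³ with u = 2πd*R. The sketch assumes an orthorhombic cell and uses plain FFTs on the sampling grid as a simpler stand-in for the aliased structure factors; function names are illustrative.

```python
import numpy as np


def dstar(shape, cell):
    """|d*| for every FFT grid index of an orthorhombic cell (a, b, c) in angstroms."""
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    return np.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def sphere_transform(s, radius):
    """Fourier transform of a solid sphere S(R) of unit integral: 3(sin u - u cos u)/u^3."""
    u = 2.0 * np.pi * s * radius
    out = np.ones_like(u)  # limit value at u = 0
    nz = u > 1e-8
    out[nz] = 3.0 * (np.sin(u[nz]) - u[nz] * np.cos(u[nz])) / u[nz] ** 3
    return out


def low_high_pass(rho, cell, b_factor, radius):
    """Eqs (17)-(18): rho_lo = rho * G(B) * S(R) and rho_hi = rho - rho_lo."""
    s = dstar(rho.shape, cell)
    kernel = np.exp(-b_factor * s ** 2 / 4.0) * sphere_transform(s, radius)
    rho_lo = np.real(np.fft.ifftn(np.fft.fftn(rho) * kernel))
    return rho_lo, rho - rho_lo


rng = np.random.default_rng(1)
rho = rng.normal(size=(12, 12, 12))
rho_lo, rho_hi = low_high_pass(rho, (36.0, 36.0, 36.0), b_factor=50.0, radius=5.0)
```

Because both kernels equal one at h = 0, ρlo keeps the mean of ρ and ρhi averages to zero.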

For the (optional) high-frequency filtering, the following two measures of the local fluctuation around the local average can be defined:

  • (i) the local average of the absolute value of the deviation from the mean,

    [\omega({\bf x}) = [| \rho^{\rm hi}(B_{1},R_{1}) |\ast G(B_{2})\ast S(R_{2}) ](\bf x), \eqno (19)]

  • (ii) the local r.m.s.d. from the local average,

    [\omega({\bf x}) = \{ [\rho^{\rm hi}(B_{1},R_{1}) ]^{2}\ast G(B_{2})\ast S(R_{2})\}^{{{1}\over{2}}}({\bf x}). \eqno (20)]

The radius R1 of the sphere for the high-pass filter in (19) and (20) is typically larger than the radius R2 of the sphere for the low-pass (smoothing) filter, i.e. R1 > R2.

The high-frequency filter is useful when map Fourier components with d ≥ R1 (i.e. at resolutions lower than R1) are either absent or cannot be trusted, but it can be omitted if the lowest resolution features are correct. In that case, the following two local averages can be computed, also by Fourier transforms:

  • (i) the local average of the absolute value of the density,

    [\omega({\bf x}) = [|\rho^{\rm lo}| \ast G(B_{2})\ast S(R_{2})]({\bf x}), \eqno (21)]

  • (ii) the local r.m.s. deviation from zero of the density,

    [\omega({\bf x}) = [(\rho^{\rm lo})^{2}\ast G(B_{2})\ast S(R_{2})]^{{{1}\over{2}}}({\bf x}). \eqno (22)]
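The four measures (19) to (22) differ only in whether ρhi or ρlo is used and in whether the absolute value or the square is smoothed. A compact sketch, assuming an orthorhombic cell and FFT-based smoothing; the function names and parameter values are illustrative, and clipping tiny negative values before the square root is our own choice.

```python
import numpy as np


def smooth(f, cell, b_factor, radius):
    """f * G(B) * S(R), computed as a product of Fourier coefficients (orthorhombic cell)."""
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in f.shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    s = np.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)
    u = 2.0 * np.pi * s * radius
    sphere = np.ones_like(u)
    nz = u > 1e-8
    sphere[nz] = 3.0 * (np.sin(u[nz]) - u[nz] * np.cos(u[nz])) / u[nz] ** 3
    kernel = np.exp(-b_factor * s ** 2 / 4.0) * sphere
    return np.real(np.fft.ifftn(np.fft.fftn(f) * kernel))


def local_fluctuation(rho, cell, b1, r1, b2, r2, mode="rms", high_pass=True):
    """omega(x) from eqs (19)-(22).

    high_pass=True uses rho_hi = rho - rho_lo(B1, R1) (eqs 19, 20); False uses rho_lo (eqs 21, 22).
    mode="abs" gives the local mean absolute value, mode="rms" the local r.m.s. value.
    """
    rho_lo = smooth(rho, cell, b1, r1)
    base = rho - rho_lo if high_pass else rho_lo
    if mode == "abs":
        return smooth(np.abs(base), cell, b2, r2)
    # Fourier smoothing can leave tiny negative values; clip before the square root.
    return np.sqrt(np.clip(smooth(base ** 2, cell, b2, r2), 0.0, None))


rng = np.random.default_rng(2)
rho = np.zeros((32, 32, 32))
rho[:16] = rng.normal(size=(16, 32, 32))  # noisy "protein" half, flat "solvent" half
omega = local_fluctuation(rho, (48.0, 48.0, 48.0), 200.0, 6.0, 200.0, 4.0)
```

In this toy map ω(x) is large in the noisy half and close to zero in the middle of the flat half.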

Once ω(x) is available, mrand(x) is obtained by homographic exponential modelling, as described in the following section.

5. Homographic exponential modelling

We describe in this section a technique that affords a parametrization of low-resolution distributions and is used in BUSTER for computing macromolecular envelopes from noisy electron-density maps. The technique is a particular case of homographic mapping of a function e(x),

[e({\bf x})\rightarrow{{a+b\times e({\bf x)}}\over{c+d\times e({\bf x})}}, \eqno (23)]

where a = c = d = 1 and b = 0, and e(x) is an exponential e(x) = exp[ω(x)]; therefore, we propose to call it homographic exponential modelling.
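Written out, (23) with a = c = d = 1 and b = 0 is simply 1/(1 + e). The sketch below composes it with an exponential. The scaled and shifted argument −β[ω(x) − μ] is the one that appears in (28) below, so here e(x) is taken as exp{−β[ω(x) − μ]} rather than the bare exp[ω(x)]; the two differ only by a linear change of ω. Function names are illustrative.

```python
import numpy as np


def homographic(e, a=1.0, b=0.0, c=1.0, d=1.0):
    """Eq (23): e -> (a + b e) / (c + d e)."""
    return (a + b * e) / (c + d * e)


def homographic_exponential(omega, beta, mu):
    """a = c = d = 1, b = 0 applied to e = exp[-beta (omega - mu)]: a logistic step in omega."""
    return homographic(np.exp(-beta * (omega - mu)))


omega = np.linspace(-3.0, 3.0, 7)
q = homographic_exponential(omega, beta=2.0, mu=0.0)
```

The output rises monotonically from near 0 to near 1, passing through ½ at ω = μ; β sets the steepness of the step.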

The distributions obtained by homographic exponential modelling can be handled as values on a crystallographic grid and represent a new way of defining intrinsically `binary-like' macromolecular envelopes that are continuous and not binary. Alternatively, they can be parametrized with a finite set of coefficients in the expansion of ω, opening the way to ab initio low-resolution phasing based on phase permutation for a few coefficients of ω(x).

The potential of homographic exponential modelling for ab initio phasing of envelope Fourier coefficients has been investigated by G. Bricogne and M. Ramin (G. Bricogne, unpublished results; Ramin, 1999). Here, we introduce the technique and present the results of a test study aimed at assessing the number of Fourier coefficients of ω(x) needed to reconstruct a given m(x) satisfactorily when a homographic exponential model is adopted.

5.1. The Fermi–Dirac distribution

The problem of defining a low-resolution envelope for the macromolecule based on an electron-density map can be restated in the form of assigning to each pixel in the map a probability of belonging to the bulk solvent, which we can write psolv(x). Correspondingly, pmacrom(x) = 1 − psolv(x) is then the probability that the pixel at x belongs to the macromolecular volume.

It is clear that we are dealing with each pixel as an entity that can be in one and only one of two possible states (pixel in the bulk solvent or pixel in the macromolecule), like a fermion whose spin can be either +½ or −½; an analogy can be drawn with the occupancy distribution function for a system consisting of a finite number of fermions with a given total energy. This occupancy distribution function fFD(E) follows a Fermi–Dirac distribution, depending on the temperature parameter βFD and on the chemical potential μFD (Reif, 1965),

[f_{\rm FD}(E) = {1}/\{{1+\exp[\beta_{\rm FD}(E-\mu_{\rm FD})]}\}. \eqno (24)]

The chemical potential μFD arises from the requirement that the number of fermions be fixed. At temperatures close to zero, the low-energy states are occupied [probability fFD(E) ≃ 1] until the total number of fermions is reached; this defines the Fermi level (or Fermi energy μFD) of the system. The distribution quickly tails off to zero as the energy increases; states with energy above the Fermi level have zero occupancy unless the ratio of the energy gap (E − μFD) to the mean thermal energy 1/βFD is small enough to permit some excitation.

By analogy, we can adopt some measure ω(x) of the local fluctuation of the electron density as an `envelope potential energy' and take β as inversely proportional to the r.m.s. error of the electron density (Blow & Crick, 1959),

[{{1}\over{\beta}} \propto \textstyle \sum \limits_{\bf h} \varepsilon_{\bf h} \left(1-{\rm FOM}_{\bf h}^{2}\right) F_{\bf h}^{2}, \eqno (25)]

FOMh being the figure of merit,

[{\rm FOM}_{\bf h} = ({ \langle \cos\varphi_{\bf h} \rangle^{2}+\langle \sin\varphi_{\bf h} \rangle^{2}})^{1/2}, \eqno (26)]

computed from the current phase probability distribution P(φh).
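Equations (25) and (26) can be evaluated from phase probability distributions sampled on a grid of trial phases. In the sketch below, the discrete sampling of P(φh) and the function names are assumptions. Because (25) is only a proportionality, `inverse_beta` returns the weighted sum without any scale factor, and εh is simply passed in as given.

```python
import numpy as np


def figure_of_merit(phases, probs):
    """Eq (26): FOM = sqrt(<cos phi>^2 + <sin phi>^2) under a sampled phase distribution."""
    p = probs / probs.sum(axis=-1, keepdims=True)  # normalize the sampled P(phi)
    mean_cos = (p * np.cos(phases)).sum(axis=-1)
    mean_sin = (p * np.sin(phases)).sum(axis=-1)
    return np.sqrt(mean_cos ** 2 + mean_sin ** 2)


def inverse_beta(amplitudes, fom, epsilon):
    """Right-hand side of eq (25): 1/beta up to its unspecified proportionality constant."""
    return np.sum(epsilon * (1.0 - fom ** 2) * amplitudes ** 2)


phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
sharp = np.zeros(360)
sharp[45] = 1.0                          # phase known exactly: FOM = 1
flat = np.ones(360)                      # phase completely unknown: FOM = 0
```

A perfectly known phase contributes nothing to 1/β, while a completely unknown one contributes its full εh F²h.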

Where ω(x) is large with respect to the density r.m.s. error, it is highly unlikely that pixel x belongs to the bulk solvent. So, for the probability that the pixel belongs to the solvent, we can take

[p_{\rm solv}({\bf x}) \propto {{1}\over{1+\exp\{\beta[\omega(\bf x)-\mu ]\}}}. \eqno (27)]

The value of μ depends on the number of pixels that define the solvent region (i.e. on the solvent-volume fraction). It can be computed by histogramming ω(x): starting from the pixels where the fluctuation is lowest and adding pixels in order of increasing local fluctuation until the desired solvent fraction is reached, μ is taken as the value of ω(x) at which the correct number of solvent pixels is obtained.
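The histogram procedure amounts to taking μ as a quantile of ω(x). A minimal sketch, assuming ω(x) is available as a NumPy array on the grid and adopting, as our own convention, μ equal to the ω value of the last pixel assigned to the solvent (ties and interpolation between neighbouring values are ignored). The solvent probability follows (27) without its proportionality constant; function names are illustrative.

```python
import numpy as np


def chemical_potential(omega, solvent_fraction):
    """mu such that the requested fraction of pixels, taken by increasing omega, is solvent."""
    values = np.sort(omega.ravel())
    n_solv = int(round(solvent_fraction * values.size))
    n_solv = min(max(n_solv, 1), values.size)
    return values[n_solv - 1]  # omega of the last pixel assigned to the solvent


def p_solvent(omega, beta, mu):
    """Eq (27) up to normalization: low local fluctuation -> high solvent probability."""
    return 1.0 / (1.0 + np.exp(beta * (omega - mu)))


rng = np.random.default_rng(3)
omega = rng.permutation(100).astype(float).reshape(4, 5, 5)  # toy fluctuation map
mu = chemical_potential(omega, 0.4)  # 40% solvent
```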

The probability that the pixel at x belongs to the macromolecule is then

[p_{\rm macrom}({\bf x}) = 1-p_{\rm solv}({\bf x}) \propto {{1}\over{1+\exp\{-\beta [\omega(\bf x)-\mu]\}}}. \eqno (28)]

5.2. Homographic exponential modelling of missing atoms envelopes

This section describes the homographic exponential modelling of macromolecular envelopes starting from noisy maps. In particular, we describe the calculation of a homographic exponential model for the missing atoms envelope in the presence of the density for the partial structure [\rho_{\rm frag}({\bf x})] (see §4.3).

Once the local density fluctuation ω(x) has been obtained along the lines described in §4.3 and its histogramming has given the value of μmacrom that corresponds to the appropriate solvent fraction, one has the homographic exponential model for the whole macromolecular envelope,

[q_{\rm macrom}({\bf x}) = {{1}\over{1+\exp \{ -\beta_{\rm macrom} [\omega({\bf x})-\mu_{\rm macrom}] \}}}, \eqno (29)]

the value of βmacrom being proportional to the reciprocal r.m.s. error of the starting density (25). Then, to exclude the fragment region from the prior probability distribution for the random atoms, a homographic exponential model of the fragment density is needed. The local fluctuation ωfrag(x) can be computed from ρfrag(x) as outlined in §4.3; the values of βfrag and μfrag are computed from the r.m.s. error of the fragment model density and its fractional volume, as above. The homographic exponential model for the fragment density is then

[q_{\rm frag}({\bf x}) = {{1}\over{1+\exp \{-\beta_{\rm frag} [\omega_{\rm frag}({\bf x})-\mu_{\rm frag}]\}}}. \eqno (30)]

Finally, the homographic exponential model for the missing atoms envelope is obtained by imposing that the pixel lies in the whole macromolecule envelope but not in the fragment envelope,

[\eqalignno {q_{\rm rand}({\bf x})& = q_{\rm macrom}({\bf x})\times\left[1-q_{\rm frag}({\bf x})\right] & (31) \cr m_{\rm rand}({\bf x})& = {{V}\over{\textstyle \int_{V}q_{\rm rand}({\bf x})\,{\rm d}^{3}{\bf x}}} \times q_{\rm rand}({\bf x}). & (32)}]
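On a uniform grid the integral in (32) is V times the grid average of qrand(x), so the normalization reduces to dividing by that average. A sketch, assuming ω(x), ωfrag(x) and the β and μ values have already been obtained as described above; the toy one-dimensional arrays in the usage lines and the function names are illustrative.

```python
import numpy as np


def logistic(omega, beta, mu):
    """Homographic exponential form 1 / (1 + exp{-beta [omega - mu]}), as in eqs (29)-(30)."""
    return 1.0 / (1.0 + np.exp(-beta * (omega - mu)))


def missing_atoms_envelope(omega, beta_m, mu_m, omega_frag, beta_f, mu_f):
    """Eqs (29)-(32): inside the macromolecule envelope but outside the fragment envelope."""
    q_macrom = logistic(omega, beta_m, mu_m)     # eq (29)
    q_frag = logistic(omega_frag, beta_f, mu_f)  # eq (30)
    q_rand = q_macrom * (1.0 - q_frag)           # eq (31)
    # eq (32): V / integral(q_rand) equals 1 / (grid average of q_rand)
    return q_rand / q_rand.mean()


# Toy 1D grid: macromolecule in the first half, traced fragment in the first quarter.
omega = np.array([5.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0])
omega_frag = np.array([5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
m_rand = missing_atoms_envelope(omega, 4.0, 2.5, omega_frag, 4.0, 2.5)
```

The envelope concentrates in the second quarter, the part of the macromolecule not yet covered by the fragment.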

5.3. A simple test

We describe here a simple calculation that investigates the behaviour of homographic exponential modelling of a known envelope m(x) under truncation of its Fourier spectrum, and compares it with a traditional finite-resolution Fourier expansion of the same m(x).

If m(x) is a given envelope that we intend to parametrize using a homographic exponential model (28), we first map m(x) linearly onto the unit interval,

[m'({\bf x}) = {{\left[m({\bf x})-\min m({\bf x})\right]}\over{\left[\max m({\bf x})-\min m({\bf x})\right]}}. \eqno (33)]

Then, we can compute the ω(x) from

[\omega({\bf x}) = {{1}\over{\beta}} \log \left[{{m'({\bf x})}\over{1-m'({\bf x})}}\right]+\mu. \eqno (34)]
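Equation (34) is the inverse of the logistic form (37) used below, so the two are easily checked against each other. In the sketch, the clipping parameter eps is our addition: as written, (33) maps the minimum and maximum of m(x) exactly onto 0 and 1, where the logarithm in (34) is infinite. The function name is illustrative.

```python
import numpy as np


def envelope_to_omega(m, beta=1.0, mu=0.0, eps=1e-6):
    """Eqs (33)-(34): rescale m(x) linearly to the unit interval, then apply the inverse logistic."""
    m_prime = (m - m.min()) / (m.max() - m.min())  # eq (33)
    # Eq (33) gives exactly 0 and 1 at the extrema, where eq (34) diverges; clipping is an assumption.
    m_prime = np.clip(m_prime, eps, 1.0 - eps)
    return np.log(m_prime / (1.0 - m_prime)) / beta + mu  # eq (34)


m = np.linspace(1.0, 3.0, 11)
omega = envelope_to_omega(m, beta=2.0, mu=0.5)
back = 1.0 / (1.0 + np.exp(-2.0 * (omega - 0.5)))  # logistic with the same beta and mu
```

Away from the clipped extrema, `back` reproduces m′(x) exactly, which is why the particular values of β and μ do not matter as long as they are used consistently.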

Fourier analysis of ω(x), truncation of its Fourier coefficients at resolution d and Fourier synthesis of the truncated set of coefficients lead to the resolution-truncated ωd(x) distribution

[\omega_{d}({\bf x}) = {\overline {\cal F}}\{{X_{d}({\bf h})\times {\cal F}[{\omega({\bf x})}]({\bf h}})\}({\bf x}), \eqno (35)]

where the truncation of the Fourier spectrum of ω(x) at resolution d in (35) is performed by multiplying it by the indicator function Xd(h),

[\eqalign {X_{d}({\bf h}) & = 1 \,\, {\rm if} \,\, d_{\bf h} \geq d, \cr & = 0 \,\, {\rm if}\,\, d_{\bf h} < d.}\eqno (36)]

The homographic exponential, resolution-truncated mHE,d(x) is then

[\eqalignno {m'_{{\rm HE},d}({\bf x})& = {{1}\over{1+\exp\{-\beta[\omega_{d}({\bf x})-\mu]\}}}, & (37) \cr m_{{\rm HE},d}({\bf x})& = {{V}\over{\textstyle \int_{V}m'_{{\rm HE},d}({\bf x})\,{\rm d}^{3}{\bf x}}} \times m'_{{\rm HE},d}({\bf x}). &(38)}]

We note here that, for this particular test, the actual values of β and μ are irrelevant, provided the same values are used in (34) and (37).

The conventional Fourier expansion of m(x), with truncation at resolution d, reads

[m_{{\rm FT},d}({\bf x}) = {\overline {\cal F}} \{{X_{d}({\bf h})\times{\cal F}[{m({\bf x})}]({\bf h})}\}({\bf x}). \eqno (39)]

mHE,d(x) and mFT,d(x) differ from m(x) because of the resolution truncation: mFT,d(x) has no Fourier components beyond d Å, whereas mHE,d(x), although computed from the same number of Fourier coefficients, has components beyond d owing to the exponential step.
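The whole test can be reproduced in outline with a few FFTs. The sketch below is a hedged reconstruction of the procedure under our own assumptions: an orthorhombic cell, truncation applied to all FFT grid coefficients with d*h ≤ 1/d, a small clipping of m′(x) away from 0 and 1 to keep (34) finite, and a synthetic spherical envelope in place of the PPE model. It is not expected to reproduce the numbers in Table 1, and the function names are illustrative.

```python
import numpy as np


def truncate(f, cell, d_min):
    """Eqs (35)/(39): keep only Fourier coefficients with d_h >= d_min (orthorhombic cell)."""
    freqs = [np.fft.fftfreq(n, d=1.0 / n) for n in f.shape]
    h, k, l = np.meshgrid(*freqs, indexing="ij")
    a, b, c = cell
    dstar = np.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)
    x_d = dstar <= 1.0 / d_min  # indicator X_d(h) of eq (36)
    return np.real(np.fft.ifftn(np.fft.fftn(f) * x_d))


def reconstructions(m, cell, d_min, beta=1.0, mu=0.0, eps=1e-6):
    """Return (m_HE_d, m_FT_d) following eqs (33)-(39)."""
    m_prime = np.clip((m - m.min()) / (m.max() - m.min()), eps, 1.0 - eps)  # eq (33), clipped
    omega = np.log(m_prime / (1.0 - m_prime)) / beta + mu                    # eq (34)
    omega_d = truncate(omega, cell, d_min)                                   # eq (35)
    m_he = 1.0 / (1.0 + np.exp(-beta * (omega_d - mu)))                     # eq (37)
    m_he = m_he / m_he.mean()                                                # eq (38)
    m_ft = truncate(m, cell, d_min)                                          # eq (39)
    return m_he, m_ft


# Toy envelope (an assumption, not the PPE envelope): a 12 A sphere in a 48 A cubic cell.
n, edge = 24, 48.0
x = (np.arange(n) - n / 2) * edge / n
gx, gy, gz = np.meshgrid(x, x, x, indexing="ij")
chi = (gx ** 2 + gy ** 2 + gz ** 2 <= 12.0 ** 2).astype(float)
m = chi / chi.mean()
m_he, m_ft = reconstructions(m, (edge, edge, edge), d_min=12.0)
```

By construction mHE,d(x) is positive everywhere, whereas mFT,d(x) carries whatever truncation ripples the retained coefficients produce.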

In the following, we describe the test reconstruction of a model envelope for porcine pancreatic elastase (PPE; Meyer et al., 1986; Schiltz et al., 1997). The model envelope m(x) was generated as explained in §4.2 from the PDB-deposited structure, with a masking radius R = 2 Å and a blurring factor B = 100 Å². A conventional Fourier truncation and a truncated homographic exponential model were then used to reconstruct the model envelope, as explained above. As noted in §2, all envelopes have been normalized so that their average over the unit cell is unity.

Table 1 reports the real-space overall correlation coefficients between the model envelope and its Fourier-truncated and homographic-exponential-truncated reconstructions. The Fourier-truncated envelope gives marginally higher CCs when the coefficients are truncated at 25 Å resolution or lower (d ≥ 25 Å), because the amplitudes and phases of the very few coefficients retained are exact for this envelope but not for mHE,d(x). Overall, the CC values are very similar for the two methods, mainly because the correlation coefficients are dominated by the lowest resolution components, which are essentially correct in both maps.

Table 1
Porcine pancreatic elastase: real-space correlation coefficients between a model envelope m(x) and its reconstructions by truncated homographic exponential modelling [mHE,d(x)] and truncated Fourier synthesis [mFT,d(x)]

Resolution d (Å) | No. of coefficients | 〈CC(m, mHE,d)〉 | 〈CC(m, mFT,d)〉
30 | 7 | 0.594 | 0.604
25 | 12 | 0.634 | 0.662
20 | 22 | 0.760 | 0.758
15 | 51 | 0.840 | 0.832

More informative is visual inspection of sections of the envelopes. Fig. 1 shows a section of the model envelope in the [100] plane of the PPE crystal; Figs. 2 and 3 show the same section of the 15 Å Fourier-truncated and homographic-exponential-truncated envelopes, mFT,d=15Å(x) and mHE,d=15Å(x), respectively. In Fig. 2, mFT,d=15Å(x) shows the well known artefacts of Fourier truncation: negative ripples, peaky features and a smeared-out protein–solvent boundary. In Fig. 3, mHE,d=15Å(x) is positive everywhere and has a flatter protein ceiling, a steeper slope at the solvent–protein boundary and a flatter solvent floor, with few oscillations. Its solvent regions match those of the model envelope.

Figure 1
Porcine pancreatic elastase, [100] section of the model envelope m(x). Section: 57.973 × 75.32 Å. The centre of the section is the macromolecule's centre of gravity. The density was obtained by masking with a radius of 2 Å around the model and blurring with a Gaussian temperature factor B = 100 Å².
Figure 2
Porcine pancreatic elastase, [100] section of the 15 Å truncated Fourier reconstruction of the model envelope, \(m_{\mathrm{FT},d=15\,\text{Å}}(x)\). Size and orientation as in Fig. 1. The density was obtained by truncating the Fourier spectrum of the model density at 15 Å [51 coefficients; see (39)].
Figure 3
Porcine pancreatic elastase, [100] section of the 15 Å truncated homographic exponential reconstruction of the model envelope, \(m_{\mathrm{HE},d=15\,\text{Å}}(x)\). Size and orientation as in Fig. 1. The density was obtained by truncating the ω spectrum at 15 Å (51 coefficients) and recomputing the homographic exponential model (37).
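
The truncation artefacts seen in Fig. 2 already appear in one dimension. The Python sketch below truncates the Fourier spectrum of a toy box-shaped envelope and shows the synthesis dipping below zero in the solvent region; it then passes a scaled copy of the same truncated synthesis through \(e^{\omega}/(1+e^{\omega})\), which lies strictly between 0 and 1 by construction. That logistic form is an assumption about what a homographic map \(y \mapsto y/(1+y)\) applied to \(e^{\omega}\) looks like, and the choice of ω (a truncated synthesis scaled by a hypothetical `gain`) is purely illustrative; it is not the fitting procedure behind (37).

```python
import cmath
import math


def dft(values):
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * h * k / n) for k, v in enumerate(values))
            for h in range(n)]


def inverse_dft(coeffs):
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * h * k / n) for h, c in enumerate(coeffs)).real / n
            for k in range(n)]


def box_envelope(n, half_width):
    """Toy binary envelope: 1 within half_width grid points of k = 0 (periodic), else 0."""
    return [1.0 if min(k, n - k) <= half_width else 0.0 for k in range(n)]


def fourier_truncated(values, h_max):
    """Keep only coefficients with |h| <= h_max and resynthesize (analogue of m_FT,d)."""
    n = len(values)
    coeffs = dft(values)
    return inverse_dft([c if min(h, n - h) <= h_max else 0j for h, c in enumerate(coeffs)])


def logistic(w):
    """e^w / (1 + e^w), written as 1 / (1 + e^-w)."""
    return 1.0 / (1.0 + math.exp(-w))


def squashed_truncated(values, h_max, gain):
    """Hypothetical omega = gain * (truncated synthesis - 1/2), squashed into (0, 1)."""
    return [logistic(gain * (v - 0.5)) for v in fourier_truncated(values, h_max)]


if __name__ == "__main__":
    envelope = box_envelope(32, 5)
    ft = fourier_truncated(envelope, 4)
    he_like = squashed_truncated(envelope, 4, gain=10.0)
    print("truncated Fourier min/max:", min(ft), max(ft))
    print("squashed min/max:", min(he_like), max(he_like))
```

With these toy parameters the truncated synthesis is about −0.08 midway through the solvent region (k = 16), whereas the squashed version cannot go negative whatever coefficients are kept; this positivity, not the particular choice of ω, is the point of the illustration.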

Table 2 contains the correlation coefficients between the Fourier coefficients of the model PPE envelope and those of the 15 and 20 Å truncated homographic exponential models. Fig. 4 plots the same Fourier coefficients in resolution ranges. The observed fluctuations are typical of the spectrum of macromolecular envelopes; still, owing to the extrapolation achieved by the exponential step, the amplitudes of the Fourier components of \(m_{\mathrm{HE},d=15\,\text{Å}}(x)\) retain an average correlation coefficient as high as 0.306 in the 8.2 Å shell, well beyond the 15 Å truncation limit.

Table 2
Porcine pancreatic elastase: reciprocal-space correlation coefficients between the Fourier components \(\mathcal{F}[m(x)](h)\) of a model envelope and the Fourier components \(\mathcal{F}[m_{\mathrm{HE},d}(x)](h)\) of its truncated homographic exponential reconstruction

                                  \(\langle \mathrm{CC}\{\mathcal{F}[m(x)](h), \mathcal{F}[m_{\mathrm{HE},d}(x)](h)\} \rangle\)
Resolution (Å) (no. of coeffs)    d = 15 Å    d = 20 Å
14.1 (61)                         0.982       0.920
10.0 (93)                         0.170       0.125
8.2 (125)                         0.306       0.087
7.1 (136)                         0.118       −0.007
6.3 (151)                         0.042       −0.040
5.8 (166)                         0.154       0.079
Figure 4
Porcine pancreatic elastase. Fourier components of the model envelope, \(\langle \mathcal{F}[m(x)](h) \rangle\), and of its 15 Å truncated reconstructions, \(\langle \mathcal{F}[m_{\mathrm{FT}}(x)](h) \rangle\) and \(\langle \mathcal{F}[m_{\mathrm{HE}}(x)](h) \rangle\). The Fourier components were averaged in groups of ten. The correlation coefficients \(\langle \mathrm{CC}\{\mathcal{F}[m(x)](h), \mathcal{F}[m_{\mathrm{FT}}(x)](h)\} \rangle\) are not shown because they are 1.0 for d > 15 Å and zero for d < 15 Å.
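
The shell-wise statistics of Table 2 and the group averaging used in Fig. 4 amount to simple bookkeeping. A minimal Python sketch follows, assuming that the correlation is computed on amplitudes (as the discussion of Table 2 suggests) and that each shell is labelled by its high-resolution limit, as in the first column of Table 2; the function names and the toy numbers in the usage block are hypothetical and are not PPE data.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns nan when either list has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def shell_correlations(d_spacings, amps_a, amps_b, limits):
    """CC per resolution shell.

    limits: high-resolution edges in Å, in decreasing order (e.g. 14.1, 10.0, 8.2).
    Shell i holds reflections with limits[i] <= d < limits[i - 1]; shell 0 has no
    upper bound. Reflections finer than limits[-1] are ignored.
    """
    shells = [([], []) for _ in limits]
    for d, fa, fb in zip(d_spacings, amps_a, amps_b):
        for i, edge in enumerate(limits):
            if d >= edge:
                shells[i][0].append(fa)
                shells[i][1].append(fb)
                break
    return [pearson(a, b) if len(a) > 1 else math.nan for a, b in shells]


def group_averages(values, group_size=10):
    """Average consecutive groups, e.g. of ten Fourier components sorted by resolution."""
    return [sum(values[i:i + group_size]) / len(values[i:i + group_size])
            for i in range(0, len(values), group_size)]


if __name__ == "__main__":
    d = [20.0, 16.0, 15.0, 12.0, 11.0, 10.5, 9.0, 8.6, 8.3]
    model = [5.0, 3.0, 4.0, 2.0, 4.0, 3.0, 1.0, 2.0, 3.0]
    recon = [9.0, 6.5, 8.0, 1.0, 3.0, 2.5, 3.0, 2.0, 1.0]
    print(shell_correlations(d, model, recon, [14.1, 10.0, 8.2]))
    print(group_averages(list(range(20))))
```

Returning nan for a shell with zero variance is a design choice: a reconstruction whose coefficients are all zero in a shell (as for \(m_{\mathrm{FT}}\) beyond 15 Å) has no defined Pearson correlation, and the caption of Fig. 4 reports such shells as zero by convention.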

6. Conclusions

The macromolecular envelope \(m_{\mathrm{rand}}(x)\) is a continuous distribution, not a binary mask; even regions of low density (or low density r.m.s.d., if a variance filter is used) can therefore be retained within the envelope with a (possibly small) non-zero probability. The subsequent maximum entropy modulation of the envelope therefore has a chance of building up density in those same regions. This has potential in structure completion by density-modification techniques. The only other published example of solvent flattening using continuous real-space probability distributions is the Gaussian distribution described by Terwilliger (1999). The map-based algorithm implemented in BUSTER (§5) differs from previously published ones in that the macromolecular envelope is a homographic exponential model and can therefore be parametrized with a few coefficients of ω while still retaining its 'binary-like' character.
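
The advantage of a continuous envelope for the subsequent maximum entropy step can be seen from the generic form of a maximum entropy solution, \(q(x) \propto m(x)\exp[\lambda(x)]\), in which the prior m(x) enters multiplicatively: wherever the prior is exactly zero, no amount of evidence can build up density. The Python sketch below uses a four-point toy grid with hypothetical values for the prior and for the exponent λ(x); in an actual calculation λ(x) would be a Fourier series whose coefficients are Lagrange multipliers fitted to the data, which is not modelled here.

```python
import math


def maxent_modulate(prior, exponent):
    """q(x) = m(x) exp(lambda(x)) / Z on a discrete grid (generic maximum entropy form)."""
    weights = [m * math.exp(lam) for m, lam in zip(prior, exponent)]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Points 0-1 lie inside the current envelope; points 2-3 form a low-density
    # region that the (toy) exponent favours.
    exponent = [0.0, 0.0, 3.0, 3.0]
    binary_mask = [1.0, 1.0, 0.0, 0.0]
    continuous = [0.9, 0.9, 0.05, 0.05]
    print("binary prior:    ", maxent_modulate(binary_mask, exponent))
    print("continuous prior:", maxent_modulate(continuous, exponent))
```

With the binary mask, points 2 and 3 stay at exactly zero; with the continuous prior their share grows from about 3 per cent of the prior to about a quarter of the modulated distribution each.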


Footnotes

¹ For a recent illustration of the use of the Babinet principle in bulk-solvent correction, see Guo et al. (2000).

² Suffice it to say here that \(\mathcal{F}[m_{\mathrm{rand}}(x)](h)\) is first computed as the product of \(\mathcal{F}[\chi_{\mathrm{rand}}(x)](h)\) and \(\mathcal{F}[G(x; B_{\mathrm{frag}})](h)\); the set of \(\mathcal{F}[m_{\mathrm{rand}}(x)](h)\) is then made periodic on the lattice reciprocal to the real-space crystallographic grid. These aliased structure factors undergo Fourier synthesis, and \(m_{\mathrm{rand}}(x)\) is sampled on the desired grid; the aliasing ensures that the \(m_{\mathrm{rand}}(x)\) distribution is positive everywhere and free from Fourier-truncation artefacts.
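
The aliasing step of this footnote can be checked in one dimension. The Python sketch below assumes that \(\chi_{\mathrm{rand}}\) is a set of point atoms in a periodic cell and that \(\mathcal{F}[G(x;B)](s) = \exp(-Bs^2/4)\), the usual crystallographic convention for a B factor; the cell length, atom positions, grid size and B are hypothetical. It forms the product of the two transforms over many more coefficients than the grid can hold, folds them onto the N coefficients the grid can represent (h mod N), and performs the synthesis; the grid values then coincide with a direct real-space sum of periodized Gaussians, which is positive by construction.

```python
import cmath
import math


def gaussian_transform(s, b):
    """Transform of G(x; B) at spatial frequency s (1/Å): exp(-B s^2 / 4)."""
    return math.exp(-b * s * s / 4.0)


def gaussian_real(x, b):
    """1-D real-space G(x; B) whose transform is exp(-B s^2 / 4); integrates to 1."""
    return math.sqrt(4.0 * math.pi / b) * math.exp(-4.0 * math.pi ** 2 * x * x / b)


def aliased_density(atoms, cell, n_grid, b, h_max=200):
    """Sample m(x) = (chi * G)(x) on n_grid points via aliased structure factors."""
    folded = [0j] * n_grid
    for h in range(-h_max, h_max + 1):
        # Product of transforms: point-atom structure factor times Gaussian transform
        f_chi = sum(cmath.exp(-2j * math.pi * h * x / cell) for x in atoms)
        f_m = f_chi * gaussian_transform(h / cell, b)
        # Make the coefficients periodic on the lattice reciprocal to the grid
        folded[h % n_grid] += f_m
    # Fourier synthesis on the grid points x_k = k * cell / n_grid
    return [sum(folded[r] * cmath.exp(2j * math.pi * r * k / n_grid)
                for r in range(n_grid)).real / cell
            for k in range(n_grid)]


def direct_density(atoms, cell, n_grid, b, images=3):
    """Reference: sum of periodized Gaussians evaluated directly in real space."""
    return [sum(gaussian_real(k * cell / n_grid - x - n * cell, b)
                for x in atoms for n in range(-images, images + 1))
            for k in range(n_grid)]


if __name__ == "__main__":
    atoms = [0.5, 1.7, 3.1, 4.2]  # hypothetical positions (Å) in a 5 Å cell
    aliased = aliased_density(atoms, 5.0, 8, 20.0)
    direct = direct_density(atoms, 5.0, 8, 20.0)
    print(max(abs(p - q) for p, q in zip(aliased, direct)), min(aliased))
```

The agreement rests on the Poisson summation formula: folding every coefficient h onto h mod N reproduces exactly the values of the continuous periodic function at the grid points, whereas discarding the coefficients beyond the grid's Nyquist limit would only approximate them.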

Acknowledgements

This work was partially supported by a TMR Marie Curie Grant (to PR) and a Sponsored Research Agreement from Pfizer Central Research (to GB). We wish to thank one of the referees for extremely helpful reviewing of the manuscript.

References

Abrahams, J. P. (1997). Acta Cryst. D53, 371–376.
Abrahams, J. P. & Leslie, A. (1996). Acta Cryst. D52, 30–42.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Bricogne, G. (1993). Acta Cryst. D49, 37–60.
Bricogne, G. (1997). Methods Enzymol. 276, 361–423.
Bricogne, G. & Irwin, J. J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Guo, D., Blessing, R. H. & Langs, D. A. (2000). Acta Cryst. D56, 451–457.
Leslie, A. (1987). Acta Cryst. A43, 134–136.
Meyer, E. F., Radhakrishnan, R., Cole, G. M. & Presta, L. G. (1986). J. Mol. Biol. 189, 553–559.
Perrakis, A., Morris, R. & Lamzin, V. (1999). Nature Struct. Biol. 6(2), 458–463.
Ramin, M. (1999). PhD thesis. LURE, Université Paris XI, Orsay, France.
Reif, F. (1965). Fundamentals of Statistical and Thermal Physics, 1st ed., pp. 350–351. Singapore: McGraw–Hill.
Roversi, P., Irwin, J. & Bricogne, G. (1998). Acta Cryst. A54, 971–996.
Roversi, P., Irwin, J. & Bricogne, G. (2000). In Electron, Spin and Momentum Densities and Chemical Reactivities, edited by P. G. Mezey & B. E. Robertson. Dordrecht: Kluwer. In the press.
Schiltz, M., Shepard, W., Fourme, R., Prangé, T., de La Fortelle, E. & Bricogne, G. (1997). Acta Cryst. D53, 78–92.
Terwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489–501.
Wang, B.-C. (1985). Methods Enzymol. 112, 813–815.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
