Authors: Henk Schenk and Syd Hall
Contact: Syd Hall, Crystallography Centre, University of Western Australia, Nedlands 6907, Australia
SIMPEL applies the symbolic addition procedure to triplet and/or quartet structure invariant relationships to determine structure factor phases from normalized structure factors. The program is space group independent, and the origin and enantiomorph are specified automatically, or may be selected by the user. SIMPEL contains a wide range of phase selection and extension options. Different procedures for accepting, propagating, and evaluating symbol phases are provided, and a variety of figure-of-merit tests are available to identify the correct symbol phases. Up to 16 phase sets may be output for subsequent E-map calculations.
Symbols have been used with structure invariant relationships to determine structure factor phases from the beginning of direct methods. Symbolic addition procedures (SAP) compete with and complement the other main direct-methods approach, the multi-solution procedure. SAP defines and extends phases as symbols which are evaluated at the conclusion of the process. Unlike the multi-solution procedure, no permutation of phase sets is involved, so the process is fast.
There is considerable background literature on the symbolic addition procedure. In particular we refer to Karle & Karle (1966), Karle (1974) and Schenk (1980). References specific to the SIMPEL approach to symbolic addition can be found in Overbeek & Schenk (1978), Schenk (1983) and Schenk & Kiers (1984).
The first step in the symbolic addition procedure is the specification of the origin/enantiomorph defining phases, and the selection of symbolic phases. These phases, referred to as the starting set, must be those which will most reliably propagate to all other large E-values via the triplet and quartet structure invariants. This is the pivotal step in the phasing process and considerable care is taken to ensure that the best possible starting set is selected. The starting phases are selected from the largest |E|-values (25% of total, or user option on the SIMPEL line) using a convergence type procedure as described by Germain, Main and Woolfson (1970). Convergence rejection is based on either:
(1) $\alpha^2$ for centrosymmetric and noncentrosymmetric
or (2) $\tanh(\alpha/2)$ for centrosymmetric or $I_1(\alpha)/I_0(\alpha)$ for noncentrosymmetric
See the GENSIN writeup for the full definition. The result of this
procedure is a set of reflections which form an optimal choice for a starting
set. Within this set origin (and enantiomorph, if required) defining
reflections are assigned phases first, and then several symbols are assigned
amongst the remaining reflections of the set (maximum is 8). The assigned
phases are applied to |E|-values above the convergence limit using triplet
and/or quartet invariants (user option on start line).
Generated phases which have consistent values (no conflicts) are accepted and
set at a weight of 1.0 for the rest of the phasing procedure.
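As a rough illustration of the option (2) reliability measures named above, the following Python sketch evaluates tanh(alpha/2) for a centrosymmetric case and the Bessel-function ratio I1(alpha)/I0(alpha) for a noncentrosymmetric case. The function names `bessel_i` and `reliability` are invented for this example and are not part of SIMPEL; the modified Bessel functions are summed from their power series so that only the standard library is needed.

```python
import math


def bessel_i(order: int, x: float, terms: int = 40) -> float:
    """Modified Bessel function of the first kind, I_order(x), from its power series."""
    return sum(
        (x / 2.0) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def reliability(alpha: float, centrosymmetric: bool) -> float:
    """Reliability measure of option (2) for a single alpha value (illustrative only)."""
    if centrosymmetric:
        return math.tanh(alpha / 2.0)
    return bessel_i(1, alpha) / bessel_i(0, alpha)


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 5.0):
        print(alpha, reliability(alpha, True), reliability(alpha, False))
```

Both measures rise from 0 towards 1 as alpha increases, which is why either can serve as a rejection criterion on a common scale.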
A divergence or accessibility procedure is used to test if the starting set of phases selected in Step 1 will propagate satisfactorily within the divergence set of generators (default is 50% of total). The procedure checks if the phases of all |E|-values can be accessed via sufficiently reliable invariant phase relationships. No test for (symbolic) phase consistency is made in this procedure. New phases are accepted as accessible if the sum of the $\alpha$'s of the contributing invariants exceeds a threshold value (ALTHR starts as the 10th largest $\alpha$ of the input invariants, or is a user option on the start line). The threshold value ALT(m) is varied according to the number of invariants, m, used to derive a phase: for one invariant ALT(1) = ALTHR, for two invariants ALT(2) = 1.3*ALT(1), for three invariants ALT(3) = 1.3*ALT(2), and so on. For each subsequent cycle the threshold value is scaled down by a factor of 0.9. This ensures that the most probable phases are accessed first.
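The threshold schedule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `alt_threshold` and `is_accessible` are hypothetical, cycles are counted from 0, and the strict "greater than" comparison is a guess at how "exceeds" is implemented.

```python
def alt_threshold(althr: float, m: int, cycle: int = 0) -> float:
    """ALT(m) for m contributing invariants after `cycle` reductions.

    ALT(1) = ALTHR, each extra invariant multiplies the threshold by 1.3,
    and each later cycle multiplies it by 0.9.
    """
    if m < 1 or cycle < 0:
        raise ValueError("m must be >= 1 and cycle must be >= 0")
    return althr * 1.3 ** (m - 1) * 0.9 ** cycle


def is_accessible(alphas: list, althr: float, cycle: int = 0) -> bool:
    """True if the summed alpha values of the invariants exceed ALT(m)."""
    return sum(alphas) > alt_threshold(althr, len(alphas), cycle)


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, alt_threshold(2.0, m))
    print(is_accessible([1.0, 1.0, 0.4], althr=2.0, cycle=3))
```

Because the threshold relaxes by 10% per cycle, a phase that is not accessible in the first cycle may become accessible later, after the more reliable phases have been reached.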
If more than 10 of the |E|-values in the divergence set remain unphased, additional symbol phases are assigned to unaccessed reflections. This enables any unconnected groups of related reflections to enter easily into the phasing process.
Once the starting set of phases has been fixed by the convergence and divergence processes, they are used to phase all remaining |E|-values. The starting set is composed of numeric phases assigned to specify the cell origin (and enantiomorph), and symbol phases to ensure complete phase extension. The starting set also contains reliable numeric and symbolic phases which were derived in the earlier processes. All these phases, assigned and derived, are now considered to be active phase propagators and have been assigned a weight of 1.0.
The symbolic addition process uses these phases and the structure invariant relationships to derive the phases of the other |E|-values. Before accepting a newly derived phase into the list of known phases (and thus successively using this reflection to derive additional phases) it must be carefully checked for reliability. If an incorrect phase is accepted, it can lead to a failure of the whole phasing process.
Two different mechanisms are used in SIMPEL to test if a new phase is suitable as an active phase propagator. These are the "probability threshold" test and the "multisymbol acceptance" test. In each case there are two separate procedures for performing these tests - each has different properties that may suit the solution of particular structural types.
Acceptance test I (probability threshold): A new phase is derived by substituting known phases into one, or more, structure invariant relationships. Each of these relationships has an $\alpha$ value which is a measure of the probability that the phase relationship has a value of zero (modulo $2\pi$). If $\alpha$ is high, there is a high probability this is true; if it is not, the relationship must be used with caution, or in conjunction with other relationships. The sum of these $\alpha$'s is therefore an important test in gauging the probable reliability of a new phase. In SIMPEL the sum of $\alpha$'s may be applied in two different ways. They are:
(i) Weighted Alpha method (wa): Each phase in the active phase list is assigned a weight according to its expected reliability. These weights range from WMIN to 1.0 and are calculated for centrosymmetric phases as
$$W = \tanh(\alpha_c / \mathrm{ALTHR})$$
and for non-centrosymmetric phases as
$$W = \min(1, \alpha_c / \mathrm{ALTHR})$$
where
$$\alpha_c = \left[ \left\{ \sum W_k \alpha_k \sin\phi_k \right\}^2 + \left\{ \sum W_k \alpha_k \cos\phi_k \right\}^2 \right]^{1/2}$$
(summed over m).
ALTHR is the threshold value specified by the user, or set automatically as the 10th largest $\alpha$ of the input invariants. This is the same ALT used in the divergence process.
Wk is the combined weight of the component phases in the invariant
derived from
Wk = Wj /
Wj j=1
to 2 (triplets), or 3 (quartets).
A phase is accepted if its calculated weight Wk is above the minimum weight WMIN. WMIN is an input option or is set automatically to 0.3. In this method phase acceptance is a relatively smooth and continuous process. Each new phase is given an associated reliability index, which is used to determine the reliability of subsequent phases (i.e. the history of prior determinations has a bearing on future phase estimates). The calculation of Wk is slower than the alternative but this tends to be offset by the more rapid propagation of phases.
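A minimal Python sketch of the weighted alpha test follows. It assumes that each contributing invariant supplies a combined weight, an alpha value and an indicated phase in degrees; the combined weights are taken as given inputs rather than derived. The names `alpha_c`, `phase_weight` and `accept_wa` are hypothetical, and the code is an illustration of the formulas above, not the SIMPEL source.

```python
import math


def alpha_c(contribs: list) -> float:
    """Resultant alpha from (weight, alpha, phase_in_degrees) contributions."""
    s = sum(w * a * math.sin(math.radians(p)) for w, a, p in contribs)
    c = sum(w * a * math.cos(math.radians(p)) for w, a, p in contribs)
    return math.hypot(s, c)


def phase_weight(ac: float, althr: float, centrosymmetric: bool) -> float:
    """Weight of a new phase: tanh(ac/ALTHR) or min(1, ac/ALTHR)."""
    ratio = ac / althr
    return math.tanh(ratio) if centrosymmetric else min(1.0, ratio)


def accept_wa(contribs: list, althr: float, centrosymmetric: bool, wmin: float = 0.3):
    """Return (weight, accepted) for a new phase; accepted means weight > WMIN."""
    w = phase_weight(alpha_c(contribs), althr, centrosymmetric)
    return w, w > wmin


if __name__ == "__main__":
    agreeing = [(1.0, 2.0, 0.0), (0.5, 2.0, 0.0)]
    conflicting = [(1.0, 2.0, 0.0), (1.0, 2.0, 180.0)]
    print(accept_wa(agreeing, althr=3.0, centrosymmetric=False))
    print(accept_wa(conflicting, althr=3.0, centrosymmetric=False))
```

Contributions that agree in phase add up and give a high weight, while contributions that conflict cancel and drive the weight below WMIN.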
(ii) Alpha Ninv method (an): Another test for phase acceptance is available in SIMPEL based on the same procedure described above in Step 2. The weights of all active phases are assumed to be 1.0. The calculated $\alpha$ of a new phase (see the $\alpha_c$ definition above) is tested against ALT(m), where m is the number of invariants used to calculate $\alpha_c$ and the new phase. The values of ALT(m) are preset as described in Step 2 above.
This procedure is relatively simple and fast and is based on a relatively demanding criterion for acceptance. It is discontinuous (i.e. it accepts or rejects - nothing in between) and therefore requires more phasing cycles than (i). It also does not use the relative reliability of the active phases. This may be particularly important for non-restricted phases.
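The Alpha Ninv test can be sketched as a simple pass/fail check. The example below assumes unit weights, phases in degrees, and the ALT(m) schedule from Step 2 (factor 1.3 per extra invariant, factor 0.9 per cycle); whether the per-cycle reduction also applies during symbolic addition is an assumption, as is the function name `accept_an`.

```python
import math


def accept_an(contribs: list, althr: float, cycle: int = 0) -> bool:
    """Accept a new phase if its resultant alpha exceeds ALT(m).

    contribs holds one (alpha, phase_in_degrees) pair per invariant;
    all active phases carry unit weight in this method.
    """
    m = len(contribs)
    if m == 0:
        return False
    s = sum(a * math.sin(math.radians(p)) for a, p in contribs)
    c = sum(a * math.cos(math.radians(p)) for a, p in contribs)
    resultant = math.hypot(s, c)
    threshold = althr * 1.3 ** (m - 1) * 0.9 ** cycle
    return resultant > threshold


if __name__ == "__main__":
    print(accept_an([(3.0, 0.0)], althr=2.0))
    print(accept_an([(1.5, 0.0), (1.0, 0.0)], althr=2.0))
    print(accept_an([(1.5, 0.0), (1.0, 0.0)], althr=2.0, cycle=1))
```

The result is a plain boolean, which reflects the discontinuous nature of this test: no reliability index is carried forward to later phases.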
Acceptance test II (multisymbol acceptance): In the symbolic addition process a second phase acceptance test is applied when more than one structure invariant is used to derive a new phase (i.e. m>1). This test has two separate modes of operation.
(i) Accept multiple symbol indications (mult): If two or more symbol combinations are generated for a new phase (e.g. -A and +AD), the phase is still accepted provided the strongest indication (i.e. largest $\alpha_c$) satisfies acceptance test I. This mode assumes that certain symbol combinations are equivalent (i.e. will reduce to the same numeric phase) and promotes a new phase to the active propagation role provided it satisfies the probability acceptance criteria. This is the default mode.
(ii) Reject multiple symbol indications ( sing ): In this mode a new phase is rejected if more than one symbol combination is derived. This is a more demanding requirement than in (i) and means that fewer phases are promoted to the active list to assume the role of phase propagators. In this mode the symbolic addition process requires more cycles and the statistics available to the subsequent figure-of-merit tests are fewer in number.
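The difference between the two modes can be illustrated with a short Python sketch. It assumes the indications for one reflection are held in a dictionary mapping a symbol combination (such as "-A") to its resultant alpha, and that the first acceptance test is supplied as a callable; `select_indication` and its arguments are hypothetical names chosen for this example.

```python
def select_indication(indications: dict, passes_test_one, mode: str = "mult"):
    """Return the accepted symbol combination, or None if the phase is rejected.

    mode "mult": keep the strongest indication if it passes the first test.
    mode "sing": reject outright when more than one combination is present.
    """
    if mode not in ("mult", "sing"):
        raise ValueError("mode must be 'mult' or 'sing'")
    if not indications:
        return None
    if mode == "sing" and len(indications) > 1:
        return None
    best = max(indications, key=indications.get)
    return best if passes_test_one(indications[best]) else None


if __name__ == "__main__":
    found = {"-A": 4.2, "+AD": 1.1}
    strong_enough = lambda alpha: alpha > 3.0
    print(select_indication(found, strong_enough, mode="mult"))
    print(select_indication(found, strong_enough, mode="sing"))
```

In the usage shown, "mult" promotes the reflection with the combination -A, whereas "sing" leaves it out of the active list for that cycle.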
It should be noted that in previous versions of SIMPEL the only available options were I(ii) and II(ii). These are the most conservative options in accepting new phases, and have been used successfully in the past. The strengths of the alternative acceptance criteria I(i) and II(i) are discussed above, and in view of these they are currently set as the default modes. Users are advised that if the defaults fail to provide a solution, the more conservative combination of I(ii) and II(ii) should be applied.
In Step 3 the starting phase set is expanded into a larger list of known phases containing numeric and symbolic values. In this step a final symbolic addition cycle is applied so that all phase estimates can be tabulated as symbol correlation statistics (Schenk, 1971). Only phase estimates that involve more than one symbol combination will contribute to these statistics. For example if a given phase is estimated from ten different invariants to be:
where m is the number of invariants, this would lead to the correlation statistics
This process assumes that different symbol indications for the same reflection are in fact equal and may therefore be correlated. The statistics above are consistent with symbols A and B having the value of 180°.
The symbol correlation table is then used to test the plausibility of numeric values for each symbol. Symbols assigned to restricted phases are assigned their two possible values (e.g. $\pi/2$ and $3\pi/2$) and symbols assigned to unrestricted phases are tested for the numeric values in the range 0 to $2\pi$ in intervals of $\pi/4$. The correlation statistics are used to calculate a correlation factor QFAC (Schenk, 1971) that has a maximum value of 100 if the numeric phases agree exactly (and -100 if they disagree exactly). In space groups with translational symmetry (non-symmorphic) correlation factors greater than 50 are good, and those above 70 are excellent. However, in the other space groups the QFAC is less indicative.
The last part of this step orders the phase sets in descending order of correlation factor. Only the top phase sets (16, or specified by user) will enter into the more exhaustive figure-of-merit tests in the next step.
The previous step selected, and ordered, the numeric phase combinations that have the best chance of being correct. In this step each of these combinations is applied in a separate symbolic addition cycle to provide the agreement statistics needed to calculate various figures-of-merit. A figure-of-merit is intended to discriminate between a 'good' phase set and a 'bad' phase set (i.e. one that may provide a correct solution from one that will not). Not all FOM's of the original SIMPEL versions are implemented at this time; the remainder will be considered for future development. On the other hand, several FOM's have been added that are not present in the other SIMPEL versions.
This figure-of-merit is a reformulation of QFAC calculated in Step 4. It is
QFOM = 1.5 - QFAC/100.
In accordance with all other FOM values, the best QFOM is the lowest. It has an active range from 0.5 to 2.5, and any value below 1.1 is considered good, and above 1.5 is considered unlikely. QFOM is, of course, correlated to the symbol extension process and cannot be considered an independent phase set discriminator in the same sense as the FOM tests PSI0 and NEGQ. Caution must therefore be exercised in interpreting small differences in QFOM values.
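The QFOM mapping and its stated interpretation bands can be checked with a small Python sketch. The function names and the label "intermediate" for values between 1.1 and 1.5 are assumptions made for this example; only the formula and the two limits come from the text above.

```python
def qfom(qfac: float) -> float:
    """QFOM = 1.5 - QFAC/100, with QFAC restricted to -100..100."""
    if not -100.0 <= qfac <= 100.0:
        raise ValueError("QFAC must lie between -100 and 100")
    return 1.5 - qfac / 100.0


def qfom_rating(value: float) -> str:
    """Classify a QFOM value using the limits 1.1 and 1.5 quoted in the text."""
    if value < 1.1:
        return "good"
    if value > 1.5:
        return "unlikely"
    return "intermediate"


if __name__ == "__main__":
    for q in (100.0, 70.0, 20.0, -20.0, -100.0):
        print(q, qfom(q), qfom_rating(qfom(q)))
```

The end points QFAC = 100 and QFAC = -100 map to 0.5 and 2.5, which matches the active range quoted for QFOM.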
This parameter is the inverse of the CFOM parameter of the MULTAN program (Main et al., 1980) and has the form
$$\mathrm{RFOM} = \frac{\sum (\langle\alpha\rangle - \alpha_r)}{\sum (\alpha_c - \alpha_r)}$$
(summed over all h)
where $\langle\alpha\rangle$ is the expected $\alpha$ of a phase, and $\alpha_r$ is the $\alpha$ if all phases were randomly distributed. For a correct phase set the value of $\alpha_c$ should approach that of $\langle\alpha\rangle$ and RFOM should tend to 1.0. Incorrect phase sets will deviate significantly from 1.0, random phases towards 2.0, and overcorrelated phases towards 0.0. In general, however, phase sets with small RFOM's are more likely to be correct than those with large RFOM's. The range of RFOM's will vary according to the validity of the estimate of $\langle\alpha\rangle$. For this reason RFOM tends to be less reliable for strongly non-random structures.
The RFAC parameter is similar to the residual FOM calculated in MULTAN (Main et al., 1980) except for a scale that takes into account the relative dominance of heavy atoms in the structure.
$$\mathrm{RFAC} = \frac{\sum |\alpha_c - \langle\alpha\rangle|}{\sum \langle\alpha\rangle}$$
(summed over all h)
RFAC is a minimum when there is close correspondence between $\alpha_c$ and $\langle\alpha\rangle$. In this respect it is very similar to the R-factor of Karle and Karle (1966). RFAC is, like RFOM, dependent on the reliability of $\langle\alpha\rangle$.
PSI0 triplet invariants (Cochran and Douglas, 1955) provide a figure-of-merit which is largely independent of the triplet and quartet invariants used in the tangent refinement. A PSI0 triplet relates two strong reflections (with |E| > EMIN) to a third which has an |E|-value as close as possible to zero (see the GENSIN writeup). The phases estimated from a series of PSI0 triplets are expected to be random when the contributing phases from the other two large-|E| reflections are correct. When this is the case the resulting values of $\alpha_c$ are significantly lower than if the distribution of contributing phases was biased or incorrect. These invariants are used to form the figure-of-merit
$$\mathrm{PSI0} = \frac{\sum \alpha_c}{\sum \langle\alpha\rangle}$$
(summed over psi0 triplets).
PSI0 should be smallest for the correct phase set. PSI0 is, along with NEGQ, one of the most independent methods of measuring the relative likelihood of success.
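The three alpha-based figures of merit described above (RFOM, RFAC and PSI0) can be sketched together in Python. This is an illustration under assumptions: each sum is taken separately over the numerator and the denominator, the heavy-atom scale mentioned for RFAC is omitted, and the per-reflection lists of calculated, expected and random alpha values are supplied by the caller. None of the function names come from SIMPEL.

```python
def rfom(alpha_calc: list, alpha_exp: list, alpha_rand: list) -> float:
    """Sum of (expected - random) divided by sum of (calculated - random)."""
    numerator = sum(e - r for e, r in zip(alpha_exp, alpha_rand))
    denominator = sum(c - r for c, r in zip(alpha_calc, alpha_rand))
    return numerator / denominator


def rfac(alpha_calc: list, alpha_exp: list) -> float:
    """Sum of |calculated - expected| divided by sum of expected (no scale factor)."""
    return sum(abs(c - e) for c, e in zip(alpha_calc, alpha_exp)) / sum(alpha_exp)


def psi0(alpha_calc: list, alpha_exp: list) -> float:
    """Sum of calculated alpha over sum of expected alpha for the PSI0 triplets."""
    return sum(alpha_calc) / sum(alpha_exp)


if __name__ == "__main__":
    calc, expected, random_level = [4.0, 6.0], [3.0, 5.0], [1.0, 1.0]
    print(rfom(calc, expected, random_level))
    print(rfac(calc, expected))
    print(psi0([0.5, 0.5], [2.0, 2.0]))
```

When the calculated values equal the expected ones, RFOM is 1.0 and RFAC is 0, which matches the behaviour described for a correct phase set.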
Quartet structure invariant relationships are classified according to the magnitude of their crossvector |E| values. When the crossvector |E|'s are small there is a high probability that the phase invariant has a value close to $\pi$ rather than 0 (Hauptman, 1974; Schenk, 1974). These invariants are referred to as negative quartets. In SIMPEL negative quartets are not used in the phasing process but are retained as a test of the phase sets. The negative quartets are considered independent because, unlike the positive quartets, they cannot be represented by a series of triplet invariants. They provide, therefore, a separate estimate of the phases. A direct comparison of these phases provides the basis for the figure-of-merit (Schenk, 1974).
$$\mathrm{NEGQ} = \frac{\sum \alpha_c \, |\phi_k - \phi'_k|}{\sum \alpha_c}$$
(summed over all k neg. quartets)
where $\phi_k$ is the phase estimated from triplets and positive quartets, and $\phi'_k$ is the phase estimated from negative quartets alone.
Correct phase sets should have low values of NEGQ ranging from 0 for
centrosymmetric structures, to 20-60° for non-centrosymmetric
structures. Note that if fragment QPSI values are used the value of
is
automatically set to 0 and the NEGQ test will remain valid. This FOM is a very
powerful discriminator of phase sets provided that sufficient negative quartets
are available.
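A minimal Python sketch of the NEGQ average is given below. It assumes phases in degrees and wraps each phase difference into the range 0 to 180 before weighting, which is an interpretation of the absolute difference in the formula rather than something stated in the text; the function names are hypothetical.

```python
def phase_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two phases, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def negq(estimates: list) -> float:
    """Alpha-weighted mean phase discrepancy over the negative quartets.

    Each entry is (alpha, phase_from_triplets_and_positive_quartets,
    phase_from_negative_quartets).
    """
    numerator = sum(a * phase_difference(p, q) for a, p, q in estimates)
    denominator = sum(a for a, _, _ in estimates)
    return numerator / denominator


if __name__ == "__main__":
    print(negq([(2.0, 0.0, 0.0), (1.0, 180.0, 180.0)]))
    print(negq([(1.0, 10.0, 350.0), (3.0, 90.0, 50.0)]))
```

The first call models a centrosymmetric set in perfect agreement and gives 0; the second gives a weighted mean discrepancy of 35 degrees, inside the range quoted for non-centrosymmetric structures.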
The combined FOM is a scaled sum of the FOM parameters QFOM, RFOM, RFAC, PSI0 and NEGQ.
$$\mathrm{CFOM} = \sum_{i=1}^{5} \mathrm{WFOM}_i \, \frac{\mathrm{FOMMAX}_i - \mathrm{FOM}_i}{\mathrm{FOMMAX}_i - \mathrm{FOMMIN}_i}$$
The weights WFOM may be specified on the SETFOM control line. These values are subsequently scaled so that the maximum value of CFOM is 1.0. It is important to emphasise that CFOM is a relative parameter and serves only to highlight which is the best combination of FOM's for a given run. It does not indicate if a given FOM will provide a solution.
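The combined figure-of-merit can be sketched in Python as follows. The sketch assumes that FOMMAX and FOMMIN are the largest and smallest values of each FOM among the phase sets of the current run, that the weights are scaled by dividing by their sum, and that a FOM with no spread contributes nothing. These are assumptions made for illustration, and `cfom` is a hypothetical name.

```python
def cfom(fom_table: list, weights: list) -> list:
    """Combined FOM for each phase set; each row holds one phase set's FOM values.

    All FOM values are 'lower is better', so the best set scores closest to 1.0.
    """
    total_weight = sum(weights)
    columns = list(zip(*fom_table))
    fom_max = [max(col) for col in columns]
    fom_min = [min(col) for col in columns]
    scores = []
    for row in fom_table:
        score = 0.0
        for w, f, hi, lo in zip(weights, row, fom_max, fom_min):
            if hi > lo:  # skip a FOM that does not discriminate in this run
                score += (w / total_weight) * (hi - f) / (hi - lo)
        scores.append(score)
    return scores


if __name__ == "__main__":
    table = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.0]]  # two FOMs, three phase sets
    print(cfom(table, weights=[1.0, 1.0]))
```

The output ranks the phase sets only relative to each other within the run, which is why a CFOM of 1.0 does not by itself indicate a correct solution.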
The AMOS parameter is a structure-independent gauge of the correctness of a phase set. It uses pre-defined estimates of the optimal values for the FOM parameters QFOM, RFOM, RFAC, PSI0 and NEGQ. OPTFOM values may be user defined (see setfom line). Rejection values for the four FOM parameters are derived from the OPTFOM values as REJFOM = 3*OPTFOM. The default values are as follows:
The absolute measure-of-success parameter is calculated from all active FOMs as
AMOS = [ WFOMi ( REJFi - FOMi/ OPTFOMi ] (i=1 to 5)
where the WFOM values are scaled so that AMOS ranges from 0 to 100. In addition to being used to sort phase sets in order of correctness, the AMOS values provide a realistic gauge of the correctness of phase sets. As a rule of thumb, they can be interpreted in the following way:
AMOS | Interpretation |
100-81 | high probability of being correct set |
80-61 | good chance of being correct set |
60-41 | possibility of being correct set |
40-21 | low probability of being correct set |
20-0 | very unlikely to be correct set |
These classifications are only approximations. The predictability of optimal FOM values can be perturbed by a variety of structure dependent factors and by the FOM weighting.
Phase sets must satisfy certain criteria before being considered for possible output to the binary file for subsequent E-map calculations.
FOM Rejection Criteria | Message |
Reject if QFOM > REJFOM(1) | REJECT1 |
Reject if RFOM > REJFOM(2) | REJECT2 |
Reject if RFAC > REJFOM(3) | REJECT3 |
Reject if PSI0 > REJFOM(4) | REJECT4 |
Reject if NEGQ > REJFOM(5) | REJECT5 |
Reject if |av.![]() ![]() | REJECT10 |
The value of <av.> is 90° for centrosymmetric structures and 150-180° for
non-centrosymmetric structures. This test avoids the "all-plus catastrophe"
phase set.
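The first five rejection tests can be sketched in Python as below, using the relation REJFOM = 3*OPTFOM given earlier and assuming the first test is applied to the QFOM value. The OPTFOM numbers in the usage example are placeholders invented for the illustration and are not the SIMPEL defaults; the REJECT10 test is not covered, and `rejection_messages` is a hypothetical name.

```python
FOM_NAMES = ("QFOM", "RFOM", "RFAC", "PSI0", "NEGQ")


def rejection_messages(fom: dict, optfom: dict) -> list:
    """Return REJECTn messages for every FOM that exceeds 3 * OPTFOM."""
    messages = []
    for index, name in enumerate(FOM_NAMES, start=1):
        rejfom = 3.0 * optfom[name]
        if fom[name] > rejfom:
            messages.append(f"REJECT{index}")
    return messages


if __name__ == "__main__":
    # Placeholder optimal values, chosen only to exercise the function.
    optimal = {"QFOM": 0.5, "RFOM": 1.0, "RFAC": 0.2, "PSI0": 1.0, "NEGQ": 20.0}
    phase_set = {"QFOM": 0.9, "RFOM": 3.5, "RFAC": 0.3, "PSI0": 1.2, "NEGQ": 75.0}
    print(rejection_messages(phase_set, optimal))
```

With the placeholder values shown, the phase set fails the RFOM and NEGQ tests and would be flagged with REJECT2 and REJECT5.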
Reads |E| values from the input archive bdf
Writes the estimated phases to the output archive bdf
Reads structure invariant relationships from bdf inv
This is the standard run in which all defaults will be applied to the converge, diverge, symbolic addition and FOM testing processes. All |E|-values used in the GENSIN program, and all invariants entered on inv, will be applied in the phasing process. Phase extension will be based on weighted $\alpha$'s.
Cochran, W. and Douglas, A.S. 1955. Proc. Roy. Soc. A227, 486-500.
Germain, G., Main, P. and Woolfson, M.M. 1970. Acta Cryst. B26, 274-285.
Hauptman, H.A. 1974. Acta Cryst. A30, 472-476.
Karle, J. and Karle, I.L. 1966. Acta Cryst. 21, 849.
Karle, J. 1974. International Tables, Vol. IV, section 6, 337.
Main, P. 1980. Multan-80. York, England: University of York.
Overbeek, A.R. & Schenk, H. 1978. Computing in Crystallography, University Press, Delft, p108-112.
Schenk, H. 1971. Acta Cryst. B27, 2037-2039.
Schenk, H. 1974. Acta Cryst. A30, 477-481.
Schenk, H. 1980. Computing in Crystallography, eds. R. Diamond, S. Ramaseshan and K. Venkatesan, Indian Academy of Sciences, Bangalore, p701-722.