4.18. CRISP: Well baked structural solution

[Caveat: This program should not be confused with the program CRISP marketed by Calidris for image processing of electron micrographs]

Authors: Doug du Boulay & Syd Hall, Crystallography Centre, University of Western Australia, Nedlands, WA 6907, Australia

CRISP combines tangent direct methods with iterative real to reciprocal space transformation methods to determine organic or organometallic molecular structure. This program is numerically very intensive and combines Fourier calculations with block diagonal least squares structural refinement as a basis for excluding or incorporating density peaks as real atomic sites in the model. It relies on accurate atomic resolution reflection data, and an extremely good guess at the actual cell contents.

4.18.1. Solution Method

CRISP uses the GENSIN structural invariants and generators as a basis for phasing generator reflections using tangent direct methods. Random starting phases are assigned to all generator reflections and, in GENTAN terminology, phases are refined in BLOCK mode for up to 30 cycles. By default CRISP refines 64 random phase sets, logging the best 12 phase sets (according to their refinement figures of merit) for possible further analysis. By default only the first (i.e. best) of those 12 is examined further for molecular geometry.
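The multi-solution bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not CRISP's own code: the function names, the toy_refine stand-in for tangent refinement, and the assumption that a larger figure of merit is better are all hypothetical. Only the defaults of 64 trials and 12 retained sets come from the text.

```python
import cmath
import math
import random


def random_phase_set(n_generators, rng):
    """Assign a random starting phase (radians) to every generator reflection."""
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_generators)]


def toy_refine(phases):
    """Hypothetical stand-in for tangent refinement.

    Leaves the phases untouched and scores them by the length of their mean
    unit vector (0..1). A real figure of merit would come from the invariants.
    """
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return phases, abs(resultant)


def best_phase_sets(n_generators, refine=toy_refine, n_trials=64, n_keep=12, seed=0):
    """Refine n_trials random phase sets; return the n_keep best, best first."""
    rng = random.Random(seed)
    trials = []
    for trial in range(n_trials):
        phases, fom = refine(random_phase_set(n_generators, rng))
        trials.append((fom, trial, phases))
    # Assumption: a larger figure of merit means a better phase set.
    trials.sort(key=lambda t: t[0], reverse=True)
    return trials[:n_keep]


kept = best_phase_sets(n_generators=20)
print(len(kept), "phase sets kept; best trial number:", kept[0][1])
```

Each retained entry is a (figure of merit, trial number, phases) tuple, so the first entry corresponds to the phase set that would be examined by default.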

CRISP relies on the user to specify (in STARTX) a very good approximation of the expected unit cell contents using the celcon lines. This is crucial, especially for heavy atoms. Hydrogen atoms are essentially ignored. It is generally considered unlikely that no information about the atomic contents is available. In addition, the BDF archive should contain Frel values and E values.

The phases and E values of the phased generators are used to calculate a fast Fourier transform Emap at a resolution of 4 points per angstrom. The Emap is searched for the expected number of map peaks, which are ranked according to peak height. Typically a volume larger than the asymmetric unit is generated, but symmetry related peaks are identified and rejected.
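As a rough illustration of the peak search and ranking step, the following Python sketch picks strict local maxima from a periodic 3D map and returns the highest ones first. The 26-neighbour test, the nested-list map layout and the function name are assumptions made for this example; rejection of symmetry related peaks is not shown.

```python
import itertools


def find_peaks(rho, n_expected):
    """Return up to n_expected local maxima of a periodic 3D map, highest first.

    rho is a nested list rho[i][j][k]; positions are returned as fractional
    coordinates. A grid point counts as a peak only if it is strictly higher
    than all 26 neighbours (assumed criterion).
    """
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    offsets = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    peaks = []
    for i, j, k in itertools.product(range(nx), range(ny), range(nz)):
        value = rho[i][j][k]
        # Modulo arithmetic wraps neighbours across the cell edges.
        if all(value > rho[(i + di) % nx][(j + dj) % ny][(k + dk) % nz]
               for di, dj, dk in offsets):
            peaks.append((value, (i / nx, j / ny, k / nz)))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:n_expected]


# Tiny 4x4x4 demonstration map with two isolated maxima.
demo = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
demo[1][1][1] = 5.0
demo[3][3][3] = 9.0
for height, xyz in find_peaks(demo, n_expected=2):
    print(height, xyz)
```

In a real map the grid would be far finer (the text quotes 4 points per angstrom), and peak positions would normally be interpolated between grid points rather than taken at the grid point itself.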

The ranked peak list is used as a basis for assigning atom types. If heavy atoms are expected, then those atoms are broken up into different ranges according to their atomic number. In the first instance, only the largest Z value atoms are assigned to the top peaks. (One shortcoming of this algorithm is that the top Emap peaks do not necessarily correspond to the heaviest atoms. A second shortcoming is that this algorithm does not take into account the fractional scattering power of the expected atoms, so for instance if there is one S atom and 200 C atoms the likelihood of successful structure determination is very low if only 1 S atom is assigned in the first instance.)
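The second shortcoming can be made concrete with a small Python sketch. It approximates the scattering power of each atom by the square of its atomic number (a common rule of thumb, used here as an assumption) and then labels the top peaks with the heaviest element only. Treating each element as its own atomic number range, and the function names themselves, are hypothetical choices for this example.

```python
def scattering_fraction(contents, symbol):
    """Approximate share of total scattering power carried by one element.

    contents maps symbol -> (atomic_number, count); the power of each atom is
    approximated here by Z squared.
    """
    total = sum(z * z * n for z, n in contents.values())
    z, n = contents[symbol]
    return z * z * n / total


def assign_heaviest(ranked_peaks, contents, n_levels=1):
    """Label the top peaks with the n_levels heaviest elements, heaviest first."""
    by_z = sorted(contents.items(), key=lambda item: item[1][0], reverse=True)
    labels = [sym for sym, (_z, count) in by_z[:n_levels] for _ in range(count)]
    # zip stops at the shorter list, so surplus peaks stay unassigned.
    return list(zip(labels, ranked_peaks))


cell = {"C": (6, 200), "S": (16, 1)}
print(round(scattering_fraction(cell, "S"), 4))
print(assign_heaviest(["peak1", "peak2", "peak3"], cell, n_levels=1))
```

For the one S and 200 C example, the single S atom carries only about 3% of the Z squared total, which is why a first model containing that atom alone gives the refinement very little to work with.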

With the top peaks assigned to the heaviest atoms, the scale factor and overall temperature factor are least squares refined (5 cycles), initialized with values determined from GENEV. If the newly refined values are beyond certain limits, it is an indication that the atom type assignment may be incorrect or relatively incomplete, and the scale and overall temperature factor revert to their GENEV values.

Subsequently the atom site x, y, z and individual isotropic temperature factors are refined using 5 cycles of diagonal least squares (using Frel-Fcal). Positional constraints are applied to sites on symmetry positions. The refined temperature factors are used to critically examine the assigned atomic types. If the temperature factor is too high, an atom type may be demoted to a lower atomic number atom or rejected altogether, whereas if it is too low it can be promoted to the next higher atomic number amongst the expected atom types. Atom type promotions are not invoked until later iterations when C atoms are being actively sought, unless the refined temperature factor for an individual site becomes <=0.
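The promotion and demotion rules can be summarised as a small decision function. The Python sketch below is an interpretation of the paragraph above, not CRISP's actual logic: the thresholds u_low and u_high are invented placeholder values, and the function and argument names are hypothetical.

```python
def review_site(symbol, u_iso, expected, u_low=0.01, u_high=0.10,
                allow_promotion=False):
    """Re-examine one atom site after refinement.

    expected lists the expected element symbols in order of ascending atomic
    number. Returns the (possibly changed) symbol, or None if the site is
    rejected. u_low and u_high are placeholder limits, not CRISP's values.
    """
    idx = expected.index(symbol)
    if u_iso > u_high:
        # Too diffuse: demote to the next lighter type, or reject the site
        # when it is already the lightest expected type.
        return expected[idx - 1] if idx > 0 else None
    if u_iso <= 0.0 or (allow_promotion and u_iso < u_low):
        # Too sharp: promote to the next heavier expected type, if any.
        return expected[min(idx + 1, len(expected) - 1)]
    return symbol


expected_types = ["C", "O", "S"]
print(review_site("O", 0.20, expected_types))
print(review_site("O", 0.005, expected_types))
print(review_site("O", 0.005, expected_types, allow_promotion=True))
print(review_site("O", -0.01, expected_types))
```

The allow_promotion flag mirrors the statement that promotions are held back until the later iterations, while a non-positive temperature factor triggers promotion regardless.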

Also in later iterations, additional peaks beyond the number expected from the celcon lines are added in batches and re-refined. Between the additions and re-refinements, the scale factor and overall temperature factor are also refined.

If no more peak sites are or can be added, the current structural solution is used as a basis for calculating new structure factor phases for use with Frel values for Fourier transformation to an Frel map. During early iterations, this may be only for reflections with E>1.0 whereas for later iterations all reflections with F>2 are used. The Frel map is known to exhibit significant series termination ripple, predominantly around the heavy atoms, but these spurious map peaks are relatively localised and are largely excluded from any downstream processing based on geometrical (bond-radii) constraints. The advantage of using an Fmap at this stage is that the peaks recovered from this new map have peak heights much more in keeping with the atomic numbers of the expected atomic types. In addition, the new peak list is more readily matched with the results of previous iterations. At this stage the next lower level of heavy atoms (if any) is assigned and the refinement of scales, coordinates and temperature factors is repeated. If the last atomic number range has already been invoked then a final iteration is made for solution comparison purposes.

After a number of iterative cycles corresponding to the number of atom ranges plus an extra comparison cycle, it is hoped that the peak assignment and refinement will have converged to the correct molecular solution. The resultant solution is saved to a .sol file. In default mode, if the solution converged (same solution in final and previous cycle) then CRISP will terminate and the molecular viewing program PIG is automatically invoked to view the solution. Ideally only one solution is required, and that solution is most likely to come from the phase set with the best figure of merit.
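One way to picture the convergence test (same solution in the final and previous cycle) is as a comparison of two atom lists. The Python sketch below is a guess at what such a comparison could look like, not a description of CRISP's own test: the tolerance of 0.01 in fractional coordinates, the data layout and the function names are all assumptions, and symmetry equivalent positions are not considered.

```python
def frac_delta(p, q):
    """Largest coordinate difference between two fractional positions,
    treating positions that differ by a whole lattice translation as equal."""
    return max(abs((a - b + 0.5) % 1.0 - 0.5) for a, b in zip(p, q))


def same_solution(sol_a, sol_b, tol=0.01):
    """True if every (symbol, xyz) site in sol_a pairs with a distinct site of
    the same type in sol_b lying within tol (fractional units, assumed)."""
    if len(sol_a) != len(sol_b):
        return False
    unmatched = list(sol_b)
    for symbol, xyz in sol_a:
        for other in unmatched:
            if other[0] == symbol and frac_delta(xyz, other[1]) <= tol:
                unmatched.remove(other)
                break
        else:
            # No partner found for this site.
            return False
    return True


previous = [("S", (0.10, 0.20, 0.30)), ("C", (0.995, 0.50, 0.50))]
final = [("C", (0.0, 0.50, 0.50)), ("S", (0.10, 0.20, 0.30))]
print(same_solution(previous, final))
```

The order of the sites does not matter in this sketch, and a site at x=0.995 is matched with one at x=0.0 because the two differ by less than the tolerance once the lattice translation is taken into account.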

NOTE: To actually use a particular solution, you have to explicitly ensure that a bdf will be written out in PIG, and ensure that the solution you require is currently active as you exit the PIG program. This will write out the current atom sites to the .add file for automatic reentrant streaming into ADDATM.

4.18.2. Control options

When a non-default number of best phase sets is requested for examination, subsequent solutions are appended to the .sol file. The automatic invocation of PIG at CRISP termination permits the interactive NEXT and PREVIOUS controls to be used to step forward and back through the requested solutions (up to 200). When PIG is used in this mode, by default IT WILL NOT write an output archive file containing the solved structure. This must be requested explicitly from within PIG.

The optim n control card on the CRISP line controls the level of optimization to which the required structural solutions should be studied. At the zeroth level of optimization, n=0, only the initial Emap peaks are written to the .sol file. The number of peaks is still governed by the STARTX celcon lines.

At the first level of optimization, n=1, the STARTX celcon contents are imposed as a whole on the Emap peak list and refined. There is no attempt at an iterative solution.

At optimization level n=2, the STARTX celcon contents are iteratively imposed on successive peak lists, with successively lighter heavy atom ranges added at each iteration. There is no attempt to look beyond the input cell contents for extra unexpected sites.

The final optimization level, n=3, is identical to n=2, except that the residual peaks in the peak list are trialed as unexpected additional atoms (but only for the C atom range). This additional testing is quite time consuming.

4.18.3. Caveats

CRISP has been found to work reasonably well for centrosymmetric structures but less well for acentric structures. For some structures with strong pseudo symmetry, direct methods cannot be used effectively and other techniques such as Patterson searches may prove more useful. CRISP automatically invokes GENSIN and GENEV if the structure invariant file .inv is missing. Sometimes it may be necessary to use more than the default number of invariants; in that case GENSIN should be run manually with a reduced Emin value, or an explicitly larger number of triplets should be requested.

4.18.4. File Assignments