A protein crystallographer's toolkit: essential programs for successful structure determination

Claire Naylor

School of Crystallography, Birkbeck, University of London, Malet St, London, WC1E 7HX, UK.
E-mail: c.naylor@mail.cryst.bbk.ac.uk - WWW: http://www.cryst.bbk.ac.uk/people/claire_naylor.html

Background

In sales speak, (protein) crystallographers have traditionally been 'early adopters' of new computing technology. As a community, we were in the vanguard of those utilising the new computers for practical work before World War II, and this has continued through to the modern day with a steady stream of ever more complex programs taking advantage of the explosion in computing power. The requirement for such high-level computing has meant that protein crystallography has tended to be carried out in dedicated departments, with well set up networks and access to a large pool of computational expertise. In these circumstances, we have perhaps been perceived by more laboratory-based molecular biologists as happiest when situated as in figure 1.

[happy molecular biologist]

Fig. 1: A happy molecular biologist, content with a nice suite of installed crystallographic programs

More recent years have seen a massive explosion in the size of the community and in the accessibility of the technique to researchers without extensive training in the area. Crystallographic groups made up of perhaps only one or two senior researchers within more general molecular biology departments are becoming the norm, and in this setting, establishing an appropriate computing environment can prove challenging, especially for those with little computer management experience. It can be difficult to identify all the software required, and in such situations the crystallographic computing environment can unfortunately deteriorate rapidly to that depicted in figure 2a and b.

[unhappy molecular biologist] [unhappy molecular biologist]

Fig. 2 (a, b): A not so happy molecular biologist, whose frustration is possibly being caused by a narrow and/or poorly installed suite of crystallographic programs.

The department of crystallography at Birkbeck College has a long history and a successful crystallographic computing environment. This article contains a brief description, with links to appropriate websites, of the programs that we find essential in the structure determination process. Two things will strike the reader immediately. First, the programs are free to academic institutions, and thus there is no barrier to the new researcher becoming involved in crystallography for the first time. Second, there is some redundancy: in many cases, we maintain two program suites that apparently do the same job in different ways. It is frequently found that a problem will yield to one form of analysis but appear intractable under another, so such redundancy is always desirable.

Operating systems

Protein crystallography has in the past relied on Silicon Graphics machines in many circumstances, though this is now yielding to cheaper Linux boxes that are at least as efficient. Compilations for both these machine types are thus generally available. DEC Alpha machines are also used by the community, and compilations for this system are usually available. Mac OS X is much less frequently used, but fortunately there are enthusiasts willing to share their experience of porting crystallography programs to this operating system (http://chemistry.ucsc.edu/~wgscott/crystallography_on_OS_X.html).

CCP4

Finally, the programs! Firstly, in a category of its own, comes the Collaborative Computational Project Number 4 (CCP4; http://www.ccp4.ac.uk/), supported by the United Kingdom's BBSRC. This provides a program suite that not only covers all aspects of protein crystallography, from data collection to data validation and graphics, but also includes utilities to port your data from and into a variety of other well known crystallography programs. CCP4 is currently at version 4.2.2, with 5.0 expected later this year. Following the introduction of an easily used graphical interface (ccp4i; figure 3), this is probably the single most heavily used set of programs in the department. New graduate students have little difficulty understanding it and can make rapid progress. In addition, dedicated support staff and one of the community's most popular bulletin boards mean that help is always available to the new user. Individual programs available through CCP4 are described in the sections below.

[CCP4 GUI]

Fig. 3: Example of the graphical user interface in the CCP4 suite

Data processing and scaling

Birkbeck maintains two packages for this process: Mosflm (http://www.mrc-lmb.cam.ac.uk/harry/frames/) and Scala (both part of the ccp4 suite), and the free-to-academics version of the HKL package (http://www.hkl-xray.com/), which includes Denzo. Both have graphical interfaces (see figures 4 and 5), but Mosflm is perhaps easier for the novice user to handle, consisting primarily of clickable menu options, while Denzo utilises a command line. Both, however, are very effective at processing even the most difficult datasets.

[MOSFLM]

Fig. 4: Screen image of MOSFLM

[DENZO in action]

Fig. 5: Screen image of DENZO in action

Heavy atom site identification

There are now many programs available for the automatic determination of heavy atom positions by a number of different methods. All are successful in different circumstances, and we find it invaluable to support as large a variety as possible. Strategies available within the department to solve the heavy atom substructure include: Patterson search, as employed in rsps under ccp4 and in the far more sophisticated and comprehensible Solve from T. Terwilliger (http://resolve.lanl.gov/); direct methods, in shelxd, part of G. Sheldrick's Shelx suite (http://shelx.uni-ac.gwdg.de/SHELX/), and in Acorn, the latest direct methods program available under ccp4; and finally the combined direct methods and molecular dynamics methods employed by SnB (http://www.hwi.buffalo.edu/SnB/), now also available as part of the site identification and refinement package BnP (http://www.hwi.buffalo.edu/BnP/).
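As a rough illustration of what a Patterson search looks for, the short Python sketch below lists the interatomic vectors between a set of heavy atom sites, which is where peaks would be expected in a Patterson map. It is only a toy under simplifying assumptions: space group P1 (no symmetry-related Harker vectors), made-up fractional coordinates, and a hypothetical function name `patterson_vectors`. It does not reproduce what rsps or Solve actually do.

```python
from itertools import permutations


def patterson_vectors(sites):
    """Return interatomic vectors r_i - r_j (i != j), wrapped into [0, 1).

    `sites` holds fractional coordinates; symmetry is ignored (P1 assumed).
    """
    vectors = []
    for a, b in permutations(sites, 2):
        # Difference of fractional coordinates, folded back into the unit cell.
        vectors.append(tuple(round((x - y) % 1.0, 6) for x, y in zip(a, b)))
    return vectors


# Two made-up heavy atom sites in fractional coordinates.
sites = [(0.10, 0.20, 0.30), (0.40, 0.25, 0.80)]
peaks = patterson_vectors(sites)
for peak in peaks:
    print(peak)
```

A search program works in the opposite direction, proposing trial sites and checking whether the vectors they predict coincide with strong peaks in the observed Patterson map.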

Heavy atom refinement

There is no doubt that the best program for carrying out this task is Sharp, written by G. Bricogne et al. and available from http://www.globalphasing.com/. The installation of this program is non-trivial for the novice computer manager, but it is worth the large amount of effort this takes for the high quality maps it produces. Both BnP and Solve/Resolve also carry out heavy atom position refinement.

Molecular replacement

CCP4 provides three very useful programs for structure solution by molecular replacement. J. Navaza's AmoRe is very rapid, allowing the user to carry out multiple trials of different models and resolution ranges to maximise the possibility of finding a solution. MolRep, a more sophisticated program that utilises maximum likelihood targets, is slower, but more likely to find the right answer with a poorer model. Finally, R. Read’s Beast, the latest molecular replacement program, is extremely slow and thorough but very successful, and is thus the best choice for the most challenging problems. Also supported in the department are two programs that use a different methodology. EPMR (http://www.msg.ucsf.edu/local/programs/epmr/epmr.html) uses an evolutionary search algorithm, in which repeated rounds optimise the rotational and translational parameters for a set of random models and discard the poorest scorers. CoMo carries out a six-dimensional search (over both rotational and translational parameters), which can be successful where more traditional methods have failed.
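The evolutionary idea described above can be sketched in a few lines of Python. This is a toy, not EPMR's implementation: the score function `toy_score` is a made-up stand-in for the agreement between observed and calculated data, the six parameters are treated as plain numbers, and all names and default values are assumptions chosen for illustration.

```python
import random


def toy_score(params, target):
    """Hypothetical score: higher is better, reaching 0 when params == target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolutionary_search(score, n_params=6, population=30, survivors=10,
                        generations=60, step=0.3, seed=0):
    rng = random.Random(seed)
    # Start from random trial solutions (three rotational, three translational values).
    pop = [[rng.uniform(-3.0, 3.0) for _ in range(n_params)]
           for _ in range(population)]
    for _ in range(generations):
        # Rank the trials by score and discard the poorest scorers.
        pop.sort(key=score, reverse=True)
        parents = pop[:survivors]
        # Refill the population with randomly perturbed copies of the survivors.
        children = []
        while len(parents) + len(children) < population:
            parent = rng.choice(parents)
            children.append([p + rng.gauss(0.0, step) for p in parent])
        pop = parents + children
    return max(pop, key=score)


# Made-up "true" orientation and position that the search should approach.
target = [0.5, -1.2, 2.0, 0.1, 0.7, -0.4]
best = evolutionary_search(lambda p: toy_score(p, target))
print(best, toy_score(best, target))
```

Because the survivors are carried over unchanged into each new round, the best score in the population can never get worse from one generation to the next.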

Density modification

The interpretability of an initial map can be improved by both solvent flattening and non-crystallographic symmetry averaging, as well as by a range of less powerful techniques. CCP4 provides DM and Solomon (which has the novel 'solvent-flipping' algorithm) for this. T. Terwilliger's Resolve (http://resolve.lanl.gov/) also carries out these functions with a slightly different methodology. Finally, the Uppsala Software Factory (http://xray.bmc.uu.se/~gerard/manuals/) provides a number of programs to carry out these operations.
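To give a feel for solvent flattening, the Python sketch below replaces the lowest-density grid points of a tiny one-dimensional 'map' with their mean value. It is a deliberately crude illustration: picking solvent points by a simple density cut-off is an assumption made here for brevity, the function name `solvent_flatten` and the numbers are made up, and programs such as DM, Solomon and Resolve use far more careful masks and iterate the procedure with phase recombination.

```python
def solvent_flatten(density, solvent_fraction):
    """Set the lowest-density fraction of grid points to their mean value."""
    n_solvent = int(round(solvent_fraction * len(density)))
    # Treat the lowest-valued grid points as solvent (a crude stand-in for a real mask).
    order = sorted(range(len(density)), key=lambda i: density[i])
    solvent = set(order[:n_solvent])
    if not solvent:
        return list(density)
    mean_solvent = sum(density[i] for i in solvent) / len(solvent)
    # Solvent points become featureless; protein points are left untouched.
    return [mean_solvent if i in solvent else d for i, d in enumerate(density)]


# Made-up noisy density values on eight grid points, assuming 50% solvent.
noisy_map = [0.1, -0.2, 0.9, 1.4, 0.0, 0.3, 1.1, -0.1]
flattened = solvent_flatten(noisy_map, solvent_fraction=0.5)
print(flattened)
```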

Automated building

Arp/Warp (http://www.embl-hamburg.de/ARP/) is perhaps the most famous automated building package, and the latest version (6.0), although downloaded separately, can be started from within ccp4i. Arp/Warp requires data to a resolution better than 2.0 Å to succeed. Those of us less fortunate can use Resolve to generate an initial chain trace, or FFFear, which uses a real space search to place stretches of idealised helix and strand in an electron density map. Those blessed with atomic resolution data will want to try the direct methods structure solution available in Acorn in CCP4.

Model building

No matter how good the data, we all have to compare the model and the map sooner or later. Two programs are used for this within the department. Alwyn Jones’ O (http://www.imsb.au.dk/~mok/o/) was for many years the only widely used protein crystallography building program. It is still extremely good for model building, especially from scratch. However, Duncan McRee's XtalView (http://www.scripps.edu/pub/dem-web/toc.html) is increasingly used and is particularly good for making alterations to an existing model.

Refinement

Once a model has been built, Birkbeck supports a number of programs for refining it. CNS (http://cns.csb.yale.edu/v1.1/), with A. Brunger's simulated annealing protocols, is perhaps most useful at medium to low resolution. Refmac 5, from the CCP4 suite, is very efficient at slightly higher resolution. Finally, where the data approach atomic resolution, G. Sheldrick's SHELXL offers the greatest flexibility.

Model validation

It is easy to forget tools for model validation, but trying to ensure our model is as accurate and unbiased as possible is an important part of the structure solution process. Under the CCP4 suite, ProCheck and SFCheck provide a number of useful indicators.

The above is a very selective list from the vast area of protein crystallography programs available, and is biased towards those that are free of charge. However, installation of at least one program from each of the above categories should allow the user to solve their structure, without too many scenes like those in figure 2!