
The rapid growth and extraordinary power of protein crystallography

John Helliwell

The last twenty years have seen a transformation in the scope of the methods of protein crystallography. The rate of deposition of protein structures in the Protein Data Bank more than doubles every two years. The molecular weight capability is now very large indeed, not only for highly symmetric viruses but also for other multi-macromolecular complexes such as the F1-ATPase, critical to the 1997 Nobel Prize for Chemistry, and the ribosome, whose analysis is currently under way. Crystal exposure times have been reduced to the almost unthinkably short times set by synchrotron single bunches of tens of picoseconds in length, thus expanding the potential of dynamical studies. There is also a synergy between synchrotron radiation and neutron diffraction, whereby the difficulties thought to exist in harnessing a broad spectral bandpass simultaneously have been largely overcome. The speed of data collection has thereby also been improved for neutrons. Hence, individual protein structures can be studied in multiple forms, either with time or via multiple binding of inhibitor and substrate analogues, by both X-rays and neutrons. The accuracy of some protein synchrotron X-ray structures now equals that of smaller molecule structures, and fine details can be discerned without restraining bond distances. Single and double bond distances can be resolved and confirmed by the location of hydrogen atoms. Intense, tuneable synchrotron X-ray beams, cryo-crystallography and sensitive area detectors have made this possible. Essentially all protein and virus crystals can be frozen, although optimization of mosaicity remains under development. Coordinated detector initiatives, such as "IMPACT" in the UK and the pixel detector programs in the USA, give promise of further advances in X-ray detectors in the near term. Today's lab computers equal those available at a national resource 15 years ago. Molecular graphics capabilities continue to grow.
Can future developments in protein crystallography continue at this pace? Will advances in the next century dwarf all that have gone before? What are the major challenges and opportunities for crystallography as we reach the millennium? The pivotal role of the IUCr in bringing crystallography together to develop and maintain standards will be critically important. Its role in organizing publications, especially with the vantage point of our inter-disciplinary perspective, will be as strong as ever. This article, invited by Bill Duax, allows me to offer my views on the coming years.

High resolution structure determination and dynamical studies are the keys to the future of protein crystallography. X-rays, electrons and neutrons are our three probes of matter using diffraction. The universe of possible molecules is "infinitely" populated and includes the group of molecules that comprises our genome. After 10,000 human generations, ours has the opportunity to elucidate at the molecular level the 3-dimensional structure of all of the proteins comprising the human genome. Similarly, disease pathogens could now be understood in structural terms. The ability to control our circumstances, and not be their victim, has long been a noble aim of humanity. Is it really possible to embark on 'structural genomics'?

The human genome comprises some 100,000 proteins, with some 40% being membrane bound and therefore more difficult to crystallize. Nevertheless these proteins are to some degree amenable to study by X-ray and/or electron crystallography techniques. The molecular weight histogram for the recently sequenced yeast genome shows a peak at about 30,000 molecular weight. Such proteins are perfectly suited to seleno MAD phasing. Currently it takes 1 to 2 days of beam time to measure enough MAD data to solve a protein de novo. Stronger sources and faster detector readout times will readily reduce the data collection time. If a worldwide coordination could be brought about, e.g. on a chromosome by chromosome share between SR facilities, then in 30 years 20 instruments could yield 100,000 structures! That isn't even halfway through the next century, nor does it invoke further acceleration in the scope of the method. At this rate it would be conceivable to do more than one person's genome, or at least to examine the places where a 'standard' genome differs from others. An alternative approach would be to target the places in the human genome where, based on the amino acid sequence, one should expect to find novel structures. This offers better commercial opportunity, but would be more difficult to coordinate, i.e. via the global SR facilities.
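
The throughput estimate above can be checked with a few lines of arithmetic. The following is only a rough sketch: it uses the figures quoted in the text (20 instruments, 30 years, 1 to 2 days of beam time per structure), while the function name, the 365-day year and the optional `uptime` fraction are illustrative assumptions and not part of the original argument.

```python
def structures_solved(instruments: int, years: float,
                      days_per_structure: float, uptime: float = 1.0) -> int:
    """Estimate how many structures a set of beamline instruments could yield.

    `uptime` is a hypothetical fraction of the year each instrument is
    actually collecting data; 1.0 means continuous operation.
    """
    # Total instrument-days of beam time available (365-day year assumed).
    beam_days = instruments * years * 365 * uptime
    # One structure is solved per `days_per_structure` of beam time.
    return int(beam_days // days_per_structure)


if __name__ == "__main__":
    for days in (1, 2):
        total = structures_solved(instruments=20, years=30,
                                  days_per_structure=days)
        print(f"{days} day(s) per structure: about {total:,} structures")
```

Even at the slower rate of 2 days per structure, continuous operation gives roughly 110,000 structures, consistent with the figure of 100,000 quoted above. Lowering `uptime` shows how sensitive the estimate is to realistic beam time availability.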

Crystallization is the greatest challenge to massive protein structure determination. Robotics permit systematic screening of solution conditions and survey of crystallization 'parameter space'. Fluid physics specialists are entering the field via space research, where microgravity eliminates convection and sedimentation. New ideas have led to controlled convection and 'containerless' growth on earth. One goal of microgravity work is to understand why bigger, lower mosaicity and/or better resolution crystals are obtained with 20% of the proteins tested. In a recent small molecule crystal growth report, crystals grown in microgravity showed greatly reduced disorder and diffuse scattering, and enhanced Bragg peaks. Since many protein crystals exhibit diffuse scattering, its reduction could enhance diffraction quality. The crystal perfection achievable today seemed inconceivable just ten years ago. However, the diffraction apparatus and detectors required to harness it are not currently available on a synchrotron beamline.

The determination of huge numbers of precise structures will be a challenge to databases, journals, and scientists. In small molecule crystallography we have already seen an explosion in the number of structures produced. The IUCr has coordinated the debate on the preservation of the quality of these structures via agreed criteria. It has set the 'gold-standard' adopted by most journals and users of small molecule structural data via the crystallographic information file (CIF). The mmCIF for proteins is equally important. Rapid communication of protein structures to an agreed standard will be a vital challenge and opportunity for our field, and the IUCr, in the coming years. Moreover, the ease with which multiple structural forms can be produced, e.g. via time-resolved or multiple binding studies or molecular dynamics simulations, will make databases of structures more routinely '4-dimensional' rather than '3-dimensional'.

John R. Helliwell, U. of Manchester, UK