Feature article
Protein design and folding prediction endorsed by the Nobel Prize for Chemistry 2024
I have written before in the IUCr Newsletter [1] on the breakthrough made in protein folding prediction by Google DeepMind's AlphaFold [2], and on PDBe (Protein Data Bank in Europe) making some 200 million AlphaFold-predicted protein structures available to users [3]. Protein folding prediction was one of the decades-old grand challenges of science. Since my student days, I have been fascinated by the physics-based hunt for the Gibbs free energy minimum which, as Anfinsen's protein denaturation and renaturation studies showed, must correspond to the protein's native 3D structure [4]. However, it was not energy minimisation methods that solved the problem. It was a combination of learning from protein sequences and from precise experimental protein structures, carefully curated in databases [5]. Such AI and ML (artificial intelligence and machine learning) methods, based on big data, are being applied to many different prediction challenges in science and, more widely, in society. The continuing wish for a physics-based, energy-minimum approach has led to further debate on whether the protein folding challenge has been solved [6, 7].
On October 9th, 2024, the Nobel Prize for Chemistry was awarded [8]:
"David Baker has succeeded with the almost impossible feat of building entirely new kinds of proteins. Demis Hassabis and John Jumper have developed an AI model to solve a 50-year-old problem: predicting proteins' complex structures. These discoveries hold enormous potential."
Protein design is the reverse of protein fold prediction: it involves designing a protein structure and then determining the amino acid sequence that would fold into that designed structure. This is a fantastic breakthrough by David Baker and his coworkers at the University of Washington in Seattle. The first such design of a new protein fold was published in 2003 [9] (see Fig. 1). As the authors of [9] remarked:
“The ability to design a new protein fold makes possible the exploration of the large regions of the protein universe not yet observed in nature.”
They also recognised the potential for a sweeping change in protein crystallography phase determination:
“Remarkably, a strong molecular replacement (MR) solution to the phase problem was found with the use of the design model. [So] the design model was quite close to the true structure: even the small deviations of NMR solution structures from X-ray crystal structures can make molecular replacement searches fail.”
Why have I not written about it [9] in the IUCr Newsletter over these past 20 years? I certainly admired it, but at that time I regarded it as mainly a matter of molecular biophysics. Protein fold prediction, by contrast, I could always see would, if fully solved, have a massive impact on my efforts in protein crystallography phase determination using tuneable synchrotron radiation. Dorothy Hodgkin drew my attention to this topic in 1975, when she received, via Sir Ron Mason, a preprint of the seminal work of Keith Hodgson and colleagues at SSRL (Stanford Synchrotron Radiation Laboratory) [10]. Indeed, we are approaching the 50th anniversary of that momentous paper, and a special issue of articles in the Journal of Synchrotron Radiation is underway (https://journals.iucr.org/s/services/specialissues.html), endorsed, of course, by Keith Hodgson himself.
An interesting feature of this year's Nobel Prize in Chemistry was its apparent thematic coordination with the Nobel Prize in Physics, announced the day before and awarded to John J. Hopfield and Geoffrey E. Hinton [11]:
"...for foundational discoveries and inventions that enable machine learning with artificial neural networks."
The Nobel Prize in Physics prompted comments on X (formerly Twitter) that the Prize had been hijacked by computer scientists. The Nobel Committee's explanation, seemingly anticipating such criticism, had already emphasised that Hopfield's contribution was rooted in statistical physics and that Hinton's involved the so-called Boltzmann machine. Interestingly, the applications of the Hopfield and Hinton methods highlighted at the main press conference were things like interpreting medical imaging data and particle physics data; protein folding prediction seemed to be carefully excluded. However, in a detailed explanation and interview given after the main announcement, a specialist from the Nobel Committee for Physics did cite applications of their methodology in that domain.
So, what next for the two domains of protein prediction and protein design? Google DeepMind, based in London, recently launched its AlphaFold3 software for predicting biomolecular interactions, including those of proteins, nucleic acids, small molecules, ions and modified residues [12]. For protein design, the list of impressive applications described by David Baker in his telephone interview at the press conference already includes a designed protein suitable for incorporation in a nasal spray against the SARS-CoV-2 spike protein, and a designed protein sensitive to environmental pollutants. As was remarked in the discussions following the Nobel Chemistry Prize ceremony, the limit is not the possibilities but our imagination.
In my own post about it on X (@helliwelljohn), I wrote:
"Hearty congratulations to all three Laureates. Both domains, protein design and prediction, are fantastic breakthroughs. Hearty thank yous too to the great care of the sequence data bases and of the PDB experiments' structures database. @rcsbPDB @PDBeurope @PDBj_en”
So where are we today, in my view as a protein crystallographer? I was recently asked this by the ESRF communications office, which was writing a piece about 30 years of user operation at the ESRF, as I had been involved with the ESRF since its early days. I replied: "In the mid-1980s I led the ESRF MX (macromolecular crystallography) Working Group and wrote the MX sections for the ESRF Foundation Phase Report (Red Book) [13]. Since then, MX crystals have got ever smaller, tuneable-wavelength resonant scattering phasing was supplanted by AlphaFold2 about four years ago, CryoMX (at 100 K) is increasingly being supplanted by room-temperature MX, structural dynamics (including time-resolved MX) keeps growing in importance relative to structure determination per se, neutron MX is growing in importance, and X-ray wavelength tuning for site-specific metal or ion determination in MX remains very important. In future, using the ESRF Extremely Brilliant Source upgrade of two years ago, the field will be dominated by physiologically relevant MX and structural dynamics (time-resolved) MX studies. Of course, another major breakthrough is that crystallisation failures of large complexes now have a different way forward with cryoEM."
Implicit in this personal assessment for the ESRF communications office is that protein fold prediction has broadened into the prediction and design of protein structures, including complexes, not ‘only’ folds. As a result, the emphasis for the experimental protein crystallographer appears to be shifting towards determining structures under conditions closer to those of the living cell, be it temperature or pH. For example, my colleagues and I recently published a body-temperature (37 °C) study of a protein with a bound rhenium theranostic compound, seeking a truly physiologically relevant structure [14]. I also note that, in an unusual move, European XFEL announced the day after the Nobel Prize that David Baker is one of its users, which suggests to me that he is moving into the design of protein structure and dynamics. Clearly, predictions that closely fit the current database entries, which are predominantly 100 K structures, are one thing; but if predictions are to be fitted closely to the situation in the living cell, we need many more experimental structures determined under a wide range of physiologically relevant conditions. Of course, any protein 3D structure prediction or design has to be experimentally validated by crystallography, cryoEM or NMR spectroscopy.
References
[1] Helliwell, J. R. (2020). DeepMind and CASP14. IUCr Newsletter, 28 (4).
[8] The Royal Swedish Academy of Sciences (2024). The Nobel Prize in Chemistry 2024, press release, https://www.nobelprize.org/uploads/2024/10/press-chemistryprize2024-2.pdf.
[11] NobelPrize.org (2024). The Nobel Prize in Physics 2024, Nobel Prize Outreach AB 2024, https://www.nobelprize.org/prizes/physics/2024/summary/.
[13] ESRF Foundation Phase Report (1987). Grenoble, France.
John R. Helliwell, Department of Chemistry, University of Manchester, Manchester M13 9PL.
Copyright © - All Rights Reserved - International Union of Crystallography