Structural biology of SARS-CoV-2 proteins

Andrzej Joachimiak and Karolina Michalska
[Figure: CSGID] SARS-CoV-2 genome coverage by CSGID structures.

Background

At the end of 2019, a new virus emerged in Wuhan, in the Hubei province of China, and spread rapidly, infecting thousands. Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for Coronavirus Disease 2019 (COVID-19), is rapidly propagating around the world. The virus spreads very efficiently and sustainably between people in close contact. It is now found in over 180 countries and on all continents, and its dispersal shows no sign of stopping, with new centers emerging in different parts of the world. There is no existing vaccine or proven drug treatment to prevent infections and stop virus proliferation. Detailed information about SARS-CoV-2 proteins, their functions and structures is urgently needed to support the development of effective measures. A number of research groups around the world have initiated studies to determine structures of key proteins from the new virus and bring them into the public domain. The advances have been much faster than in the past, and they can lead the way to new treatments.

For over 100 years, since the Spanish flu pandemic, humanity has not experienced a major, global infectious disease. But a lot has changed since then. The human population is growing and encroaching on the environment, and people travel globally; the CDC estimates that approximately one billion people cross borders every year, and pathogens move with them. In the 21st century, the world has experienced a number of viral outbreaks that threatened large human populations. These include Severe Acute Respiratory Syndrome [SARS 2002–2003 (Luk et al., 2019)], Middle East Respiratory Syndrome [MERS 2015 (Memish et al., 2020)], Ebola Virus Disease [EVD 2013–2014 (Cenciarelli et al., 2015)] and Zika Virus Disease [ZVD 2016 (Nugent et al., 2017)], to mention a few recent examples. Effective antibodies and dedicated small-molecule drugs exist for only some of these diseases. However, these outbreaks were contained well before they spread worldwide. This has not been the case with COVID-19.

SARS-CoV-2

The emergence of SARS-CoV-2, the etiologic agent responsible for the current outbreak of COVID-19, is different. At the end of 2019, the number of pneumonia reports increased in central China, and the disease rapidly spread to other provinces. The agent associated with COVID-19 was quickly identified as another coronavirus, termed SARS-CoV-2, likely of zoonotic origin. In the five months since it was discovered, the disease has dispersed uncontrollably, often through large superspreading events, causing a pandemic. Governments and health organizations have put in place various strategies and countermeasures that work with varying degrees of success. The challenges are enormous: millions of people are being quarantined and the epidemic is impacting the world economy. It is clear that the current approach to developing effective drugs for emerging diseases is not working very well, and we need a long-term strategy in place to deliver several treatment options against viruses and other pathogens.

Coronaviruses have been known for a long time; the first coronavirus of human origin, B814, was described in 1966 (Tyrrell & Bynoe, 1966). These viruses often have a zoonotic origin and have been isolated from civets, camels, alpacas, rodents and pigs, with bats being recognized as a major reservoir species. The viruses can jump into humans and cause a variety of diseases with a range of outcomes, from a mild cold to pneumonia to deadly SARS. Such animal-to-human transmission followed by human-to-human transmission led to the SARS and MERS outbreaks, but they were well controlled at the time.

As in other coronaviruses, the genomic information of SARS-CoV-2 is stored in a single positive-sense RNA molecule. The genome contains several Orfs encoding 15 non-structural proteins (Nsps), 4 structural proteins and 9–10 accessory proteins. The majority of the genome is dedicated to the replicase gene, encompassed within the large, slightly overlapping Orf1a and Orf1b. These Orfs are translated into two polyproteins, pp1a (4405 amino acids) and pp1ab (7096 amino acids) (Cui et al., 2019). The production of pp1ab requires a –1 ribosomal frameshift upstream of the Orf1a termination codon, which extends translation beyond the end of pp1a. The resulting polypeptides are cleaved by two viral proteases to generate the Nsps (Báez-Santos et al., 2015). The main protease Mpro, also referred to as 3C-like protease (3CLpro), has a chymotrypsin-like fold and corresponds to Nsp5. The other proteolytic enzyme is a papain-like protease (PLpro), which is contained within Nsp3. Nsp3 is the largest protein produced by coronaviruses and it has several domains. In addition to PLpro, it contains a ubiquitin-like domain (Ubl1), an acidic HVR domain, an ADP-ribose-1''-phosphatase domain (ADRP, also called Mac1 or MacroD) and a transmembrane region (TM). Viral Nsps, together with RNA, form a large membrane-bound assembly known as the replication and transcription complex. Although the functions of many Nsps have been directly associated with viral replication, the roles of some proteins, even those with known enzymatic activities, remain poorly characterized. Understanding the structure and function of the RNA-synthesizing machinery of coronaviruses, and its interactions with host macromolecules, is the key to rationalizing the development of improved control strategies.
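The effect of a –1 ribosomal frameshift on the reading frame can be illustrated with a minimal sketch. It works on codon triplets only and ignores the sequence and structural signals that trigger slippage in a real genome. The sequence `TOY_RNA`, the function names and the `slip_end` parameter are hypothetical and serve only as an illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons(rna, start=0):
    """Split rna into complete triplets, beginning at index start."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def read_until_stop(codon_list):
    """Return the codons that precede the first stop codon."""
    translated = []
    for codon in codon_list:
        if codon in STOP_CODONS:
            break
        translated.append(codon)
    return translated


def read_with_minus1_frameshift(rna, slip_end):
    """Read in frame 0 up to slip_end, then back up one nucleotide and go on.

    slip_end (a hypothetical parameter) is the index just past the last codon
    read before the slip; it must lie on a frame-0 codon boundary.
    """
    if slip_end % 3 != 0:
        raise ValueError("slip_end must be a multiple of 3")
    upstream = codons(rna[:slip_end])
    # The -1 shift re-reads the last nucleotide of the previous codon.
    downstream = codons(rna, slip_end - 1)
    return read_until_stop(upstream + downstream)


# Toy sequence for illustration only.
TOY_RNA = "AUGGCUUUAAACGGGUUUGCGGUGUAAGUGCAGCCCUGA"

frame0 = read_until_stop(codons(TOY_RNA))
shifted = read_with_minus1_frameshift(TOY_RNA, slip_end=12)
print(len(frame0), frame0[-1])
print(len(shifted), shifted[4])
```

In this toy sequence the frame-0 read stops after eight codons, whereas the shifted read passes the frame-0 stop codon and continues. This mirrors the way pp1ab (7096 amino acids) extends pp1a (4405 amino acids).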

[Figure 3] Structure of SARS-CoV-2 PLpro protease determined at 1.8 Å resolution. The active site residues C111 and H272 are shown in red.

The worldwide spread of SARS-CoV-2 has raised questions as to how the virus achieved such high transmission rates. Unfortunately, despite many years of research, there is no approved vaccine, antibody or antiviral therapy against either the SARS or the MERS virus.

In January 2020, the genome of SARS-CoV-2 became available to researchers around the world. The reaction of the scientific community was instantaneous. Within a few days, many molecular and structural biology laboratories redirected their efforts towards a single goal: producing proteins from SARS-CoV-2, characterizing their functions and solving protein structures to aid structure-based drug discovery. At the molecular level, the SARS-CoV-2 proteins are generally similar to their equivalents from the closely related SARS-CoV, and less similar to MERS-CoV proteins. However, in SARS-CoV-2–SARS-CoV comparisons, the level of similarity varies significantly between proteins, from 46% to 100% identity. These differences must be, at least in part, responsible for the different host–pathogen interactions and epidemiological outcomes of these two closely related viruses (Wu et al., 2020). During the SARS 2003–2004 outbreak, a major international effort was launched, coordinated by the International Research Network on Structural Genomics of SARS-CoV (Canard et al., 2008). Within four years, 17 structures of SARS-CoV Nsps, structural and accessory proteins were determined in the US, Europe and China, covering 45% of the SARS-CoV proteome and 53% of its soluble proteins. These structures were determined by X-ray crystallography and NMR, and it became very clear that X-ray light sources made major contributions. This effort was highly relevant to attacking SARS-CoV-2, as it helped to decipher the ~30 kb SARS-CoV-2 genome and to design constructs for protein expression and structural studies. The Adam Godzik lab at the University of California Riverside (UCR) quickly created a resource that compares genome coverage for SARS and SARS-CoV-2. But how long would it take to generate the data urgently needed by the scientific community to find effective treatments against a global plague that had infected millions by April 2020? It turned out that it did not take very long, thanks to the technological advances of the past 15 years.

Several important developments have helped to speed up the science. Light sources, particularly the Advanced Photon Source (APS) in the US and the Diamond Light Source in the UK, were very well set up for rapid-access single-crystal X-ray crystallography (Douangamath et al., 2020). X-ray crystallography has made tremendous progress in the past 20 years. Synthetic gene technology, with sequences codon-optimized for expression of soluble proteins in E. coli, was commercially available. Many plasmid-based expression systems were available for expression and co-expression of interacting proteins. Rapid purification and crystallization protocols were established (Kim et al., 2011). The cryo-EM revolution provided a complementary technique to attack proteins and complexes that are difficult to crystallize (Walls et al., 2020).

And the race began; it will not be a sprint but rather a marathon, because the challenges are significant. A massive therapeutic effort has been directed towards vaccine development and the repurposing of FDA-approved drugs, with somewhat limited success. Blood from survivors of the infection is being used to treat the most severe cases with some success, and a number of potential treatments are in initial trials. However, it is expected that the virus and its global reservoir will evolve, making some treatments obsolete and leading to the emergence of new outbreaks.

Research

Structural characterization of viral proteins is essential to develop vaccines, antibodies and small-molecule drugs. From the beginning, several important players contributed to swift progress, and these contributors follow one of the most critical rules: all data are shared with the scientific community prior to publication, allowing science to accelerate. At the NIH/NIAID-funded Center for Structural Genomics of Infectious Diseases (CSGID), we redirected our efforts in January 2020 and focused on structural characterization of as many SARS-CoV-2 macromolecules as possible. The Center is a consortium of Northwestern University (Karla Satchell, CSGID co-director), the University of Chicago operating at Argonne National Laboratory (UChicago/ANL) (Andrzej Joachimiak), the School of Medicine at Washington University (Daved Fremont), the University of California Riverside School of Medicine (Adam Godzik), Purdue University (Andy Mesecar and Richard Kuhn), the University of Virginia (Wladek Minor), UT Southwestern Medical Center at Dallas (Zbyszek Otwinowski and Dominika Borek) and the University of Calgary (Alexei Savchenko), all of which contributed to the current efforts. Some members of the consortium had previous experience with SARS-CoV proteins. Our high-throughput protein production and structure-determination pipelines have allowed us to produce multiple samples for crystallographic and cryo-EM work and to support other scientific endeavors. Most of these involve multiple collaborations aimed at enhancing the impact of SARS-CoV-2 research by addressing small-molecule screening, vaccine and antibody development, and the basic molecular biology of the coronavirus.

The first structure reported for SARS-CoV-2 was the main protease (Mpro) in complex with the inhibitor N3. The structure was determined by the Zihe Rao group in Beijing [Protein Data Bank (PDB) entry 6LU7, released on 6 February 2020] and later published in Nature (Jin et al., 2020). By the middle of February, the CSGID was well advanced in the purification and crystallization of viral proteins, and at the UChicago/ANL CSGID site we determined the second SARS-CoV-2 structure. The Nsp15 endoribonuclease structure was deposited on 20 February 2020 (PDB entry 6W01, released on 4 March 2020) and published in Protein Science (Kim et al., 2020). In the following five months, the CSGID determined and made available to the scientific community 45 structures of 11 SARS-CoV-2 proteins (Table 1), including 35 determined in our laboratory at UChicago/ANL.

Table 1. List of CSGID structures from SARS-CoV-2

Nsp3 papain-like protease (PLpro): 6W9C, 6WRH, 6WZU, 6XG3, 7JIR, 7JIT, 7JIV, 7JIW
Nsp3 ADRP/MacroD: 6VXS, 6W02, 6W6Y, 6WEN, 6WCF
Nsp5 main protease (Mpro, 3CLpro): 6W63, 6WPN, 6WQF, 6XG2, 6XKF, 6XKH, 6XOA, 7JFQ
Nsp7/Nsp8 primase complex: 6WIQ, 6WQD, 6WTC
Nsp9 RNA/DNA binding: 6W4B
Nsp10/Nsp16 2’-O-ribose methyltransferase (2’-O-MT) complex: 6W4H, 6W61, 6W75, 6WKQ, 6WJT, 6WVN, 6WQ3
Nsp15 (EndoU, U-specific endoribonuclease): 6W01, 6VWW, 6WLC, 6WXC, 6X1B, 6X41
Orf7a: 6W37, 6XKM, 7JHE, 7JIB
N-protein RNA-binding domain: 6VYO, 6WKP
N-protein C-terminal domain: 6WJI
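As a quick cross-check of Table 1, the number of PDB entries in each row can be tallied and compared with the total quoted in the text. The sketch below is only an illustration: the counts were taken by hand from the table, and the shortened row labels and function names are hypothetical.

```python
# Number of PDB entries per row, counted by hand from Table 1.
# The row labels are shortened for illustration.
TABLE1_COUNTS = {
    "Nsp3 PLpro": 8,
    "Nsp3 ADRP/MacroD": 5,
    "Nsp5 Mpro": 8,
    "Nsp7/Nsp8 primase complex": 3,
    "Nsp9": 1,
    "Nsp10/Nsp16 methyltransferase complex": 7,
    "Nsp15 endoribonuclease": 6,
    "Orf7a": 4,
    "N-protein RNA-binding domain": 2,
    "N-protein C-terminal domain": 1,
}


def total_structures(counts):
    """Sum the number of PDB entries over all rows."""
    return sum(counts.values())


def rows_by_count(counts):
    """Return (row, count) pairs, rows with the most entries first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for row, n in rows_by_count(TABLE1_COUNTS):
        print(f"{n:2d}  {row}")
    print("total:", total_structures(TABLE1_COUNTS))
```

The total comes to 45, in agreement with the number of structures quoted in the text. The table has ten rows, but its rows are domains or complexes rather than individual proteins, so the row count need not equal the number of proteins.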

Important drug targets for which high-resolution structures are now available include the PLpro and Mpro proteases, which participate in polyprotein processing. The three PLpro structures from the UChicago/ANL site correspond to the WT enzyme in two different crystal forms and the active-site cysteine mutant (C111S) determined at 1.6 Å resolution. An additional room-temperature (RT) structure of PLpro is in refinement. Two Mpro structures from the Mesecar laboratory at Purdue are complexes with inhibitors (PDB entries 6W63, 6WPN). The third structure of Mpro was determined at RT (PDB entry 6WQF) in collaboration with Andrey Kovalevsky's team at ORNL to advance computational modelling (the publication has been accepted by Nature Communications). Rolf Hilgenfeld at the University of Lübeck, Germany, who has studied SARS-CoV proteins for years, provided important high-resolution structures of Mpro complexes with potential inhibitors and developed a lead compound against the main protease (Zhang et al., 2020).

[Figure 2] Structure of SARS-CoV-2 main protease Mpro dimer determined at room temperature. The active site residues H41 and C145 are shown in red.

At UChicago/ANL, structures of several other SARS-CoV-2 proteins have been determined as well, with an average resolution of 1.95 Å. Multiple structures were determined for the Nsp15 endoribonuclease, which is assumed to play a role in the evasion of host immunity. These include complexes with nucleotides and with Tipiracil, an inhibitor from the list of FDA-approved drugs (publication in preparation). Five structures of the Nsp3 ADRP/MacroD domain have been determined, including an apo form, complexes with adenosine 5’-diphosphate ribose (ADPr) and AMP, and an atomic-resolution structure of the complex with MES buffer (PDB entry 6WCF) determined at 1.07 Å (Michalska et al., 2020), which provides detailed information about the binding site and its interaction with the ligand. This domain binds ADP-ribose or poly(ADP-ribose) and removes ADPr attached to human proteins or RNA in response to viral infection; it may thus be involved in de-MARylation and de-PARylation (Li et al., 2016; Eckei et al., 2017; Munnur et al., 2019). Three structures of the Nsp7/Nsp8 primase complex have also been determined; they show the interactions and flexibility of Nsp8. The structure of Nsp9 shows a tight dimer (PDB entry 6W4B). Nsp9 from SARS and MERS was shown to interact with RNA and DNA. Nsp9 is one of the smaller members of the replicase complex and a crucial cofactor contributing to the emerging “Nsp interactome”.

We have also determined structures of domains of the nucleocapsid (N) protein: the N-terminal RNA-binding domain (PDB entries 6VYO, 6WKP) at UChicago/ANL and the C-terminal domain (PDB entry 6WJI) at the Northwestern University laboratory (publication in preparation). The N protein covers the entire genomic RNA of the virus in the virion but may also perform other functions, as it seems to interact with Nsp3. Comparison of the monomer with other homologs shows quite large conformational differences, suggesting that, despite high sequence identity, this domain has the built-in conformational flexibility required for binding RNA. Seven structures of the Nsp10/Nsp16 methyltransferase complex have been determined at Northwestern University and UChicago/ANL. These structures describe the Nsp10/Nsp16 assembly crystallized under different conditions, as well as its complexes with S-adenosylmethionine and 7-methyl-GpppA dinucleotide (Rosas-Lemus et al., 2020). We also determined one structure of an accessory protein: the Daved Fremont laboratory solved the structure of Orf7a (PDB entry 6W37). The exact function of this small protein, as for the majority of accessory proteins, is unknown.

The APS light source has become a central facility in the US for SARS-CoV-2 projects. Overall, the CSGID and other users of the APS have deposited 45 crystal structures for 11 different proteins. Many structures involve interactions with ligands and inhibitors. The CSGID structures are released to the scientific community prior to publication, and materials are available at the NIH/NIAID BEI resource.

Our approach has taken us in a different, though highly complementary, direction from the efforts at the Diamond Light Source in the UK. There the focus has been on determining many structures of one essential enzyme using a large-scale screen of electrophile and non-covalent fragments through a combined mass spectrometry and X-ray approach. The teams of Frank von Delft and Martin Walsh have been very successful in delivering over 110 structures of Mpro with small ligands (Douangamath et al., 2020). They have identified 71 hits that span the active site and 3 hits at the dimer interface. These structures reveal potential routes for rapid development of more potent inhibitors through merging covalent and non-covalent fragment hits. We believe that this approach can provide useful structural and reactivity information for structure-based drug discovery and that the hits can serve as starting points for antivirals targeting specific enzymes. Therefore, we have established a collaboration with both teams at Diamond on screening the Nsp15 and PLpro targets.

However, one crucial protein of SARS-CoV-2 has thus far eluded crystallization efforts: the RNA-dependent RNA polymerase Nsp12. This is the target of one of the most promising drug candidates, remdesivir. Here, cryo-EM has become the key tool for structural information (Hillen et al., 2020; Yin et al., 2020). Additional cryo-EM structures have been reported for full-length human ACE2 in the presence of the neutral amino acid transporter B0AT1, with or without the receptor-binding domain (RBD) of the surface spike glycoprotein (S protein) of SARS-CoV-2 [PDB entry 6M17 (Yan et al., 2020)], and for the spike ectodomain [PDB entries 6VYB, 6VXX (Walls et al., 2020)], as well as for the RBD with SARS-CoV-specific antibodies (Tian et al., 2020; Wrapp et al., 2020). Some complexes with antibodies have been determined by X-ray crystallography [PDB entry 6W41 (Yuan et al., 2020)] or by a combination of cryo-EM and X-ray crystallography.

Outlook

Most importantly, X-ray crystallography has done exceptionally well in this important challenge and has rapidly contributed atomic-resolution data essential to the structure-based development of treatments for COVID-19. Since February 2020, high-resolution X-ray structures have been deposited in addition to 18 cryo-EM structures and 1 NMR structure. Our efforts have provided high-resolution insight into the SARS-CoV-2 virus which can be utilized by other researchers. For example, these structures have been used by the Rick Stevens and Arvind Ramanathan groups at the Argonne National Laboratory and others for in silico drug discovery using advanced computational approaches such as DeepDriveMD.

In parallel with experimental work, a number of resources have been developed for COVID-19 that combine sequence (https://github.com/Knowledge-Graph-Hub/kg-COVID-19/wiki) and structural-quality evaluation data (https://COVID-19.bioreproducibility.org/). These sites provide important information about the genomes, connectivities, interactions and the structure-quality assessment. This work is ongoing and we hope for significant progress soon.

Acknowledgements

Funding for this project was provided in part by Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272201700060C and by the U.S. Department of Energy (DOE) Office of Science and operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.

References

Báez-Santos, Y. M., John, S. E. & Mesecar, A. D. (2015). Antiviral Res. 115, 21–38.

Canard, B., Joseph, J. S. & Kuhn, P. (2008). Antiviral Res. 78, 47–50.

Cenciarelli, O., Pietropaoli, S., Malizia, A., Carestia, M., D'Amico, F., Sassolini, A., Di Giovanni, D., Rea, S., Gabbarini, V., Tamburrini, A., Palombi, L., Bellecci, C. & Gaudio, P. (2015). Int. J. Microbiol. 2015, 769121.

Cui, J., Li, F. & Shi, Z. L. (2019). Nat. Rev. Microbiol. 17, 181–192.

Douangamath, A., Fearon, D., Gehrtz, P., Krojer, T., Lukacik, P., Owen, C. D., Resnick, E., Strain-Damerell, C., Aimon, A., Ábrányi-Balogh, P., Brandão-Neto, J., Carbery, A., Davison, G., Dias, A., Downes, T. D., Dunnett, L., Fairhead, M., Firth, J. D., Jones, S. P., Keely, A., Keserü, G. M., Klein, H. F., Martin, M. P., Noble, M. E. M., O'Brien, P., Powell, A., Reddi, R., Skyner, R., Snee, M., Waring, M. J., Wild, C., London, N., von Delft, F. & Walsh, M. A. (2020). bioRxiv: https://dx.doi.org/10.1101/2020.05.27.118117.

Eckei, L., Krieg, S., Bütepage, M., Lehmann, A., Gross, A., Lippok, B., Grimm, A. R., Kümmerer, B. M., Rossetti, G., Lüscher, B. & Verheugd, P. (2017). Sci. Rep. 7, 41746.

Hillen, H. S., Kokic, G., Farnung, L., Dienemann, C., Tegunov, D. & Cramer, P. (2020). Nature: https://doi.org/10.1038/s41586-020-2368-8

Jin, Z., Du, X., Xu, Y., Deng, Y., Liu, M., Zhao, Y., Zhang, B., Li, X., Zhang, L., Peng, C., Duan, Y., Yu, J., Wang, L., Yang, K., Liu, F., Jiang, R., Yang, X., You, T., Liu, X., Yang, X., Bai, F., Liu, H., Liu, X., Guddat, L. W., Xu, W., Xiao, G., Qin, C., Shi, Z., Jiang, H., Rao, Z. & Yang, H. (2020). Nature, 582, 289–293.

Kim, Y., Babnigg, G., Jedrzejczak, R., Eschenfeldt, W. H., Li, H., Maltseva, N., Hatzos-Skintges, C., Gu, M., Makowska-Grzyska, M., Wu, R., An, H., Chhor, G. & Joachimiak, A. (2011). Methods, 55, 12–28.

Kim, Y., Jedrzejczak, R., Maltseva, N. I., Wilamowski, M., Endres, M., Godzik, A., Michalska, K. & Joachimiak, A. (2020). Protein Sci. 29, 1596–1605.

Li, C., Debing, Y., Jankevicius, G., Neyts, J., Ahel, I., Coutard, B. & Canard, B. (2016). J. Virol. 90, 8478–8486.

Luk, H. K. H., Li, X., Fung, J., Lau, S. K. P. & Woo, P. C. Y. (2019). Infect. Genet. Evol. 71, 21–30.

Memish, Z. A., Perlman, S., Van Kerkhove, M. D. & Zumla, A. (2020). Lancet, 395, 1063–1077.

Michalska, K., Kim, Y., Jedrzejczak, R., Maltseva, N. I., Stols, L., Endres, M. & Joachimiak, A. (2020). IUCrJ, 7: https://doi.org/10.1107/S2052252520009653

Munnur, D., Bartlett, E., Mikolčević, P., Kirby, I. T., Matthias Rack, J. G., Mikoč, A., Cohen, M. S. & Ahel, I. (2019). Nucleic Acids Res. 47, 5658–5669.

Nugent, E. K., Nugent, A. K., Nugent, R. & Nugent, K. (2017). Am. J. Med. Sci. 353, 466–473.

Rosas-Lemus, M., Minasov, G., Shuvalova, L., Inniss, N. L., Kiryukhina, O., Wiersum, G., Kim, Y., Jedrzejczak, R., Maltseva, N. I., Endres, M., Jaroszewski, L., Godzik, A., Joachimiak, A. & Satchell, K. J. F. (2020). bioRxiv, 2020.04.17.047498.

Tian, X., Li, C., Huang, A., Xia, S., Lu, S., Shi, Z., Lu, L., Jiang, S., Yang, Z., Wu, Y. & Ying, T. (2020). Emerg. Microbes Infect. 9, 382–385.

Tyrrell, D. A. & Bynoe, M. L. (1966). Lancet, 1, 76–77.

Walls, A. C., Park, Y. J., Tortorici, M. A., Wall, A., McGuire, A. T. & Veesler, D. (2020). Cell, 181, 281–292.e6.

Wrapp, D., Wang, N., Corbett, K. S., Goldsmith, J. A., Hsieh, C. L., Abiona, O., Graham, B. S. & McLellan, J. S. (2020). Science, 367, 1260–1263.

Wu, F., Zhao, S., Yu, B., Chen, Y. M., Wang, W., Song, Z. G., Hu, Y., Tao, Z. W., Tian, J. H., Pei, Y. Y., Yuan, M. L., Zhang, Y. L., Dai, F. H., Liu, Y., Wang, Q. M., Zheng, J. J., Xu, L., Holmes, E. C. & Zhang, Y. Z. (2020). Nature, 580, E7.

Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y. & Zhou, Q. (2020). Science, 367, 1444–1448.

Yin, W., Mao, C., Luan, X., Shen, D. D., Shen, Q., Su, H., Wang, X., Zhou, F., Zhao, W., Gao, M., Chang, S., Xie, Y. C., Tian, G., Jiang, H. W., Tao, S. C., Shen, J., Jiang, Y., Jiang, H., Xu, Y., Zhang, S., Zhang, Y. & Xu, H. E. (2020). Science, 368, 1499–1504.

Yuan, M., Wu, N. C., Zhu, X., Lee, C. D., So, R. T. Y., Lv, H., Mok, C. K. P. & Wilson, I. A. (2020). Science, 368, 630–633.

Zhang, L., Lin, D., Sun, X., Curth, U., Drosten, C., Sauerhering, L., Becker, S., Rox, K. & Hilgenfeld, R. (2020). Science, 368, 409–412.

 

Andrzej Joachimiak and Karolina Michalska are at the Center for Structural Genomics of Infectious Diseases, Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL 60637, USA, and the Structural Biology Center, X-ray Science Division, Argonne National Laboratory, Argonne, IL 60439, USA. Dr Joachimiak is also at the Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA; correspondence should be addressed to him at andrzejj@anl.gov.

26 June 2020

Copyright © - All Rights Reserved - International Union of Crystallography

The permanent URL for this article is https://www.iucr.org/news/newsletter/volume-28/number-2/structural-biology-of-sars-cov-2-proteins