Bond strength between receptor binding domain of spike protein and human angiotensin converting enzyme-2 using machine learning
DOI:
https://doi.org/10.37155/2972-449X-vol2(1)-110
Keywords:
Machine learning, Spike protein, RBD-ACE2 interface, Interatomic bonding, ab initio calculations, XGBoost, Decision Trees, Linear regression
Abstract
The spike protein (S-protein) of SARS-CoV-2 plays an important role in binding, fusion, and host entry. In this study, we used machine learning (ML) to predict the interatomic bond strength between the receptor binding domain (RBD) and angiotensin converting enzyme-2 (ACE2), obtaining predictions that match the results of expensive ab initio calculations. We collected bond order data from ab initio calculations and selected a total of 18 variables, such as bond type, bond length, elements and their coordinates, and others, to train the ML models. We then trained five well-known regression models: Decision Tree regression, KNN regression, XGBoost, Lasso regression, and Ridge regression. We tested these models on two datasets, the wild type (WT) and the Omicron variant (OV). In the first setting, we used 90% of each dataset for training and 10% for testing to predict the bond order. XGBoost outperformed all the other models on the WT dataset, achieving an R² score of 0.997, and also on the OV dataset, with an R² score of 0.9998. In the second setting, we trained all the models on the WT (or OV) dataset and predicted the bond order on the OV (or WT) dataset. Interestingly, Decision Tree outperformed all the other models in both cases, achieving an R² score of 0.997.
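The two evaluation settings described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic placeholders (a single hypothetical "bond length" feature and an invented decaying bond-order curve instead of the 18 variables from ab initio calculations), and a small hand-written KNN regressor stands in for the five models. The helper names (`make_placeholder_dataset`, `split_90_10`, `evaluate`) are assumptions introduced only for this example.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def make_placeholder_dataset(n, seed):
    """Synthetic stand-in for a bond table (NOT real WT or OV data).

    Feature: one hypothetical bond length. Target: an invented
    bond-order curve that decays with length, plus small noise.
    """
    rng = random.Random(seed)
    X = [(rng.uniform(1.0, 3.5),) for _ in range(n)]
    y = [math.exp(-1.5 * (x[0] - 1.0)) + rng.gauss(0.0, 0.005) for x in X]
    return X, y


def split_90_10(X, y, seed=0):
    """Shuffle row indices and return (train_X, train_y, test_X, test_y)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.9 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def evaluate(train_X, train_y, test_X, test_y, k=3):
    """Fit-free KNN evaluation: predict each test row, then score with R^2."""
    preds = [knn_predict(train_X, train_y, q, k) for q in test_X]
    return r2_score(test_y, preds)


# Placeholder "WT" and "OV" tables with different random seeds.
wt_X, wt_y = make_placeholder_dataset(300, seed=1)
ov_X, ov_y = make_placeholder_dataset(300, seed=2)

# Setting 1: 90% of one dataset for training, 10% for testing.
r2_within_wt = evaluate(*split_90_10(wt_X, wt_y))

# Setting 2: train on one dataset, predict the bond order on the other.
r2_wt_to_ov = evaluate(wt_X, wt_y, ov_X, ov_y)

print("within-WT R^2:", r2_within_wt)
print("WT -> OV R^2:", r2_wt_to_ov)
```

In practice the same two settings would be run with library implementations of each regressor; the point here is only that the first setting scores held-out rows from the same dataset, while the second scores a model on a dataset it never saw during training.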
License
Copyright (c) 2024 Abdulmateen Adebiyi, Puja Adhikari, Praveen Rao, Wai-Yim Ching
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright on any open-access article in a journal published by Globasci Publishing House Pte. Ltd. is retained by the authors. Authors grant Globasci Publishing House Pte. Ltd. a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its integrity is maintained and its original authors, citation details and publisher are identified. The Creative Commons Attribution-NonCommercial 4.0 International License formalizes these and other terms and conditions of publishing articles.