De novo design and bioactivity prediction of SARS‑CoV‑2 main protease inhibitors using recurrent neural network‑based transfer learning

Bibliographic details
Main author: Santana, Marcos V. S.
Publication date: 2021
Other authors: Silva Jr., Floriano P.
Document type: Article
Language: eng
Source title: Repositório Institucional da FIOCRUZ (ARCA)
Full text: https://www.arca.fiocruz.br/handle/icict/46077
Summary: Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. LaBECFar - Laboratório de Bioquímica Experimental e Computacional de Fármacos. Rio de Janeiro, RJ, Brasil.
id CRUZ_990ea0f62a5e9dadb342f3a90c4327e7
oai_identifier_str oai:www.arca.fiocruz.br:icict/46077
network_acronym_str CRUZ
network_name_str Repositório Institucional da FIOCRUZ (ARCA)
repository_id_str 2135
abstract The global pandemic of coronavirus disease (COVID-19), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), created a rush to discover drug candidates. Despite these efforts, no vaccine or drug has so far been approved for treatment. Artificial intelligence offers solutions that could accelerate the discovery and optimization of new antivirals, especially in the current scenario, in which compounds active against SARS-CoV-2 are scarce. The main protease (Mpro) of SARS-CoV-2 is an attractive target for drug discovery because it is absent in humans and plays an essential role in viral replication. In this work, we developed a deep learning platform for de novo design of putative inhibitors of SARS-CoV-2 Mpro.
Our methodology consists of three main steps: (1) training and validation of a general chemistry-based generative model; (2) fine-tuning of the generative model for the chemical space of SARS-CoV-2 Mpro inhibitors; and (3) training of a classifier for bioactivity prediction using transfer learning. The fine-tuned chemical model generated > 90% valid, diverse and novel (not present in the training set) structures. The generated molecules showed a good overlap with Mpro chemical space, displaying similar physicochemical properties and chemical structures. Novel scaffolds were also generated, showing the potential to explore new chemical series. The classification model outperformed the baseline area under the precision-recall curve, showing that it can be used for prediction. The model also outperformed the freely available model Chemprop on an external test set of fragments screened against SARS-CoV-2 Mpro, showing its potential to identify putative antivirals to tackle the COVID-19 pandemic.
Finally, among the top-20 predicted hits, we identified nine hits via molecular docking that displayed binding poses and interactions similar to those of experimentally validated inhibitors.
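The abstract reports that more than 90% of the generated structures were valid, diverse and novel (not present in the training set). The record does not state how these fractions were computed, so the following is only a minimal sketch of one common way to measure uniqueness and novelty for a list of generated SMILES strings. The function name and the toy data are hypothetical, the strings are assumed to be already validated and canonicalized by a cheminformatics toolkit (plain string comparison is not chemically aware), and the definitions used here may differ from the authors'.

```python
def uniqueness_and_novelty(generated, training):
    """Return (uniqueness, novelty) for a list of generated SMILES strings.

    uniqueness: fraction of generated strings that are distinct.
    novelty:    fraction of the distinct strings absent from the training set.
    Assumes all strings are valid, canonical SMILES (an assumption of this sketch).
    """
    if not generated:
        raise ValueError("generated must not be empty")
    unique = set(generated)
    novel = unique - set(training)
    return len(unique) / len(generated), len(novel) / len(unique)


# Hypothetical toy data, not taken from the paper.
training_smiles = ["CCO", "c1ccccc1"]
generated_smiles = ["CCO", "CCN", "CCN", "CC(=O)O"]

uniqueness, novelty = uniqueness_and_novelty(generated_smiles, training_smiles)
print(f"unique: {uniqueness:.2f}, novel: {novelty:.2f}")
```

In this toy case three of the four generated strings are distinct, and two of those three do not occur in the training list.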
dc.title.pt_BR.fl_str_mv De novo design and bioactivity prediction of SARS‑CoV‑2 main protease inhibitors using recurrent neural network‑based transfer learning
title De novo design and bioactivity prediction of SARS‑CoV‑2 main protease inhibitors using recurrent neural network‑based transfer learning
author Santana, Marcos V. S.
author_facet Santana, Marcos V. S.
Silva Jr., Floriano P.
author_role author
author2 Silva Jr., Floriano P.
author2_role author
dc.contributor.author.fl_str_mv Santana, Marcos V. S.
Silva Jr., Floriano P.
dc.subject.other.pt_BR.fl_str_mv COVID-19
SARS-CoV-2
Transfer learning
Generative model
Ulmfit
topic COVID-19
SARS-CoV-2
Transfer learning
Generative model
Ulmfit
De novo drug design
dc.subject.en.pt_BR.fl_str_mv COVID-19
Ulmfit
Transfer learning
De novo drug design
Generative model
description Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. LaBECFar - Laboratório de Bioquímica Experimental e Computacional de Fármacos. Rio de Janeiro, RJ, Brasil.
publishDate 2021
dc.date.accessioned.fl_str_mv 2021-02-12T20:14:35Z
dc.date.available.fl_str_mv 2021-02-12T20:14:35Z
dc.date.issued.fl_str_mv 2021
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv SANTANA, Marcos V. S.; SILVA JR., Floriano P. De novo design and bioactivity prediction of SARS‑CoV‑2 main protease inhibitors using recurrent neural network‑based transfer learning. BMC Chemistry, v. 15, n. 8, p. 1-20, 2021.
dc.identifier.uri.fl_str_mv https://www.arca.fiocruz.br/handle/icict/46077
dc.identifier.issn.pt_BR.fl_str_mv 2661-801X
dc.identifier.doi.none.fl_str_mv 10.1186/s13065-021-00737-2
identifier_str_mv SANTANA, Marcos V. S.; SILVA JR., Floriano P. De novo design and bioactivity prediction of SARS‑CoV‑2 main protease inhibitors using recurrent neural network‑based transfer learning. BMC Chemistry, v. 15, n. 8, p. 1-20, 2021.
2661-801X
10.1186/s13065-021-00737-2
url https://www.arca.fiocruz.br/handle/icict/46077
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv BMC
publisher.none.fl_str_mv BMC
dc.source.none.fl_str_mv reponame:Repositório Institucional da FIOCRUZ (ARCA)
instname:Fundação Oswaldo Cruz (FIOCRUZ)
instacron:FIOCRUZ
instname_str Fundação Oswaldo Cruz (FIOCRUZ)
instacron_str FIOCRUZ
institution FIOCRUZ
reponame_str Repositório Institucional da FIOCRUZ (ARCA)
collection Repositório Institucional da FIOCRUZ (ARCA)
bitstream.url.fl_str_mv https://www.arca.fiocruz.br/bitstream/icict/46077/1/license.txt
https://www.arca.fiocruz.br/bitstream/icict/46077/2/Santana_Marcos_etal_IOC_2021_COVID-19.pdf
https://www.arca.fiocruz.br/bitstream/icict/46077/3/Santana_Marcos_etal_IOC_2021_COVID-19.pdf.txt
bitstream.checksum.fl_str_mv 5a560609d32a3863062d77ff32785d58
e746492b4e06d74f1d5dd36ef4c888ee
890e7394d273a426d96b4eb33ddfa86a
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ)
repository.mail.fl_str_mv repositorio.arca@fiocruz.br
_version_ 1813009117455843328