Machine learning methods to predict the crystallization propensity of small organic molecules

Bibliographic details
Main author: Pereira, Florbela
Publication date: 2020
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/110180
Summary: Support from Fundação para a Ciência e a Tecnologia (FCT) Portugal, under grant UID/QUI/50006/2019 (provided to the Associate Laboratory for Green Chemistry LAQV), is greatly appreciated. Florbela Pereira thanks Fundação para a Ciência e a Tecnologia, MCTES, for the Norma Transitória DL 57/2016 Program Contract.
id RCAP_8bed00ae9b6a70ef68133cf37f29d76a
oai_identifier_str oai:run.unl.pt:10362/110180
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
abstract Machine learning (ML) algorithms were explored for predicting crystallization propensity from molecular descriptors and fingerprints generated from 2D chemical structures, and from 3D molecular descriptors computed on 3D chemical structures optimized with empirical methods. In total, 57 815 molecules were retrieved from the Reaxys® database; of these, 53 998 are recorded as crystalline (class A), 3097 as polymorphic (class B), and 720 as amorphous (class C). A training set of 40 462 organic molecules was used to build the models, which were validated with an external test set of 17 353 organic molecules. Several ML algorithms, including random forest (RF), support vector machines (SVM), and deep learning multilayer perceptron networks (MLP), were screened. The best performance was achieved with a consensus classification model built from the RF, SVM, and MLP models, which predicted the external test set with an overall predictive accuracy (Q) of up to 80%.
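The record does not say how the RF, SVM, and MLP predictions were combined into a consensus. One common reading is a simple majority vote over the three class labels, sketched below as an illustration only. The function names `consensus_vote` and `overall_accuracy`, the tie-breaking rule, and the toy label lists are all assumptions made for this example and are not taken from the article; Q is taken here to mean the fraction of correctly classified molecules.

```python
from collections import Counter
from typing import List, Sequence


def consensus_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Majority vote across models, one label per sample.

    Ties are broken in favor of the earliest model in `predictions`
    (an assumption of this sketch, not a rule from the article).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    consensus = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        consensus.append(next(v for v in votes if counts[v] == top))
    return consensus


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the recorded class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only (A = crystalline, B = polymorphic,
# C = amorphous); these are NOT data from the study.
rf_pred = ["A", "A", "B", "C", "A"]
svm_pred = ["A", "B", "B", "A", "A"]
mlp_pred = ["A", "A", "C", "C", "B"]
y_true = ["A", "B", "B", "C", "B"]

y_consensus = consensus_vote([rf_pred, svm_pred, mlp_pred])
q = overall_accuracy(y_true, y_consensus)
print(y_consensus, q)
```

With three models and three classes, a three-way split is possible, which is why the sketch needs an explicit tie-breaking rule. Because class A makes up most of the data set, plain accuracy can look high even when the minority classes are predicted poorly, so per-class figures are worth checking alongside Q.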
dc.title.none.fl_str_mv Machine learning methods to predict the crystallization propensity of small organic molecules
author Pereira, Florbela
author_role author
dc.contributor.none.fl_str_mv LAQV@REQUIMTE
DQ - Departamento de Química
RUN
dc.contributor.author.fl_str_mv Pereira, Florbela
dc.subject.por.fl_str_mv Chemistry(all)
Materials Science(all)
Condensed Matter Physics
publishDate 2020
dc.date.none.fl_str_mv 2020-04-28
2020-04-28T00:00:00Z
2022-03-31T00:31:40Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/110180
url http://hdl.handle.net/10362/110180
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1466-8033
PURE: 18074508
https://doi.org/10.1039/d0ce00070a
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 10
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP