Improving the drug discovery process by using multiple classifier systems

Bibliographic details
Main author: Ruano-Ordás, D.
Publication date: 2019
Other authors: Yevseyeva, I., Basto-Fernandes, V., Méndez, J. R., Emmerich, M. T. M.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/17465
Abstract: Machine learning methods have become an indispensable tool for exploiting large knowledge and data repositories in science and technology. In the pharmaceutical domain, the amount of acquired knowledge about the design and synthesis of pharmaceutical agents and bioactive molecules (drugs) is enormous. The primary challenge in automatically discovering new drugs from molecular screening information is the high dimensionality of the datasets, in which a wide range of features is recorded for each candidate drug. Improved techniques for manipulating and interpreting such data are therefore needed. To mitigate this problem, our tool (called D2-MCS) splits the dataset homogeneously into several groups (subsets of features) and then determines the most suitable classifier for each group. Finally, the tool determines the biological activity of each molecule through a voting scheme. D2-MCS was tested on a standardized, high-quality dataset gathered from ChEMBL and outperformed well-known single classification models.
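The three steps named in the abstract (group the features, pick one classifier per group, combine by voting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the D2-MCS implementation: the round-robin `split_features` is only a stand-in for the tool's homogeneous grouping, the two toy classifiers replace the real candidate models, and all names and data are hypothetical.

```python
from collections import Counter
from statistics import mean


def split_features(n_features, n_groups):
    """Round-robin split of feature indices (stand-in for the tool's grouping)."""
    return [list(range(g, n_features, n_groups)) for g in range(n_groups)]


def project(X, cols):
    """Keep only the columns that belong to one feature group."""
    return [[row[c] for c in cols] for row in X]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


class NearestCentroid:
    """Toy classifier: assign each sample to the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]


class MajorityClass:
    """Toy baseline: always predict the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


def fit_mcs(X_train, y_train, X_val, y_val, n_groups, candidates):
    """For each feature group, keep the candidate with the best validation accuracy."""
    ensemble = []
    for cols in split_features(len(X_train[0]), n_groups):
        fitted = [make().fit(project(X_train, cols), y_train) for make in candidates]
        best = max(fitted,
                   key=lambda m: accuracy(m.predict(project(X_val, cols)), y_val))
        ensemble.append((cols, best))
    return ensemble


def predict_mcs(ensemble, X):
    """Majority vote over the per-group classifiers."""
    votes = [model.predict(project(X, cols)) for cols, model in ensemble]
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*votes)]


# Hypothetical toy data: label 1 = active, 0 = inactive; feature 2 is uninformative.
X_train = [[0.9, 0.8, 0.5], [0.8, 0.9, 0.1], [0.1, 0.2, 0.5], [0.2, 0.1, 0.1]]
y_train = [1, 1, 0, 0]
X_val = [[0.85, 0.7, 0.3], [0.15, 0.3, 0.3]]
y_val = [1, 0]
X_test = [[0.95, 0.9, 0.2], [0.05, 0.1, 0.4]]

ensemble = fit_mcs(X_train, y_train, X_val, y_val, n_groups=3,
                   candidates=[NearestCentroid, MajorityClass])
predictions = predict_mcs(ensemble, X_test)
print(predictions)
```

In this toy setup the group built on the uninformative feature votes poorly, but the two informative groups outvote it, which is the intuition behind combining per-group classifiers instead of training one model on all features.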
id RCAP_4e6a7621363ee7463292a4eeef331db2
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/17465
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.subject.por.fl_str_mv Drug discovery
Machine learning algorithms
Feature clustering
Multiple classifier systems
publishDate 2019
dc.date.none.fl_str_mv 2019-02-28T16:35:38Z
2019-01-01T00:00:00Z
2019
2019-02-28T16:34:38+0000
2020-02-28T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/17465
url http://hdl.handle.net/10071/17465
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0957-4174
10.1016/j.eswa.2018.12.032
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Pergamon/Elsevier
publisher.none.fl_str_mv Pergamon/Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134761283747840