Methodology to identify a gene expression signature by merging microarray datasets
Autor(a) principal: | |
---|---|
Data de Publicação: | 2023 |
Outros Autores: | , , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Texto Completo: | http://hdl.handle.net/10400.14/40884 |
Resumo: | A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES. |
id |
RCAP_a54cf679c9ea4bd1ffb514a639b83318 |
---|---|
oai_identifier_str |
oai:repositorio.ucp.pt:10400.14/40884 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Methodology to identify a gene expression signature by merging microarray datasetsAutism spectrum disorderGene expression signatureHeart failureLSVMMicroarray dataNeural networkRandom forestA vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.Veritati - Repositório Institucional da Universidade Católica PortuguesaFajarda, OlgaAlmeida, João RafaelDuarte-Pereira, SaraSilva, Raquel M.Oliveira, José Luís2023-04-19T13:07:02Z2023-062023-06-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10400.14/40884eng0010-482510.1016/j.compbiomed.2023.1068678515212934837060770000982862600001info:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-07-12T17:46:27Zoai:repositorio.ucp.pt:10400.14/40884Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T18:33:34.192655Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse |
dc.title.none.fl_str_mv |
Methodology to identify a gene expression signature by merging microarray datasets |
title |
Methodology to identify a gene expression signature by merging microarray datasets |
spellingShingle |
Methodology to identify a gene expression signature by merging microarray datasets Fajarda, Olga Autism spectrum disorder Gene expression signature Heart failure LSVM Microarray data Neural network Random forest |
title_short |
Methodology to identify a gene expression signature by merging microarray datasets |
title_full |
Methodology to identify a gene expression signature by merging microarray datasets |
title_fullStr |
Methodology to identify a gene expression signature by merging microarray datasets |
title_full_unstemmed |
Methodology to identify a gene expression signature by merging microarray datasets |
title_sort |
Methodology to identify a gene expression signature by merging microarray datasets |
author |
Fajarda, Olga |
author_facet |
Fajarda, Olga Almeida, João Rafael Duarte-Pereira, Sara Silva, Raquel M. Oliveira, José Luís |
author_role |
author |
author2 |
Almeida, João Rafael Duarte-Pereira, Sara Silva, Raquel M. Oliveira, José Luís |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Veritati - Repositório Institucional da Universidade Católica Portuguesa |
dc.contributor.author.fl_str_mv |
Fajarda, Olga Almeida, João Rafael Duarte-Pereira, Sara Silva, Raquel M. Oliveira, José Luís |
dc.subject.por.fl_str_mv |
Autism spectrum disorder Gene expression signature Heart failure LSVM Microarray data Neural network Random forest |
topic |
Autism spectrum disorder Gene expression signature Heart failure LSVM Microarray data Neural network Random forest |
description |
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-04-19T13:07:02Z 2023-06 2023-06-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.14/40884 |
url |
http://hdl.handle.net/10400.14/40884 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0010-4825 10.1016/j.compbiomed.2023.106867 85152129348 37060770 000982862600001 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799132062318329856 |