Feature Selection For Genomic Data By Combining Filter And Wrapper Approaches

Bibliographic details
Main author: Akadi, Ali El
Publication date: 2009
Other authors: Amine, Aouatif; El Ouardighi, Abdeljalil; Aboutajdine, Driss
Document type: Article
Language: English
Source title: INFOCOMP: Jornal de Ciência da Computação
Full text: https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/279
Abstract: Gene expression data usually contain a large number of genes but only a small number of samples. Feature selection for gene expression data aims to find the set of genes that best discriminates between biological samples of different types. In this paper, we propose a two-stage selection algorithm for genomic data that combines MRMR (Minimum Redundancy Maximum Relevance) and a GA (Genetic Algorithm). In the first stage, MRMR is used to filter out noisy and redundant genes from high-dimensional microarray data. In the second stage, the GA uses classifier accuracy as its fitness function to select the most discriminative genes. The proposed method is tested on five publicly available datasets (NCI, Lymphoma, Lung, Leukemia, and Colon) using Support Vector Machine and Naïve Bayes classifiers. Comparing MRMR-GA with the MRMR filter alone and the GA wrapper alone shows that our method is able to find the smallest gene subset that gives the highest classification accuracy under leave-one-out cross-validation (LOOCV).
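The abstract describes the first (filter) stage only at a high level. As an illustration, here is a minimal sketch of one common form of MRMR: greedy forward selection with the difference (MID) criterion, which scores each candidate gene by its mutual information with the class label minus its mean mutual information with the genes already selected. This is not the authors' code. It assumes expression values have already been discretized into a few levels (for example, three levels relative to each gene's mean), and the names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_select(X: list[list[int]], y: list[int], k: int) -> list[int]:
    """Greedy MRMR (MID variant, an assumption of this sketch).

    X is samples x genes with already-discretized values; returns k gene
    indices in the order they were selected.
    """
    n_genes = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_genes)]
    relevance = [mutual_information(col, y) for col in columns]
    # Start from the single most relevant gene (first index wins ties).
    selected = [max(range(n_genes), key=lambda j: relevance[j])]
    candidates = set(range(n_genes)) - set(selected)
    while len(selected) < k and candidates:
        def score(j: int) -> float:
            # Relevance minus mean redundancy with the genes chosen so far.
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a two-stage pipeline, the indices returned by `mrmr_select` would define the reduced gene set handed to the wrapper stage. The redundancy term is what lets MRMR skip a gene that merely duplicates one already selected, even when that gene is highly relevant on its own.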
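For the second (wrapper) stage, the abstract states only that the GA uses classifier accuracy as its fitness function. The sketch below is one possible realization under explicit assumptions: each chromosome is a bit mask over the already-filtered genes; fitness is the leave-one-out accuracy of a categorical Naive Bayes classifier with Laplace smoothing on discretized values (a simple stand-in for the SVM and Naïve Bayes classifiers used in the paper); ties in accuracy are broken in favor of smaller subsets; and the GA uses tournament selection, one-point crossover, bit-flip mutation, and two-member elitism. The functions `nb_predict`, `loocv_accuracy`, and `ga_select` and all parameter defaults are hypothetical.

```python
import math
import random
from collections import Counter


def nb_predict(train_X, train_y, x, features):
    """Categorical Naive Bayes with Laplace smoothing, using only `features`."""
    best_label, best_score = None, -math.inf
    for label, n_label in sorted(Counter(train_y).items()):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        score = math.log(n_label / len(train_y))  # log prior
        for j in features:
            # Number of distinct levels of gene j (including the test value).
            n_levels = len({r[j] for r in train_X} | {x[j]})
            count = sum(1 for r in rows if r[j] == x[j])
            score += math.log((count + 1) / (n_label + n_levels))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loocv_accuracy(X, y, features):
    """Leave-one-out accuracy of nb_predict restricted to `features`."""
    if not features:
        return 0.0
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nb_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


def ga_select(X, y, generations=30, pop_size=20, p_mut=0.05, seed=0):
    """Evolve bit masks over the columns of X; return (genes, LOOCV accuracy)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        # Accuracy first; among equally accurate masks prefer fewer genes.
        key = tuple(mask)
        if key not in cache:
            genes = [j for j in range(n) if mask[j]]
            cache[key] = (loocv_accuracy(X, y, genes), -len(genes))
        return cache[key]

    def tournament(pop):
        return max(rng.sample(pop, 3), key=fitness)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(pop, key=fitness, reverse=True)[:2]  # elitism
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene toggled with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return [j for j in range(n) if best[j]], fitness(best)[0]
```

Because the same LOOCV accuracy both drives the search and is returned as the score, the value reported by this sketch is an optimistic estimate of accuracy on unseen samples. Caching fitness by mask keeps the number of LOOCV runs bounded by the number of distinct masks the GA actually visits.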
Keywords: Feature selection; Genetic algorithm; MRMR; Support Vector Machine; Naïve Bayes classifier; LOOCV
Date published: 2009-12-01
Version: published version
Full-text PDF: https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/279/264
Rights: Copyright (c) 2016 INFOCOMP Journal of Computer Science (open access)
Format: application/pdf
Publisher: Editora da UFLA
Source: INFOCOMP Journal of Computer Science, Vol. 8, No. 4 (December 2009), pp. 28-36
ISSN: 1982-3363; 1807-4545
Institution: Universidade Federal de Lavras (UFLA)
Repository: INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
Repository contact: infocomp@dcc.ufla.br, apfreire@dcc.ufla.br