Evaluation of gene selection metrics for tumor cell classification

Bibliographic details
Main author: Faceli, Katti
Publication date: 2004
Other authors: Carvalho, André C.P.L.F. de; Silva Jr, Wilson A.
Document type: Article
Language: eng
Source title: Genetics and Molecular Biology
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029
Abstract: Gene expression profiles contain the expression levels of thousands of genes. Depending on the issue under investigation, this large amount of data can make analysis impractical. Thus, it is important to select subsets of relevant genes to work with. This paper investigates different metrics for gene selection. The metrics are evaluated on their ability to select genes whose expression profiles provide information to distinguish between tumor and normal tissues. The evaluation is made by constructing classifiers from the genes selected by each metric and then comparing the performance of these classifiers, measured as the error rate in the classification of new tissues. As the dataset has few tissue samples, the leave-one-out methodology was employed to obtain more reliable results. The classifiers are generated using two machine learning algorithms: Support Vector Machines (SVMs) and the C4.5 algorithm. The experiments are conducted on SAGE data obtained from the NCBI web site. There are few analyses involving SAGE data in the literature. The best metric for the data and algorithms employed was found to be the logistic metric.
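The evaluation procedure described in the abstract (select genes with a metric, build a classifier on the selected genes, estimate the error rate by leave-one-out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signal-to-noise score stands in for the metrics studied in the paper (the abstract does not define them), a nearest-centroid rule stands in for the SVM and C4.5 classifiers, and the toy expression matrix and all function names are hypothetical. Redoing gene selection inside each fold is also an assumption of this sketch.

```python
from statistics import mean, pstdev


def score_genes(samples, labels):
    """Score each gene by |mean_tumor - mean_normal| / (sd_tumor + sd_normal).

    Assumption: this signal-to-noise score is only an illustrative stand-in
    for the gene selection metrics compared in the paper.
    """
    scores = []
    for g in range(len(samples[0])):
        tumor = [s[g] for s, y in zip(samples, labels) if y == 1]
        normal = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(tumor) + pstdev(normal) + 1e-9  # avoid division by zero
        scores.append(abs(mean(tumor) - mean(normal)) / spread)
    return scores


def select_top(scores, k):
    """Return the indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(train, train_labels, genes, x):
    """Stand-in classifier (the paper uses SVMs and C4.5 instead)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        rows = [s for s, y in zip(train, train_labels) if y == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error_rate(samples, labels, k):
    """Leave-one-out error rate; genes are re-selected inside every fold."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_top(score_genes(train, train_labels), k)
        if nearest_centroid_predict(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


# Hypothetical toy data: rows are tissue samples, columns are genes.
samples = [
    [10, 1, 5], [12, 2, 4], [11, 9, 5],  # tumor (label 1)
    [1, 1, 5], [2, 9, 4], [0, 2, 6],     # normal (label 0)
]
labels = [1, 1, 1, 0, 0, 0]
error = loo_error_rate(samples, labels, k=1)
print(f"leave-one-out error rate: {error:.2f}")
```

Comparing metrics then amounts to swapping `score_genes` for another scoring function and comparing the resulting error rates on the same samples.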
id SBG-1_0e4a9a879ad0d9d391c37f6ecf18eed6
oai_identifier_str oai:scielo:S1415-47572004000400029
network_acronym_str SBG-1
network_name_str Genetics and Molecular Biology
repository_id_str
Journal URL: http://www.gmb.org.br/
OAI endpoint: https://old.scielo.br/oai/scielo-oai.php
ISSN: 1678-4685, 1415-4757
Record date: 2005-01-14T00:00:00Z
dc.title.none.fl_str_mv Evaluation of gene selection metrics for tumor cell classification
dc.contributor.author.fl_str_mv Faceli,Katti
Carvalho,André C.P.L.F. de
Silva Jr,Wilson A.
dc.subject.por.fl_str_mv gene selection
machine learning
gene expression
sage
publishDate 2004
dc.date.none.fl_str_mv 2004-01-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1590/S1415-47572004000400029
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Sociedade Brasileira de Genética
publisher.none.fl_str_mv Sociedade Brasileira de Genética
dc.source.none.fl_str_mv Genetics and Molecular Biology v.27 n.4 2004
reponame:Genetics and Molecular Biology
instname:Sociedade Brasileira de Genética (SBG)
instacron:SBG
repository.name.fl_str_mv Genetics and Molecular Biology - Sociedade Brasileira de Genética (SBG)
repository.mail.fl_str_mv ||editor@gmb.org.br
_version_ 1752122379411128320