Evaluation of gene selection metrics for tumor cell classification
Main author: | Faceli, Katti |
---|---|
Publication date: | 2004 |
Other authors: | Carvalho, André C.P.L.F. de; Silva Jr, Wilson A. |
Document type: | Article |
Language: | eng |
Source title: | Genetics and Molecular Biology |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029 |
Abstract: | Gene expression profiles contain the expression levels of thousands of genes. Depending on the issue under investigation, this large amount of data makes analysis impractical. Thus, it is important to select subsets of relevant genes to work with. This paper investigates different metrics for gene selection. The metrics are evaluated based on their ability to select genes whose expression profiles provide information to distinguish between tumor and normal tissues. This evaluation is made by constructing classifiers using the genes selected by each metric and then comparing the performance of these classifiers. The performance of the classifiers is evaluated using the error rate in the classification of new tissues. As the dataset has few tissue samples, the leave-one-out methodology was employed to obtain more reliable results. The classifiers are generated using different machine learning algorithms: Support Vector Machines (SVMs) and the C4.5 algorithm. The experiments are conducted employing SAGE data obtained from the NCBI website. There are few analyses involving SAGE data in the literature. It was found that the best metric for the data and algorithms employed is the logistic metric. |
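
The evaluation loop described in the abstract (rank genes with a metric, keep the top-ranked ones, train a classifier, and estimate the error rate with leave-one-out) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ranking metric (`mean_difference_score`) is a hypothetical stand-in for the metrics compared in the paper, the nearest-centroid classifier is a stand-in for the SVM and C4.5 learners, and the toy data are made up. Gene selection is repeated inside each fold so that the held-out tissue never influences which genes are chosen.

```python
from statistics import mean


def mean_difference_score(values, labels):
    """Hypothetical ranking metric: absolute gap between the two class means."""
    tumor = [v for v, y in zip(values, labels) if y == 1]
    normal = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(tumor) - mean(normal))


def select_genes(samples, labels, k):
    """Return the indices of the k highest-scoring genes."""
    n_genes = len(samples[0])
    scores = [
        mean_difference_score([s[g] for s in samples], labels)
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(train, train_labels, test_sample):
    """Stand-in classifier (not an SVM or C4.5): choose the closer class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [s for s, y in zip(train, train_labels) if y == label]
        centroid = [mean(col) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(test_sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_error(samples, labels, k):
    """Hold out each sample once; select genes on the remaining samples only."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)
        reduced_train = [[s[g] for g in genes] for s in train]
        reduced_test = [samples[i][g] for g in genes]
        predicted = nearest_centroid_predict(reduced_train, train_labels, reduced_test)
        if predicted != labels[i]:
            errors += 1
    return errors / len(samples)


# Made-up toy data: 6 tissues x 4 genes; only gene 0 separates the classes.
toy_samples = [
    [9.0, 1.0, 5.0, 2.0],
    [8.5, 2.0, 4.0, 2.5],
    [9.5, 1.5, 6.0, 1.5],
    [1.0, 1.2, 5.5, 2.2],
    [1.5, 1.8, 4.5, 1.8],
    [0.5, 1.4, 5.0, 2.4],
]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = tumor, 0 = normal

print(leave_one_out_error(toy_samples, toy_labels, k=1))
```

Comparing several metrics would amount to swapping the scoring function and comparing the resulting leave-one-out error rates for each classifier.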
id |
SBG-1_0e4a9a879ad0d9d391c37f6ecf18eed6 |
---|---|
oai_identifier_str |
oai:scielo:S1415-47572004000400029 |
network_acronym_str |
SBG-1 |
network_name_str |
Genetics and Molecular Biology |
repository_id_str |
|
spelling |
Evaluation of gene selection metrics for tumor cell classification; gene selection; machine learning; gene expression; SAGE; Sociedade Brasileira de Genética; 2004-01-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029; Genetics and Molecular Biology v.27 n.4 2004; reponame:Genetics and Molecular Biology; instname:Sociedade Brasileira de Genética (SBG); instacron:SBG; 10.1590/S1415-47572004000400029; info:eu-repo/semantics/openAccess; Faceli, Katti; Carvalho, André C.P.L.F. de; Silva Jr, Wilson A.; eng; 2005-01-14T00:00:00Z; oai:scielo:S1415-47572004000400029; Revista; http://www.gmb.org.br/; ONG; https://old.scielo.br/oai/scielo-oai.php; editor@gmb.org.br; 1678-4685; 1415-4757; opendoar:2005-01-14T00:00; Genetics and Molecular Biology - Sociedade Brasileira de Genética (SBG); false |
dc.title.none.fl_str_mv |
Evaluation of gene selection metrics for tumor cell classification |
title |
Evaluation of gene selection metrics for tumor cell classification |
spellingShingle |
Evaluation of gene selection metrics for tumor cell classification; Faceli, Katti; gene selection; machine learning; gene expression; SAGE |
title_short |
Evaluation of gene selection metrics for tumor cell classification |
title_full |
Evaluation of gene selection metrics for tumor cell classification |
title_fullStr |
Evaluation of gene selection metrics for tumor cell classification |
title_full_unstemmed |
Evaluation of gene selection metrics for tumor cell classification |
title_sort |
Evaluation of gene selection metrics for tumor cell classification |
author |
Faceli, Katti |
author_facet |
Faceli, Katti; Carvalho, André C.P.L.F. de; Silva Jr, Wilson A. |
author_role |
author |
author2 |
Carvalho, André C.P.L.F. de; Silva Jr, Wilson A. |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Faceli, Katti; Carvalho, André C.P.L.F. de; Silva Jr, Wilson A. |
dc.subject.por.fl_str_mv |
gene selection; machine learning; gene expression; SAGE |
topic |
gene selection; machine learning; gene expression; SAGE |
description |
Gene expression profiles contain the expression levels of thousands of genes. Depending on the issue under investigation, this large amount of data makes analysis impractical. Thus, it is important to select subsets of relevant genes to work with. This paper investigates different metrics for gene selection. The metrics are evaluated based on their ability to select genes whose expression profiles provide information to distinguish between tumor and normal tissues. This evaluation is made by constructing classifiers using the genes selected by each metric and then comparing the performance of these classifiers. The performance of the classifiers is evaluated using the error rate in the classification of new tissues. As the dataset has few tissue samples, the leave-one-out methodology was employed to obtain more reliable results. The classifiers are generated using different machine learning algorithms: Support Vector Machines (SVMs) and the C4.5 algorithm. The experiments are conducted employing SAGE data obtained from the NCBI website. There are few analyses involving SAGE data in the literature. It was found that the best metric for the data and algorithms employed is the logistic metric. |
publishDate |
2004 |
dc.date.none.fl_str_mv |
2004-01-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572004000400029 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1590/S1415-47572004000400029 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Sociedade Brasileira de Genética |
publisher.none.fl_str_mv |
Sociedade Brasileira de Genética |
dc.source.none.fl_str_mv |
Genetics and Molecular Biology v.27 n.4 2004; reponame:Genetics and Molecular Biology; instname:Sociedade Brasileira de Genética (SBG); instacron:SBG |
instname_str |
Sociedade Brasileira de Genética (SBG) |
instacron_str |
SBG |
institution |
SBG |
reponame_str |
Genetics and Molecular Biology |
collection |
Genetics and Molecular Biology |
repository.name.fl_str_mv |
Genetics and Molecular Biology - Sociedade Brasileira de Genética (SBG) |
repository.mail.fl_str_mv |
editor@gmb.org.br |
_version_ |
1752122379411128320 |