Using Otsu's threshold selection method for eliminating terms in vector space model computation
Main author: | Eler, Danilo Medeiros |
---|---|
Publication date: | 2013 |
Other authors: | Garcia, Rogerio Eduardo |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1109/IV.2013.29; http://hdl.handle.net/11449/227530 |
Abstract: | Visualization techniques have proved to be valuable tools for exploring textual data. Dimensionality reduction techniques have been widely used to produce visual representations of document collections. For multidimensional projection techniques, the quality of the visual result depends on how well the terms chosen to compose the vector space model (VSM) discriminate the documents. To define a good VSM, it is necessary to apply filters during preprocessing that eliminate terms based on their frequency. To do so, the user must evaluate the term frequency histogram, relying on his/her expertise in the subject of the text, and decide the threshold value for the frequency cut. This is usually a trial-and-error approach that requires the user to verify the quality of the visual representation after each trial. In this paper, we propose an automatic approach that applies Otsu's Threshold Selection Method to compute a threshold from a term frequency histogram. We conducted experiments showing that our approach generates visual representations as good as those generated with a threshold obtained by trial and error. The contribution of our approach is that users without expertise are able to generate good visual representations, and the time needed to find a good threshold is reduced. © 2013 IEEE. |
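
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`otsu_threshold`, `term_frequencies`, `filter_terms`), the whitespace tokenizer, and the choice to keep terms whose frequency lies above the threshold are all assumptions made for illustration. The sketch builds a histogram of term frequencies, picks the cut that maximizes the between-class variance (the criterion used by Otsu's method), and drops the terms at or below that cut.

```python
from collections import Counter


def otsu_threshold(values):
    """Return the cut t that maximizes the between-class variance
    of the two classes {v <= t} and {v > t}."""
    if not values:
        raise ValueError("values must not be empty")
    hist = Counter(values)  # histogram: frequency value -> number of terms
    levels = sorted(hist)
    total = len(values)
    total_sum = sum(v * c for v, c in hist.items())

    best_t, best_var = levels[0], -1.0
    w0 = 0    # number of items in the lower class
    sum0 = 0  # sum of the values in the lower class
    # Stop before the last level so the upper class is never empty.
    for t in levels[:-1]:
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def term_frequencies(documents):
    """Count term occurrences over the whole collection.
    Assumption: lowercase, whitespace-separated tokens."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def filter_terms(documents):
    """Return (threshold, kept_terms).
    Assumption: terms with frequency above the threshold are kept."""
    freqs = term_frequencies(documents)
    t = otsu_threshold(list(freqs.values()))
    kept = sorted(term for term, f in freqs.items() if f > t)
    return t, kept


if __name__ == "__main__":
    docs = ["data data data mining", "data mining text", "visual data"]
    threshold, vocabulary = filter_terms(docs)
    print(threshold, vocabulary)
```

The surviving terms would then form the dimensions of the VSM. Whether the paper applies the cut to low-frequency terms, high-frequency terms, or both is not stated in this record, so the direction of the filter here should be read as a placeholder.
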
id |
UNSP_78bcb7cb08f77693b71d9bbd85ef9fda |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/227530 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Faculdade de Ciências e Tecnologia, UNESP - Univ Estadual Paulista, Departamento de Matemática e Computação, Presidente Prudente/SP; Universidade Estadual Paulista (UNESP); Eler, Danilo Medeiros [UNESP]; Garcia, Rogerio Eduardo [UNESP]; 2022-04-29T07:13:47Z; 2013-12-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 220-226; http://dx.doi.org/10.1109/IV.2013.29; Proceedings of the International Conference on Information Visualisation, p. 220-226; 1093-9547; http://hdl.handle.net/11449/227530; 10.1109/IV.2013.29; 2-s2.0-84893276074; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; info:eu-repo/semantics/openAccess; 2024-06-19T14:32:18Z; oai:repositorio.unesp.br:11449/227530; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T16:53:57.616610; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Using Otsu's threshold selection method for eliminating terms in vector space model computation |
author |
Eler, Danilo Medeiros [UNESP] |
author_facet |
Eler, Danilo Medeiros [UNESP]; Garcia, Rogerio Eduardo [UNESP] |
author_role |
author |
author2 |
Garcia, Rogerio Eduardo [UNESP] |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP) |
dc.contributor.author.fl_str_mv |
Eler, Danilo Medeiros [UNESP]; Garcia, Rogerio Eduardo [UNESP] |
dc.subject.por.fl_str_mv |
Otsu's Threshold Selection Method; Term Frequency Thresholding; Vector Space Model Computation; Visual Text Mining |
topic |
Otsu's Threshold Selection Method; Term Frequency Thresholding; Vector Space Model Computation; Visual Text Mining |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-12-01; 2022-04-29T07:13:47Z; 2022-04-29T07:13:47Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1109/IV.2013.29; Proceedings of the International Conference on Information Visualisation, p. 220-226; 1093-9547; http://hdl.handle.net/11449/227530; 10.1109/IV.2013.29; 2-s2.0-84893276074 |
url |
http://dx.doi.org/10.1109/IV.2013.29; http://hdl.handle.net/11449/227530 |
identifier_str_mv |
Proceedings of the International Conference on Information Visualisation, p. 220-226; 1093-9547; 10.1109/IV.2013.29; 2-s2.0-84893276074 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Proceedings of the International Conference on Information Visualisation |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
220-226 |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808128718639464448 |