Using Otsu's threshold selection method for eliminating terms in vector space model computation

Bibliographic details
Main author: Eler, Danilo Medeiros [UNESP]
Publication date: 2013
Other authors: Garcia, Rogerio Eduardo [UNESP]
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1109/IV.2013.29
http://hdl.handle.net/11449/227530
Abstract: Visualization techniques have proved to be valuable tools for supporting textual data exploration. Dimensionality reduction techniques have been widely used to produce visual representations of document collections. For multidimensional projection techniques, the quality of the visual result depends on how well the terms chosen to compose the vector space model (VSM) discriminate between documents. Defining a good VSM requires applying filters during preprocessing to eliminate terms based on their frequency. To do so, the user must evaluate the term frequency histogram, drawing on his or her expertise in the subject of the texts, and decide on a threshold value for the frequency cut. This is usually a trial-and-error process that requires the user to verify the quality of the visual representation after each trial. In this paper, we propose an automatic approach that applies Otsu's Threshold Selection Method to compute a threshold from a term frequency histogram. Our experiments show that this approach generates visual representations as good as those generated with a threshold obtained by trial and error. The contribution of our approach is that users without expertise can generate good visual representations, and the time needed to find a good threshold is reduced. © 2013 IEEE.
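The abstract does not spell out how the histogram is built or which side of the threshold is discarded, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the histogram counts how many terms share each frequency value, that Otsu's criterion (maximizing the between-class variance) is applied to those frequency values, and that terms at or below the resulting threshold are the ones eliminated. The function names and the sample terms are hypothetical.

```python
from collections import Counter


def otsu_threshold(values):
    """Return the level t that maximizes the between-class variance
    of the split {v <= t} versus {v > t}."""
    hist = Counter(values)                 # level -> how many items have that level
    if not hist:
        raise ValueError("need at least one value")
    levels = sorted(hist)
    total = sum(hist.values())
    total_sum = sum(level * n for level, n in hist.items())

    best_t, best_score = levels[0], -1.0
    n_low, sum_low = 0, 0
    for t in levels[:-1]:                  # skip the top level: it would empty the high class
        n_low += hist[t]
        sum_low += t * hist[t]
        n_high = total - n_low
        mean_low = sum_low / n_low
        mean_high = (total_sum - sum_low) / n_high
        # Proportional to Otsu's between-class variance w0 * w1 * (mu0 - mu1)^2.
        score = n_low * n_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def filter_terms(term_freq):
    """Assumption: terms whose frequency is at or below the threshold are dropped."""
    threshold = otsu_threshold(term_freq.values())
    kept = {term: f for term, f in term_freq.items() if f > threshold}
    return threshold, kept


# Hypothetical term frequencies for a small document collection.
term_freq = {
    "document": 45, "projection": 40, "cluster": 38,
    "axis": 3, "legend": 2, "margin": 2,
    "teh": 1, "foo": 1, "qwerty": 1,
}
threshold, kept = filter_terms(term_freq)
```

With this toy input the threshold lands at the gap between the rare terms and the three frequent ones, so only the frequent terms would enter the VSM. A real collection would likely need stopword removal first, and possibly an upper cut as well; neither is covered by this sketch.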
id UNSP_78bcb7cb08f77693b71d9bbd85ef9fda
oai_identifier_str oai:repositorio.unesp.br:11449/227530
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
affiliation Faculdade de Ciências e Tecnologia, UNESP - Univ Estadual Paulista, Departamento de Matemática e Computação, Presidente Prudente/SP
dc.title.none.fl_str_mv Using Otsu's threshold selection method for eliminating terms in vector space model computation
author Eler, Danilo Medeiros [UNESP]
author_facet Eler, Danilo Medeiros [UNESP]
Garcia, Rogerio Eduardo [UNESP]
author_role author
author2 Garcia, Rogerio Eduardo [UNESP]
author2_role author
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
dc.contributor.author.fl_str_mv Eler, Danilo Medeiros [UNESP]
Garcia, Rogerio Eduardo [UNESP]
dc.subject.por.fl_str_mv Otsu's Threshold Selection Method
Term Frequency Thresholding
Vector Space Model Computation
Visual Text Mining
publishDate 2013
dc.date.none.fl_str_mv 2013-12-01
2022-04-29T07:13:47Z
2022-04-29T07:13:47Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1109/IV.2013.29
Proceedings of the International Conference on Information Visualisation, p. 220-226.
1093-9547
http://hdl.handle.net/11449/227530
10.1109/IV.2013.29
2-s2.0-84893276074
url http://dx.doi.org/10.1109/IV.2013.29
http://hdl.handle.net/11449/227530
identifier_str_mv Proceedings of the International Conference on Information Visualisation, p. 220-226.
1093-9547
10.1109/IV.2013.29
2-s2.0-84893276074
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Proceedings of the International Conference on Information Visualisation
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 220-226
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_ 1808128718639464448