Visualizing the document pre-processing effects in text mining process

Bibliographic details
Main author: Eler, Danilo Medeiros [UNESP]
Publication date: 2018
Other authors: Pola, Ives Renê Venturini [UNESP], Garcia, Rogério Eduardo [UNESP], Teixeira, Jaqueline Batista Martins [UNESP]
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1007/978-3-319-54978-1_62
http://hdl.handle.net/11449/176206
Abstract: Text mining is an important step in categorizing textual data with data mining techniques. Because most textual data is unstructured, it must be processed before mining algorithms are applied; this is known as the pre-processing step of the overall text mining process. The pre-processing step has an important impact on mining. This paper provides a detailed analysis of document pre-processing, employing multidimensional projection techniques to generate graphical representations of vector space models computed from eight combinations of three steps: stemming, term weighting, and term elimination based on a low-frequency cut. Experiments show that the visual approach is useful for perceiving the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation). Additionally, quality measures were computed from the graphical representations and compared with the classification rates of k-Nearest Neighbor and Naive Bayes classifiers; the results highlight the importance of the pre-processing step in text mining.
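The eight combinations mentioned in the abstract arise from switching each of the three steps on or off (2 x 2 x 2). The sketch below is only an illustration of that enumeration, not the authors' implementation. The record does not name the stemmer, the weighting scheme, or the cut threshold, so the suffix-stripping `toy_stem`, the TF-IDF weighting, the document-frequency threshold of 2, and the four sample documents are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vsm(docs, stem, tfidf, min_doc_freq):
    """Return (vocabulary, document vectors) for one pre-processing configuration."""
    tokenized = [
        [toy_stem(t) if stem else t for t in doc.lower().split()] for doc in docs
    ]
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Low-frequency cut: keep terms found in at least min_doc_freq documents.
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocab:
            weight = float(counts[term])  # raw term frequency
            if tfidf:
                # Assumed weighting: TF multiplied by a smoothed IDF factor.
                weight *= math.log(n_docs / doc_freq[term]) + 1.0
            row.append(weight)
        vectors.append(row)
    return vocab, vectors


# Made-up sample corpus, for illustration only.
docs = [
    "mining text documents",
    "text mining processes",
    "projecting documents visually",
    "visual projections of document",
]

# Each step is either off (False) or on (True): 2 * 2 * 2 = 8 configurations.
configs = list(product([False, True], repeat=3))
models = {}
for stem, tfidf, cut in configs:
    models[(stem, tfidf, cut)] = build_vsm(docs, stem, tfidf, 2 if cut else 1)

for (stem, tfidf, cut), (vocab, _) in models.items():
    print(f"stem={stem!s:5} tfidf={tfidf!s:5} cut={cut!s:5} -> {len(vocab)} terms")
```

Each of the eight resulting matrices could then be passed to a projection technique or a classifier to compare how the pre-processing choices change document similarities; that comparison is outside the scope of this sketch.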
id UNSP_2c4206cebe2b951e0b74620ee2242133
oai_identifier_str oai:repositorio.unesp.br:11449/176206
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); FAPESP: # 2013/03452-0
Affiliation: Faculdade de Ciências e Tecnologia, Departamento de Matemática e Computação, UNESP – Universidade Estadual Paulista
OAI request URL: http://repositorio.unesp.br/oai/request
dc.title.none.fl_str_mv Visualizing the document pre-processing effects in text mining process
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Eler, Danilo Medeiros [UNESP]
Pola, Ives Renê Venturini [UNESP]
Garcia, Rogério Eduardo [UNESP]
Teixeira, Jaqueline Batista Martins [UNESP]
dc.subject.por.fl_str_mv Document pre-processing
Document similarity
Multidimensional projection
Text mining
Visualization
dc.date.none.fl_str_mv 2018-12-11T17:19:35Z
2018-12-11T17:19:35Z
2018-01-01
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1007/978-3-319-54978-1_62
Advances in Intelligent Systems and Computing, v. 558, p. 485-491.
2194-5357
http://hdl.handle.net/11449/176206
10.1007/978-3-319-54978-1_62
2-s2.0-85045731840
8031012573259361
0000-0003-1248-528X
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Advances in Intelligent Systems and Computing
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 485-491
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)