Visualizing the document pre-processing effects in text mining process
Main author: | Eler, Danilo Medeiros [UNESP] |
---|---|
Publication date: | 2018 |
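Other authors: | Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP] |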
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1007/978-3-319-54978-1_62 http://hdl.handle.net/11449/176206 |
Abstract: | Text mining is an important step in categorizing textual data with data mining techniques. Because most textual data is unstructured, it must be processed before mining algorithms are applied; this stage is known as the pre-processing step of the overall text mining process, and it has an important impact on mining results. This paper provides a detailed analysis of document pre-processing, employing multidimensional projection techniques to generate graphical representations of vector space models computed from eight combinations of three steps: stemming, term weighting, and term elimination based on a low-frequency cut. Experiments show that the visual approach is useful for perceiving the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation). Additionally, quality measures were computed from the graphical representations and compared with the classification rates of k-Nearest Neighbor and Naive Bayes classifiers; the results highlight the importance of the pre-processing step in text mining. |
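
The eight pre-processing combinations mentioned in the abstract come from switching each of the three steps on or off. The sketch below is only an illustration of that idea, not the authors' implementation: `crude_stem` is a toy suffix stripper standing in for a real stemmer, TF-IDF is assumed as the term weighting, and `min_df`, `build_vsm`, and the three sample documents are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def crude_stem(token):
    # Toy stand-in for a real stemmer (assumption): strips one common suffix.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_vsm(docs, stem, weight, cut, min_df=2):
    """Build a document-term matrix for one pre-processing combination."""
    tokenized = [
        [crude_stem(t) if stem else t for t in doc.lower().split()]
        for doc in docs
    ]
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    # Low-frequency cut (assumed here to be a document-frequency threshold).
    vocab = sorted(t for t, n in df.items() if not cut or n >= min_df)
    n_docs = len(docs)
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        if weight:
            # TF-IDF weighting (one possible choice of term weighting).
            rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
        else:
            # Raw term frequency.
            rows.append([float(tf[t]) for t in vocab])
    return vocab, rows


# Hypothetical sample corpus.
docs = [
    "mining text documents",
    "text mining visualizes documents",
    "projecting document vectors",
]

# One vector space model per on/off combination of the three steps.
models = {}
for stem, weight, cut in product([False, True], repeat=3):
    models[(stem, weight, cut)] = build_vsm(docs, stem, weight, cut)
    vocab, _ = models[(stem, weight, cut)]
    print(f"stem={stem!s:5} weight={weight!s:5} cut={cut!s:5} -> {len(vocab)} terms")
```

Each resulting matrix could then be passed to a projection technique or a classifier to compare how the choice of steps changes document similarities.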
id |
UNSP_2c4206cebe2b951e0b74620ee2242133 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/176206 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Visualizing the document pre-processing effects in text mining process; Document pre-processing; Document similarity; Multidimensional projection; Text mining; Visualization; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Faculdade de Ciências e Tecnologia, Departamento de Matemática e Computação, UNESP – Universidade Estadual Paulista; FAPESP: # 2013/03452-0; Universidade Estadual Paulista (Unesp); Eler, Danilo Medeiros [UNESP]; Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP]; 2018-12-11T17:19:35Z; 2018-12-11T17:19:35Z; 2018-01-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 485-491; http://dx.doi.org/10.1007/978-3-319-54978-1_62; Advances in Intelligent Systems and Computing, v. 558, p. 485-491; 2194-5357; http://hdl.handle.net/11449/176206; 10.1007/978-3-319-54978-1_62; 2-s2.0-85045731840; 8031012573259361; 0000-0003-1248-528X; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Advances in Intelligent Systems and Computing; info:eu-repo/semantics/openAccess; 2024-06-19T14:32:18Z; oai:repositorio.unesp.br:11449/176206; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T15:40:00.376577; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Visualizing the document pre-processing effects in text mining process |
title |
Visualizing the document pre-processing effects in text mining process |
spellingShingle |
Visualizing the document pre-processing effects in text mining process; Eler, Danilo Medeiros [UNESP]; Document pre-processing; Document similarity; Multidimensional projection; Text mining; Visualization |
title_short |
Visualizing the document pre-processing effects in text mining process |
title_full |
Visualizing the document pre-processing effects in text mining process |
title_fullStr |
Visualizing the document pre-processing effects in text mining process |
title_full_unstemmed |
Visualizing the document pre-processing effects in text mining process |
title_sort |
Visualizing the document pre-processing effects in text mining process |
author |
Eler, Danilo Medeiros [UNESP] |
author_facet |
Eler, Danilo Medeiros [UNESP]; Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP] |
author_role |
author |
author2 |
Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP] |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (Unesp) |
dc.contributor.author.fl_str_mv |
Eler, Danilo Medeiros [UNESP]; Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP] |
dc.subject.por.fl_str_mv |
Document pre-processing; Document similarity; Multidimensional projection; Text mining; Visualization |
topic |
Document pre-processing; Document similarity; Multidimensional projection; Text mining; Visualization |
description |
Text mining is an important step in categorizing textual data with data mining techniques. Because most textual data is unstructured, it must be processed before mining algorithms are applied; this stage is known as the pre-processing step of the overall text mining process, and it has an important impact on mining results. This paper provides a detailed analysis of document pre-processing, employing multidimensional projection techniques to generate graphical representations of vector space models computed from eight combinations of three steps: stemming, term weighting, and term elimination based on a low-frequency cut. Experiments show that the visual approach is useful for perceiving the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation). Additionally, quality measures were computed from the graphical representations and compared with the classification rates of k-Nearest Neighbor and Naive Bayes classifiers; the results highlight the importance of the pre-processing step in text mining. |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-11T17:19:35Z; 2018-12-11T17:19:35Z; 2018-01-01 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1007/978-3-319-54978-1_62; Advances in Intelligent Systems and Computing, v. 558, p. 485-491; 2194-5357; http://hdl.handle.net/11449/176206; 10.1007/978-3-319-54978-1_62; 2-s2.0-85045731840; 8031012573259361; 0000-0003-1248-528X |
url |
http://dx.doi.org/10.1007/978-3-319-54978-1_62 http://hdl.handle.net/11449/176206 |
identifier_str_mv |
Advances in Intelligent Systems and Computing, v. 558, p. 485-491; 2194-5357; 10.1007/978-3-319-54978-1_62; 2-s2.0-85045731840; 8031012573259361; 0000-0003-1248-528X |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Advances in Intelligent Systems and Computing |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
485-491 |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808128547753033728 |