Analysis of document pre-processing effects in text and opinion mining

Bibliographic details
Main author: Eler, Danilo Medeiros [UNESP]
Publication date: 2018
Other authors: Grosa, Denilson [UNESP], Pola, Ives, Garcia, Rogério [UNESP], Correia, Ronaldo [UNESP], Teixeira, Jaqueline [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.3390/info9040100
http://hdl.handle.net/11449/179792
Abstract: Typically, textual information is available as unstructured data, which must be processed before data mining algorithms can handle it; this processing is known as the pre-processing step of the overall text mining process. This paper analyzes the strong impact that the pre-processing step has on most mining tasks. To that end, we propose a methodology that varies combinations of pre-processing steps and analyzes which combination yields high precision. The experiments compare combinations of stemming, term weighting, term elimination based on a low-frequency cut, and stop-word elimination. These combinations were applied to text and opinion mining tasks, and correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations of each pre-processing combination to show how visual approaches help reveal the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation).
id UNSP_1c3103f22767631b3e0a702fa4e09270
oai_identifier_str oai:repositorio.unesp.br:11449/179792
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), FAPESP: 2013/03452-0
Affiliations: Departamento de Matematica e Computação, Sao Paulo State University-UNESP; Departamento de Informática, University of Technology-UTFPR
OAI endpoint: http://repositorio.unesp.br/oai/request (opendoar:2946)
dc.title.none.fl_str_mv Analysis of document pre-processing effects in text and opinion mining
author Eler, Danilo Medeiros [UNESP]
author_facet Eler, Danilo Medeiros [UNESP]
Grosa, Denilson [UNESP]
Pola, Ives
Garcia, Rogério [UNESP]
Correia, Ronaldo [UNESP]
Teixeira, Jaqueline [UNESP]
author_role author
author2 Grosa, Denilson [UNESP]
Pola, Ives
Garcia, Rogério [UNESP]
Correia, Ronaldo [UNESP]
Teixeira, Jaqueline [UNESP]
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (Unesp)
University of Technology-UTFPR
dc.contributor.author.fl_str_mv Eler, Danilo Medeiros [UNESP]
Grosa, Denilson [UNESP]
Pola, Ives
Garcia, Rogério [UNESP]
Correia, Ronaldo [UNESP]
Teixeira, Jaqueline [UNESP]
dc.subject.por.fl_str_mv Document pre-processing
Document similarity
Multidimensional projection
Opinion mining
Sentiment analysis
Text mining
Visualization
publishDate 2018
dc.date.none.fl_str_mv 2018-12-11T17:36:46Z
2018-12-11T17:36:46Z
2018-04-20
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.3390/info9040100
Information (Switzerland), v. 9, n. 4, 2018.
2078-2489
http://hdl.handle.net/11449/179792
10.3390/info9040100
2-s2.0-85045734307
2-s2.0-85045734307.pdf
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Information (Switzerland)
0,222
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1808129384753659904