Analysis of document pre-processing effects in text and opinion mining
Main author: | Eler, Danilo Medeiros [UNESP] |
---|---|
Publication date: | 2018 |
Other authors: | Grosa, Denilson [UNESP]; Pola, Ives; Garcia, Rogério [UNESP]; Correia, Ronaldo [UNESP]; Teixeira, Jaqueline [UNESP] |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.3390/info9040100 http://hdl.handle.net/11449/179792 |
Abstract: | Textual information is typically available as unstructured data, which must be processed before data mining algorithms can handle it; this processing is the pre-processing step of the overall text mining process. This paper analyzes the strong impact that the pre-processing step has on most mining tasks. We propose a methodology that varies combinations of pre-processing steps and identifies which combination yields high precision. Experiments compared combinations of steps such as stemming, term weighting, term elimination based on a low-frequency cut, and stop-word elimination. These combinations were applied to text and opinion mining tasks, and correct classification rates were computed to highlight the strong impact of each pre-processing combination. Additionally, we provide graphical representations of each pre-processing combination to show how visual approaches reveal the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation). |
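
The idea of varying combinations of pre-processing steps can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the suffix-stripping `naive_stem` function, the `preprocess` and `all_combinations` helpers, and the sample documents are all hypothetical stand-ins chosen for illustration, and the sketch stops at building term vectors rather than computing classification rates.

```python
from collections import Counter
from itertools import product
import math

# Toy stop-word list (assumption; a real study would use a full list).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "this", "was"}


def naive_stem(token):
    """Rough suffix stripper; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(docs, remove_stop_words, stem, min_doc_freq, weighting):
    """Return one {term: weight} dict per document for the chosen options."""
    tokenized = []
    for doc in docs:
        tokens = doc.lower().split()
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        if stem:
            tokens = [naive_stem(t) for t in tokens]
        tokenized.append(tokens)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    # Low-frequency cut: keep terms found in at least min_doc_freq documents.
    vocab = {t for t, df in doc_freq.items() if df >= min_doc_freq}

    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in vocab)
        if weighting == "tfidf":
            # Plain tf * log(N / df); a term present in every document gets 0.
            vec = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        else:
            vec = dict(counts)
        vectors.append(vec)
    return vectors


def all_combinations():
    """Enumerate every (stop words, stemming, frequency cut, weighting) setting."""
    return list(product([False, True], [False, True], [1, 2], ["tf", "tfidf"]))


sample_docs = ["the movie was amazing", "amazing movies are rare", "this film is boring"]
for combo in all_combinations():
    vectors = preprocess(sample_docs, *combo)
    vocabulary = {term for vec in vectors for term in vec}
    print(combo, "vocabulary size:", len(vocabulary))
```

In a fuller experiment, each set of vectors would be passed to a classifier and to a projection technique so that classification rates and document layouts could be compared across combinations.
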
id |
UNSP_1c3103f22767631b3e0a702fa4e09270 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/179792 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Analysis of document pre-processing effects in text and opinion mining; Document pre-processing; Document similarity; Multidimensional projection; Opinion mining; Sentiment analysis; Text mining; Visualization; Typically, textual information is available as unstructured data, which require processing so that data mining algorithms can handle such data; this processing is known as the pre-processing step in the overall text mining process. This paper aims at analyzing the strong impact that the pre-processing step has on most mining tasks. Therefore, we propose a methodology to vary distinct combinations of pre-processing steps and to analyze which pre-processing combination allows high precision. In order to show different combinations of pre-processing methods, experiments were performed by comparing some combinations such as stemming, term weighting, term elimination based on low frequency cut and stop words elimination. These combinations were applied in text and opinion mining tasks, from which correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations from each pre-processing combination to show how visual approaches are useful to show the processing effects on document similarities and group formation (i.e., cohesion and separation).
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Departamento de Matematica e Computação Sao Paulo State University-UNESP; Departamento de Informática University of Technology-UTFPR; Departamento de Matematica e Computação Sao Paulo State University-UNESP; FAPESP: 2013/03452-0; Universidade Estadual Paulista (Unesp); University of Technology-UTFPR; Eler, Danilo Medeiros [UNESP]; Grosa, Denilson [UNESP]; Pola, Ives; Garcia, Rogério [UNESP]; Correia, Ronaldo [UNESP]; Teixeira, Jaqueline [UNESP]; 2018-12-11T17:36:46Z; 2018-12-11T17:36:46Z; 2018-04-20; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://dx.doi.org/10.3390/info9040100; Information (Switzerland), v. 9, n. 4, 2018.; 2078-2489; http://hdl.handle.net/11449/179792; 10.3390/info9040100; 2-s2.0-85045734307; 2-s2.0-85045734307.pdf; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Information (Switzerland); 0,222; info:eu-repo/semantics/openAccess; 2024-06-19T14:32:05Z; oai:repositorio.unesp.br:11449/179792; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T22:01:39.161442; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Analysis of document pre-processing effects in text and opinion mining |
author |
Eler, Danilo Medeiros [UNESP] |
author_facet |
Eler, Danilo Medeiros [UNESP]; Grosa, Denilson [UNESP]; Pola, Ives; Garcia, Rogério [UNESP]; Correia, Ronaldo [UNESP]; Teixeira, Jaqueline [UNESP] |
author_role |
author |
author2 |
Grosa, Denilson [UNESP]; Pola, Ives; Garcia, Rogério [UNESP]; Correia, Ronaldo [UNESP]; Teixeira, Jaqueline [UNESP] |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (Unesp); University of Technology-UTFPR |
dc.subject.por.fl_str_mv |
Document pre-processing; Document similarity; Multidimensional projection; Opinion mining; Sentiment analysis; Text mining; Visualization |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-11T17:36:46Z 2018-12-11T17:36:46Z 2018-04-20 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.3390/info9040100 Information (Switzerland), v. 9, n. 4, 2018. 2078-2489 http://hdl.handle.net/11449/179792 10.3390/info9040100 2-s2.0-85045734307 2-s2.0-85045734307.pdf |
url |
http://dx.doi.org/10.3390/info9040100 http://hdl.handle.net/11449/179792 |
identifier_str_mv |
Information (Switzerland), v. 9, n. 4, 2018. 2078-2489 10.3390/info9040100 2-s2.0-85045734307 2-s2.0-85045734307.pdf |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Information (Switzerland) 0,222 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808129384753659904 |