In search of reputation assessment: experiences with polarity classification in RepLab 2013

Bibliographic details
Main author: Saias, José
Publication date: 2013
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/10352
Abstract: The diue system uses a supervised Machine Learning approach for the polarity classification subtask of RepLab. We used the Python NLTK for preprocessing, including file parsing, text analysis and feature extraction. Our best solution is a mixed strategy that combines bag-of-words with a limited set of features based on sentiment lexicons and superficial text analysis. The system begins by applying tokenization and lemmatization. Each tweet's content is then analyzed to obtain 18 features, related to the presence of polarized terms, negation before a polarized expression, and entity references. For the first run, learning and classification were performed with the Decision Tree algorithm from the NLTK framework. In the second run, we used a pipeline of classifiers. The first classifier applies Naive Bayes to a bag-of-words feature model built from the 1500 most frequent words in the training set. The second classifier uses the features from the first run plus one additional feature holding the result of the previous classifier. Our system's best result was an accuracy of 0.54694 and an F measure of 0.31506.
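The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: it avoids NLTK to stay dependency-free, so the tokenizer is a simple regex with no lemmatization, the Naive Bayes model is hand-written, and the lexicons, feature names and training tweets are all hypothetical. Only four of the 18 features are approximated, since the abstract does not list them individually.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons; the abstract does not name the lexicons actually used.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase and extract word tokens (stand-in for NLTK tokenization and lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=1500):
    """Return the n most frequent tokens in the training texts (the bag-of-words vocabulary)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return [word for word, _ in counts.most_common(n)]


class BagOfWordsNB:
    """Multinomial Naive Bayes with add-one smoothing, restricted to a fixed vocabulary."""

    def __init__(self, vocab):
        self.vocab = set(vocab)

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(
                tok for tok in tokenize(text) if tok in self.vocab
            )
        return self

    def predict(self, text):
        total_docs = sum(self.prior.values())
        vocab_size = len(self.vocab)
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.labels:
            class_total = sum(self.word_counts[label].values())
            score = math.log(self.prior[label] / total_docs)
            for tok in tokens:
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (class_total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def lexicon_features(text, entity):
    """Shallow features: polarized terms, negation before a polarized term, entity mention."""
    tokens = tokenize(text)
    feats = {
        "has_pos": False,
        "has_neg": False,
        "negated_polar": False,
        "mentions_entity": entity.lower() in text.lower(),
    }
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            feats["has_pos"] = True
        if tok in NEGATIVE:
            feats["has_neg"] = True
        if (tok in POSITIVE or tok in NEGATIVE) and i > 0 and tokens[i - 1] in NEGATORS:
            feats["negated_polar"] = True
    return feats


def stacked_features(text, entity, nb):
    """Second-stage input: the shallow features plus the first classifier's prediction."""
    feats = lexicon_features(text, entity)
    feats["nb_label"] = nb.predict(text)
    return feats


# Hypothetical training data, for illustration only.
train_texts = [
    "I love this great phone",
    "awful service, I hate it",
    "good value, great support",
    "bad battery and awful screen",
]
train_labels = ["positive", "negative", "positive", "negative"]

nb = BagOfWordsNB(top_words(train_texts, n=1500)).fit(train_texts, train_labels)
print(stacked_features("not good, Acme is awful", "Acme", nb))
```

The feature dictionaries returned by `stacked_features` have the shape that NLTK classifiers accept as training input, so in an NLTK-based setup they could be paired with labels and passed to, for example, `nltk.DecisionTreeClassifier.train`. The abstract does not state which algorithm the second classifier used, so that choice is an assumption.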
id RCAP_610a5c8de81c5f42c210249151b9e218
oai_identifier_str oai:dspace.uevora.pt:10174/10352
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv In search of reputation assessment: experiences with polarity classification in RepLab 2013
author Saias, José
author_role author
dc.contributor.author.fl_str_mv Saias, José
dc.subject.por.fl_str_mv opinion mining
reputation assessment
NLP
Machine Learning
publishDate 2013
dc.date.none.fl_str_mv 2013-09-01T00:00:00Z
2014-01-29T18:38:44Z
2014-01-29
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/10352
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv José Saias. In search of reputation assessment: Experiences with polarity classification in RepLab 2013. In Pamela Forner, Roberto Navigli, and Dan Tufis, editors, CLEF 2013 Evaluation Labs and Workshop Online Working Notes - Online Reputation Management (RepLab), Valencia, Spain, September 2013.
978-88-904810-5-5
http://www.clef-initiative.eu/documents/71612/10fcd949-e5f0-4f00-8e01-cbd2a213e147
jsaias@uevora.pt
283
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv clef2013.org
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação