In search of reputation assessment: experiences with polarity classification in RepLab 2013
Main author: | Saias, José |
---|---|
Publication date: | 2013 |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/10352 |
Abstract: | The diue system uses a supervised Machine Learning approach for the polarity classification subtask of RepLab. We used the Python NLTK for preprocessing, including file parsing, text analysis and feature extraction. Our best solution is a mixed strategy that combines bag-of-words with a limited set of features based on sentiment lexicons and superficial text analysis. The system begins by applying tokenization and lemmatization. The content of each tweet is then analyzed to obtain 18 features, related to the presence of polarized terms, negation before a polarized expression, and entity references. For the first run, learning and classification were performed with the Decision Tree algorithm from the NLTK framework. In the second run, we used a pipeline of classifiers. The first classifier applies Naive Bayes to a bag-of-words feature model built from the 1500 most frequent words in the training set. The second classifier uses the features from the first run plus one additional feature holding the result of the first classifier. Our system's best result was an accuracy of 0.54694 and an F-measure of 0.31506. |
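
The two-stage pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the author's implementation: the tiny lexicons, the whitespace tokenizer (standing in for NLTK tokenization and lemmatization), the hand-rolled Naive Bayes class, the four example features (the paper uses 18) and the sample tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-lexicons; the paper's actual sentiment lexicons are not given here.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}
NEGATIONS = {"not", "no", "never"}
POLAR = POSITIVE | NEGATIVE


def tokenize(text):
    """Crude stand-in for NLTK tokenization and lemmatization."""
    tokens = (raw.strip(".,!?;:").lower() for raw in text.split())
    return [t for t in tokens if t]


def top_words(labeled_tweets, n=1500):
    """Return the n most frequent tokens in the training set."""
    counts = Counter(tok for text, _ in labeled_tweets for tok in tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def bow_features(text, vocabulary):
    """Boolean bag-of-words: one presence flag per vocabulary word."""
    present = set(tokenize(text))
    return {word: word in present for word in vocabulary}


class NaiveBayes:
    """Bernoulli-style Naive Bayes over boolean features, add-one smoothing."""

    def fit(self, featuresets):
        self.label_counts = Counter(label for _, label in featuresets)
        self.true_counts = defaultdict(Counter)  # label -> feature -> times True
        names = set()
        for feats, label in featuresets:
            for name, value in feats.items():
                names.add(name)
                if value:
                    self.true_counts[label][name] += 1
        self.features = sorted(names)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior
            for name in self.features:
                p_true = (self.true_counts[label][name] + 1) / (n + 2)
                score += math.log(p_true if feats.get(name) else 1 - p_true)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def lexicon_features(text, entity):
    """A few illustrative features in the spirit of the abstract's 18."""
    tokens = tokenize(text)
    negation_before_polar = any(
        tokens[i - 1] in NEGATIONS
        for i, tok in enumerate(tokens)
        if i > 0 and tok in POLAR
    )
    return {
        "has_positive": any(t in POSITIVE for t in tokens),
        "has_negative": any(t in NEGATIVE for t in tokens),
        "negation_before_polar": negation_before_polar,
        "mentions_entity": entity.lower() in tokens,
    }


def stacked_features(text, entity, vocabulary, nb):
    """Second-stage features: lexicon features plus the first classifier's output."""
    feats = lexicon_features(text, entity)
    feats["nb_prediction"] = nb.predict(bow_features(text, vocabulary))
    return feats


# Invented training tweets about a hypothetical entity "Acme".
train = [
    ("I love the new Acme phone, great battery", "POSITIVE"),
    ("Acme support is great", "POSITIVE"),
    ("Acme service is bad, I hate waiting", "NEGATIVE"),
    ("poor quality from Acme, not good", "NEGATIVE"),
]
vocab = top_words(train, n=1500)
nb = NaiveBayes().fit([(bow_features(text, vocab), label) for text, label in train])
example = stacked_features("Acme is not good", "Acme", vocab, nb)
```

Dictionaries such as `example`, paired with their labels, could then be passed to a second learner, for instance NLTK's `DecisionTreeClassifier.train`, which accepts a list of `(featureset, label)` pairs. The abstract does not name the algorithm used for the second classifier, so that choice is an assumption.
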
id |
RCAP_610a5c8de81c5f42c210249151b9e218 |
---|---|
oai_identifier_str |
oai:dspace.uevora.pt:10174/10352 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
title |
In search of reputation assessment: experiences with polarity classification in RepLab 2013 |
author |
Saias, José |
author_role |
author |
topic |
opinion mining; reputation assessment; NLP; Machine Learning |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-09-01T00:00:00Z 2014-01-29T18:38:44Z 2014-01-29 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10174/10352 |
url |
http://hdl.handle.net/10174/10352 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
José Saias. In search of reputation assessment: Experiences with polarity classification in RepLab 2013. In Pamela Forner, Roberto Navigli, and Dan Tufis, editors, CLEF 2013 Evaluation Labs and Workshop Online Working Notes - Online Reputation Management (RepLab), Valencia, Spain, September 2013. ISBN 978-88-904810-5-5; http://www.clef-initiative.eu/documents/71612/10fcd949-e5f0-4f00-8e01-cbd2a213e147; jsaias@uevora.pt; 283 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
clef2013.org |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799136526373748736 |