Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016
Main author: | Bayot, Roy |
---|---|
Publication date: | 2016 |
Other authors: | Gonçalves, Teresa |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/20667 |
Abstract: | In this paper, we describe one of the approaches from Universidade de Évora's participation in PAN at CLEF 2016. Our approach follows the usual pipeline: text is preprocessed, features are extracted, and the features are used to train SVMs with cross-validation. The main difference is that the features are averages of word embeddings, specifically word2vec vectors. On the PAN 2016 dataset, we achieved accuracies of 44.8% and 68.2% for English age and gender classification, respectively. We also achieved accuracies of 51.3% and 67.1% for Spanish age and gender classification, respectively. Finally, we report 71.9% accuracy for Dutch age classification. |
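
The feature extraction the abstract describes (one averaged word-embedding vector per text) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-dimensional vectors, the whitespace tokenizer, and the names `TOY_EMBEDDINGS`, `tokenize`, and `average_embedding` are assumptions made for illustration, standing in for trained word2vec vectors and the paper's actual preprocessing. Skipping unknown words and falling back to a zero vector are likewise assumptions.

```python
from typing import Dict, List

# Hypothetical toy vectors; real word2vec embeddings would have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "morning": [3.0, 2.0, 0.0],
    "everyone": [2.0, 4.0, 1.0],
}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def average_embedding(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Element-wise mean of the vectors of all in-vocabulary tokens.

    Out-of-vocabulary tokens are skipped (an assumption); if no token is
    known, a zero vector of length `dim` is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# "hello" is not in the toy vocabulary, so only three vectors are averaged.
features = average_embedding(tokenize("Good morning everyone hello"), TOY_EMBEDDINGS, 3)
print(features)  # [2.0, 2.0, 1.0]
```

In a pipeline like the one the abstract outlines, one such vector per author would then be passed to an SVM evaluated with cross-validation; that step is omitted from this sketch.
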
id | RCAP_217ae1b8ee583fd710065c94aa42973f |
---|---|
oai_identifier_str | oai:dspace.uevora.pt:10174/20667 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
spelling | Erasmus Mundus EMMA-WEST project; 2024-01-03T19:10:37Z; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T01:12:02.718601; false |
dc.title.none.fl_str_mv | Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016 |
author | Bayot, Roy |
author_facet | Bayot, Roy; Gonçalves, Teresa |
author_role | author |
author2 | Gonçalves, Teresa |
author2_role | author |
dc.contributor.author.fl_str_mv | Bayot, Roy; Gonçalves, Teresa |
publishDate | 2016 |
dc.date.none.fl_str_mv | 2016-09-01T00:00:00Z; 2017-02-06T12:07:10Z; 2017-02-06 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10174/20667 |
url | http://hdl.handle.net/10174/20667 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | Roy Bayot and Teresa Gonçalves. Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016. In Krisztian Balog, Linda Cappellato, Nicola Ferro, and Craig Macdonald, editors, Working Notes of CLEF’2016 – Conference and Labs of the Evaluation forum, Évora, Portugal, 5-8 September, 2016, volume 1609, pages 815–823, Évora, PT, September 2016. CEUR.; nd; nd; 498 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.publisher.none.fl_str_mv | CEUR |
publisher.none.fl_str_mv | CEUR |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136602468909056 |