Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016

Bibliographic details
Main author: Bayot, Roy
Publication date: 2016
Other authors: Gonçalves, Teresa
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10174/20667
Abstract: In this paper, we describe one of the approaches used in the participation of Universidade de Évora. Our approach follows the usual pipeline: text is preprocessed, features are extracted, and the features are then used in SVMs with cross validation. The main difference is that the features are averages of word embeddings, specifically word2vec vectors. Using the PAN 2016 dataset, we achieved 44.8% and 68.2% accuracy for English age and gender classification, respectively. We also achieved 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.
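The feature-extraction step named in the abstract, averaging word embeddings over a text, can be illustrated with a minimal sketch. This is not the authors' code. The tiny embedding table `TOY_EMBEDDINGS`, the `tokenize` helper, and the zero-vector fallback for texts with no known words are all assumptions made for illustration; a real system would look words up in trained word2vec vectors instead.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table standing in for word2vec vectors.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "morning": [3.0, 2.0, 0.0],
    "everyone": [2.0, 4.0, 4.0],
}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = [tok.strip(".,!?;:\"'()") for tok in text.lower().split()]
    return [tok for tok in tokens if tok]


def average_embedding(
    tokens: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of all tokens found in the embedding table.

    Out-of-vocabulary tokens are skipped. If no token is known, a zero
    vector is returned (an assumption, not something the abstract states).
    """
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# One fixed-length feature vector per text, whatever the text length.
features = average_embedding(tokenize("Good morning, everyone!"), TOY_EMBEDDINGS, 3)
print(features)
```

Each text yields one fixed-length vector, which is what allows the features to be passed to an SVM. The classifier and the cross validation are omitted here.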
id RCAP_217ae1b8ee583fd710065c94aa42973f
oai_identifier_str oai:dspace.uevora.pt:10174/20667
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
spelling Erasmus Mundus EMMA-WEST project | 2024-01-03T19:10:37Z | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | opendoar:7160 | 2024-03-20T01:12:02.718601 | false
publishDate 2016
dc.date.none.fl_str_mv 2016-09-01T00:00:00Z
2017-02-06T12:07:10Z
2017-02-06
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv Roy Bayot and Teresa Gonçalves. Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016. In Krisztian Balog, Linda Cappellato, Nicola Ferro, and Craig Macdonald, editors, Working Notes of CLEF 2016 – Conference and Labs of the Evaluation forum, Évora, Portugal, 5-8 September 2016, volume 1609, pages 815–823, Évora, PT, September 2016. CEUR.
nd
nd
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv CEUR
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799136602468909056