Learning Supervised Topic Models for Classification and Regression from Crowds
Main author: | Rodrigues, Filipe |
---|---|
Publication date: | 2017 |
Other authors: | Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10316/44319 https://doi.org/10.1109/TPAMI.2017.2648786 |
Abstract: | The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, which are prone to ambiguity and noise and often involve high volumes of documents, makes learning under a single-annotator assumption unrealistic or impractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches. |
id |
RCAP_406c849b1ea24d69437ef4e09f179634 |
---|---|
oai_identifier_str |
oai:estudogeral.uc.pt:10316/44319 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Learning Supervised Topic Models for Classification and Regression from Crowds |
author |
Rodrigues, Filipe |
author_facet |
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco |
author_role |
author |
author2 |
Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10316/44319 https://doi.org/10.1109/TPAMI.2017.2648786 |
url |
http://hdl.handle.net/10316/44319 https://doi.org/10.1109/TPAMI.2017.2648786 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
IEEE |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799133873036066816 |