Learning from multiple annotators: distinguishing good from random labelers
Main author: | Rodrigues, Filipe |
---|---|
Publication date: | 2013 |
Other authors: | Pereira, Francisco; Ribeiro, Bernardete |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10316/27407; https://doi.org/10.1016/j.patrec.2013.05.012 |
Abstract: | With the increasing popularity of online crowdsourcing platforms such as Amazon Mechanical Turk (AMT), building supervised learning models for datasets with multiple annotators is receiving increasing attention from researchers. These platforms provide an inexpensive and accessible resource for obtaining labeled data, and in many situations the quality of the labels competes directly with that of experts. For these reasons, annotator-aware models have recently received much attention. In this paper, we propose a new probabilistic model for supervised learning with multiple annotators in which the reliability of the different annotators is treated as a latent variable. We empirically show that this model achieves state-of-the-art performance while reducing the number of model parameters, thus avoiding potential overfitting. Furthermore, the proposed model is easier to implement and to extend to other classes of learning problems, such as sequence labeling tasks. |
Keywords: | Multiple annotators; Crowdsourcing; Latent variable models; Expectation–Maximization; Logistic Regression |
Date: | 2013-09-01 |
Version: | Published version |
Citation: | RODRIGUES, Filipe; PEREIRA, Francisco; RIBEIRO, Bernardete - Learning from multiple annotators: distinguishing good from random labelers. "Pattern Recognition Letters". ISSN 0167-8655. Vol. 34 No. 12 (2013) p. 1428-1436 |
ISSN: | 0167-8655 |
Publisher page: | http://www.sciencedirect.com/science/article/pii/S016786551300202X |
Access rights: | Open access |
Publisher: | Elsevier |
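The abstract describes a model in which annotator reliability is a latent variable, and the keywords name Expectation–Maximization and logistic regression. The Python sketch below illustrates one way such a model can be set up; it is an assumed formulation in the spirit of the title, not the authors' model or code. It assumes binary labels, and that each label from annotator r is either the true class (a "good" label, with prior probability pi_r) or drawn uniformly at random. The E-step computes the posterior over the true class from the classifier and the annotators' labels; the M-step re-estimates each pi_r and refits the logistic-regression weights w on those soft posteriors. All function names, the synthetic data and the hyperparameters (n_iter, gd_steps, lr, pi0) are hypothetical.

```python
# Hypothetical sketch of a "good vs. random labeler" EM model; not the authors' code.
import math
import random


def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def label_prob(y, k, pi_r, n_classes=2):
    # P(y | true class k): with prob. pi_r the annotator is "good" and copies k,
    # otherwise it is "random" and picks one of n_classes labels uniformly.
    return pi_r * (y == k) + (1.0 - pi_r) / n_classes


def good_posterior(y, k, pi_r, n_classes=2):
    # P(annotator was "good" | label y, true class k).
    return pi_r * (y == k) / label_prob(y, k, pi_r, n_classes)


def fit_good_vs_random(X, Y, n_iter=30, gd_steps=25, lr=0.5, pi0=0.8):
    """X: feature lists (bias included); Y: per-instance dicts {annotator: 0 or 1}.
    Returns logistic-regression weights w and per-annotator reliabilities pi."""
    w = [0.0] * len(X[0])
    annotators = sorted({r for labels in Y for r in labels})
    pi = {r: pi0 for r in annotators}
    for _ in range(n_iter):
        # E-step: q[i] = P(c_i = 1 | x_i, labels) under the current w and pi.
        q = []
        for x, labels in zip(X, Y):
            p1 = sigmoid(dot(w, x))
            like = [1.0 - p1, p1]
            for k in (0, 1):
                for r, y in labels.items():
                    like[k] *= label_prob(y, k, pi[r])
            q.append(like[1] / (like[0] + like[1]))
        # M-step for pi: expected fraction of "good" labels per annotator.
        new_pi = {}
        for r in annotators:
            total, count = 0.0, 0
            for qi, labels in zip(q, Y):
                if r in labels:
                    y = labels[r]
                    total += (1.0 - qi) * good_posterior(y, 0, pi[r])
                    total += qi * good_posterior(y, 1, pi[r])
                    count += 1
            # Clamp away from 0 and 1 so no likelihood term vanishes.
            new_pi[r] = min(max(total / count, 1e-3), 1.0 - 1e-3)
        pi = new_pi
        # M-step for w: logistic regression on the soft targets q (gradient ascent).
        for _ in range(gd_steps):
            grad = [0.0] * len(w)
            for x, qi in zip(X, q):
                err = qi - sigmoid(dot(w, x))
                for j, xj in enumerate(x):
                    grad[j] += err * xj
            w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w, pi


def make_synthetic(n=300, seed=0):
    # Hypothetical data: true class is [x > 0]; annotators "a" and "b" give a
    # "good" label 90% of the time, annotator "c" always labels at random.
    rng = random.Random(seed)
    X, Y, truth = [], [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        c = int(x > 0)
        labels = {}
        for r, p_good in (("a", 0.9), ("b", 0.9), ("c", 0.0)):
            labels[r] = c if rng.random() < p_good else rng.randint(0, 1)
        X.append([1.0, x])
        Y.append(labels)
        truth.append(c)
    return X, Y, truth


X, Y, truth = make_synthetic()
w, pi = fit_good_vs_random(X, Y)
acc = sum(int(dot(w, x) > 0) == c for x, c in zip(X, truth)) / len(X)
print({r: round(p, 2) for r, p in pi.items()}, round(acc, 2))
```

On this synthetic data the estimated reliabilities should separate annotators "a" and "b" from the random annotator "c". Keeping a single reliability parameter per annotator, instead of, for example, a full confusion matrix per annotator, keeps the parameter count small, which is in the spirit of the abstract's remark on parameter count. A practical implementation would vectorize the loops and stop on a likelihood-convergence check rather than after fixed iteration counts.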