MilkQA: A dataset of consumer questions for the task of answer selection
Main author: | Criscuolo, Marcelo |
---|---|
Publication date: | 2018 |
Other authors: | Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1109/BRACIS.2017.12 http://hdl.handle.net/11449/171183 |
Abstract: | We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in the Portuguese language and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while answers were elaborated by specialists from Embrapa's customer service. Our dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question, usually posed as a way of seeking information. Although several question answering datasets are available, most of these resources are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making MilkQA publicly available. We study the behavior of four answer selection models on MilkQA: two baseline models and two convolutional neural network architectures. Our results show that MilkQA poses real challenges to computational models, particularly due to the linguistic characteristics of its questions and to their unusually long length. Only one of the models tested gives reasonable results, at the cost of high computational requirements. |
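To make the answer selection task in the abstract concrete, here is a minimal sketch in Python. It is not one of the four models studied in the paper, which the abstract does not name; it assumes a simple token-overlap (Jaccard) scorer as a stand-in baseline. The function names and the English toy question are invented for illustration and are not taken from MilkQA.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens (a crude tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(question: str, answer: str) -> float:
    """Jaccard similarity between the token sets of a question and an answer."""
    q, a = tokenize(question), tokenize(answer)
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)


def rank_candidates(question: str, candidates: List[str]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    scores = [overlap_score(question, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(ranking: List[int], correct_index: int) -> float:
    """1 / (position of the correct answer in the ranking), positions start at 1."""
    return 1.0 / (ranking.index(correct_index) + 1)


# Invented toy data, NOT from MilkQA: one question and a pool of candidate answers.
question = "How long can fresh milk stay in the refrigerator?"
candidates = [
    "Pasture rotation improves forage quality for dairy cows.",
    "Fresh milk can stay in the refrigerator for a few days if kept cold.",
    "Cheese ripening depends on temperature and humidity.",
]
ranking = rank_candidates(question, candidates)
```

Averaging `reciprocal_rank` over many questions would give a mean reciprocal rank, a common way to score answer selection. A lexical scorer like this one depends on shared vocabulary between question and answer, which is one reason long, informally written questions can be hard for simple models.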
id |
UNSP_5290107dfab76b0912e7589cbc838023 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/171183 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
University of São Paulo (USP), Institute of Mathematics and Computer Sciences; São Paulo State University (Unesp), College of Letters and Sciences; Universidade de São Paulo (USP); Universidade Estadual Paulista (Unesp); Criscuolo, Marcelo; Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]; 2018-12-11T16:54:17Z; 2018-01-04; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 354-359; http://dx.doi.org/10.1109/BRACIS.2017.12; Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359; http://hdl.handle.net/11449/171183; 2-s2.0-85049513654; Scopus; eng; info:eu-repo/semantics/openAccess; 2021-10-23T21:44:37Z; oai:repositorio.unesp.br:11449/171183; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T22:41:44.369964; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
MilkQA: A dataset of consumer questions for the task of answer selection |
author |
Criscuolo, Marcelo |
author_facet |
Criscuolo, Marcelo Fonseca, Erick Rocha Aluisio, Sandra Maria Speranca-Criscuolo, Ana Carolina [UNESP] |
author_role |
author |
author2 |
Fonseca, Erick Rocha Aluisio, Sandra Maria Speranca-Criscuolo, Ana Carolina [UNESP] |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Universidade de São Paulo (USP) Universidade Estadual Paulista (Unesp) |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-11T16:54:17Z 2018-12-11T16:54:17Z 2018-01-04 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1109/BRACIS.2017.12 Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359. http://hdl.handle.net/11449/171183 10.1109/BRACIS.2017.12 2-s2.0-85049513654 |
url |
http://dx.doi.org/10.1109/BRACIS.2017.12 http://hdl.handle.net/11449/171183 |
identifier_str_mv |
Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359. 10.1109/BRACIS.2017.12 2-s2.0-85049513654 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
354-359 |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
_version_ |
1808129452233719808 |
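
The OAI identifier and request URL listed in this record can, in principle, be combined into an OAI-PMH `GetRecord` request. The sketch below only builds the request URL and does not contact the server. It assumes the endpoint is still reachable, which has not been verified here. The helper name `get_record_url` is hypothetical, and `oai_dc` is used because it is the metadata format that the OAI-PMH protocol requires every repository to support.

```python
from urllib.parse import urlencode

# Values copied from this record.
BASE_URL = "http://repositorio.unesp.br/oai/request"
IDENTIFIER = "oai:repositorio.unesp.br:11449/171183"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL (hypothetical helper; performs no network access)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


url = get_record_url(BASE_URL, IDENTIFIER)
print(url)
```

`urlencode` percent-encodes the colons and the slash in the identifier, so the value can be passed safely as a query parameter.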