MilkQA: A dataset of consumer questions for the task of answer selection

Bibliographic details
Main author: Criscuolo, Marcelo
Publication date: 2018
Other authors: Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1109/BRACIS.2017.12
http://hdl.handle.net/11449/171183
Abstract: We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in Portuguese and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while the answers were written by specialists from Embrapa's customer service. Our dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question, usually employed as a way of seeking information. Although several question answering datasets are available, most of these resources are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making MilkQA publicly available. We study the behavior of four answer selection models on MilkQA: two baseline models and two convolutional neural network architectures. Our results show that MilkQA poses real challenges to computational models, particularly because of the linguistic characteristics of its questions and their unusual length. Only one of the models tested gives reasonable results, at the cost of high computational requirements.
id UNSP_5290107dfab76b0912e7589cbc838023
oai_identifier_str oai:repositorio.unesp.br:11449/171183
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling University of São Paulo (USP), Institute of Mathematics and Computer Sciences; São Paulo State University (Unesp), College of Letters and Sciences
dc.title.none.fl_str_mv MilkQA: A dataset of consumer questions for the task of answer selection
author Criscuolo, Marcelo
author_facet Criscuolo, Marcelo
Fonseca, Erick Rocha
Aluisio, Sandra Maria
Speranca-Criscuolo, Ana Carolina [UNESP]
author_role author
author2 Fonseca, Erick Rocha
Aluisio, Sandra Maria
Speranca-Criscuolo, Ana Carolina [UNESP]
author2_role author
author
author
dc.contributor.none.fl_str_mv Universidade de São Paulo (USP)
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Criscuolo, Marcelo
Fonseca, Erick Rocha
Aluisio, Sandra Maria
Speranca-Criscuolo, Ana Carolina [UNESP]
publishDate 2018
dc.date.none.fl_str_mv 2018-12-11T16:54:17Z
2018-12-11T16:54:17Z
2018-01-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1109/BRACIS.2017.12
Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359.
http://hdl.handle.net/11449/171183
10.1109/BRACIS.2017.12
2-s2.0-85049513654
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 354-359
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1808129452233719808