MilkQA: A dataset of consumer questions for the task of answer selection

Criscuolo, Marcelo; Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]

Bibliographic details
Main author:	Criscuolo, Marcelo
Publication date:	2018
Other authors:	Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]
Document type:	Conference paper
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1109/BRACIS.2017.12 http://hdl.handle.net/11449/171183
Abstract:	We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in Portuguese and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while answers were written by specialists from Embrapa's customer service. The dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question, usually asked as a way of seeking information. Although several question answering datasets are available, most of them are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making MilkQA publicly available. We study the behavior of four answer selection models on MilkQA: two baseline models and two convolutional neural network architectures. Our results show that MilkQA poses real challenges to computational models, particularly because of the linguistic characteristics of its questions and their unusually long length. Only one of the models tested gives reasonable results, at the cost of high computational requirements.

Item metadata

id	UNSP_5290107dfab76b0912e7589cbc838023
oai_identifier_str	oai:repositorio.unesp.br:11449/171183
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
affiliations	University of São Paulo (USP), Institute of Mathematics and Computer Sciences; São Paulo State University (Unesp), College of Letters and Sciences
dc.title.none.fl_str_mv	MilkQA: A dataset of consumer questions for the task of answer selection
author	Criscuolo, Marcelo
author_facet	Criscuolo, Marcelo; Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]
author_role	author
author2	Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]
author2_role	author; author; author
dc.contributor.none.fl_str_mv	Universidade de São Paulo (USP); Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv	Criscuolo, Marcelo; Fonseca, Erick Rocha; Aluisio, Sandra Maria; Speranca-Criscuolo, Ana Carolina [UNESP]
publishDate	2018
dc.date.none.fl_str_mv	2018-12-11T16:54:17Z 2018-12-11T16:54:17Z 2018-01-04
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/conferenceObject
format	conferenceObject
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1109/BRACIS.2017.12; Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359; http://hdl.handle.net/11449/171183; 10.1109/BRACIS.2017.12; 2-s2.0-85049513654
url	http://dx.doi.org/10.1109/BRACIS.2017.12; http://hdl.handle.net/11449/171183
identifier_str_mv	Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, v. 2018-January, p. 354-359; 10.1109/BRACIS.2017.12; 2-s2.0-85049513654
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	354-359
dc.source.none.fl_str_mv	Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_	1808129452233719808
