Development of a Corpus for User-based Scientific Question Answering

Bibliographic details
Main author: Vieira, Miguel Ângelo Conde
Publication date: 2021
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10451/49387
Abstract: Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021
id RCAP_071451bae8393dc98b9c2fb4c5c2b5da
oai_identifier_str oai:repositorio.ul.pt:10451/49387
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
abstract In recent years, Question & Answering (QA) tasks have become particularly relevant in the research field of natural language understanding. However, the lack of good-quality datasets has been an important limiting factor in the quest for better models. Particularly in the biomedical domain, the scarcity of gold-standard labelled datasets has been a recognized obstacle, given that its idiosyncrasies and complexities often require the participation of skilled domain-specific experts in producing such datasets. To address this issue, a method for automatically gathering question-answer pairs from online biomedical QA forums has been suggested, yielding a corpus named BiQA. The authors describe several strategies to validate this new dataset, but a manual human verification has not been conducted. With this in mind, this dissertation set out to perform a manual verification of a sample of 1200 questions from BiQA and to expand these questions, by adding features, into a new text corpus, BiQA2, with the goal of contributing a new corpus for biomedical QA research. Regarding the manual verification of BiQA, a methodology for its characterization was laid out, which allowed the identification of an array of potential problems related to the nature of its questions and the aptness of its answers, for which possible improvements were presented. Concomitantly, the proposed new BiQA2 corpus, created from the validated questions and answers in the perused samples of BiQA, builds new features similar to those observed in other biomedical corpora such as the BioASQ dataset.
Both BiQA and BiQA2 were applied to deep learning strategies previously submitted to the BioASQ competition to assess their performance as a source of training data. Although the models created using BiQA2 exhibit limited capability in the BioASQ challenge, they also show some potential to contribute positively to model training in tasks such as document re-ranking and answering 'yes/no' questions.
dc.title.none.fl_str_mv Development of a Corpus for User-based Scientific Question Answering
title Development of a Corpus for User-based Scientific Question Answering
author Vieira, Miguel Ângelo Conde
author_facet Vieira, Miguel Ângelo Conde
author_role author
dc.contributor.none.fl_str_mv Couto, Francisco José Moreira
Lamúrias, André Francisco Martins,1990-
Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Vieira, Miguel Ângelo Conde
dc.subject.por.fl_str_mv Biomedical Literature
Question & Answering
Text corpus
Deep Learning
Master's theses - 2021
Departamento de Biologia Animal
description Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021
publishDate 2021
dc.date.none.fl_str_mv 2021-09-01T14:10:12Z
2021
2021
2021-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/49387
TID:202934276
url http://hdl.handle.net/10451/49387
identifier_str_mv TID:202934276
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134558787993600