Development of a Corpus for Userbased Scientific Question Answering
Main author: | Vieira, Miguel Ângelo Conde |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10451/49387 |
Abstract: | Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021 |
id |
RCAP_071451bae8393dc98b9c2fb4c5c2b5da |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/49387 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
spelling |
Development of a Corpus for Userbased Scientific Question Answering Literatura Biomédica Question & Answering Corpo de texto Aprendizagem Profunda Teses de mestrado - 2021 Departamento de Biologia Animal Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021. In recent years, Question & Answering (QA) tasks have become particularly relevant in the research field of natural language understanding. However, the lack of good-quality datasets has been an important limiting factor in the quest for better models. Particularly in the biomedical domain, the scarcity of gold-standard labelled datasets has been a recognized obstacle, given that the domain's idiosyncrasies and complexities often require the participation of skilled domain-specific experts in producing such datasets. To address this issue, a method for automatically gathering question-answer pairs from online biomedical QA forums has been suggested, yielding a corpus named BiQA. The authors describe several strategies to validate this new dataset, but a manual human verification has not been conducted. With this in mind, this dissertation set out to perform a manual verification of a sample of 1200 questions from BiQA and to expand these questions, by adding features, into a new corpus of text, BiQA2, with the goal of contributing a new corpus for biomedical QA research. Regarding the manual verification of BiQA, a methodology for its characterization was laid out, which allowed the identification of an array of potential problems related to the nature of its questions and the aptness of its answers, for which possible improvements were presented. Concomitantly, the proposed new BiQA2 corpus, created from the validated questions and answers in the perused samples of BiQA, builds new features similar to those observed in other biomedical corpora such as the BioASQ dataset. Both BiQA and BiQA2 were applied to deep learning strategies previously submitted to the BioASQ competition to assess their performance as a source of training data. Although the results achieved with the models created using BiQA2 exhibit limited capability on the BioASQ challenge, they also show some potential to contribute positively to model training in tasks such as document re-ranking and answering ‘yes/no’ questions. Couto, Francisco José Moreira Lamúrias, André Francisco Martins, 1990- Repositório da Universidade de Lisboa Vieira, Miguel Ângelo Conde 2021-09-01T14:10:12Z 2021 2021 2021-01-01T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf http://hdl.handle.net/10451/49387 TID:202934276 eng info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2023-11-08T16:53:10Z oai:repositorio.ul.pt:10451/49387 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-19T22:01:02.937936 Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Development of a Corpus for Userbased Scientific Question Answering |
title |
Development of a Corpus for Userbased Scientific Question Answering |
spellingShingle |
Development of a Corpus for Userbased Scientific Question Answering Vieira, Miguel Ângelo Conde Literatura Biomédica Question & Answering Corpo de texto Aprendizagem Profunda Teses de mestrado - 2021 Departamento de Biologia Animal |
title_short |
Development of a Corpus for Userbased Scientific Question Answering |
title_full |
Development of a Corpus for Userbased Scientific Question Answering |
title_fullStr |
Development of a Corpus for Userbased Scientific Question Answering |
title_full_unstemmed |
Development of a Corpus for Userbased Scientific Question Answering |
title_sort |
Development of a Corpus for Userbased Scientific Question Answering |
author |
Vieira, Miguel Ângelo Conde |
author_facet |
Vieira, Miguel Ângelo Conde |
author_role |
author |
dc.contributor.none.fl_str_mv |
Couto, Francisco José Moreira Lamúrias, André Francisco Martins, 1990- Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Vieira, Miguel Ângelo Conde |
dc.subject.por.fl_str_mv |
Literatura Biomédica Question & Answering Corpo de texto Aprendizagem Profunda Teses de mestrado - 2021 Departamento de Biologia Animal |
topic |
Literatura Biomédica Question & Answering Corpo de texto Aprendizagem Profunda Teses de mestrado - 2021 Departamento de Biologia Animal |
description |
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021 |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-09-01T14:10:12Z 2021 2021 2021-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/49387 TID:202934276 |
url |
http://hdl.handle.net/10451/49387 |
identifier_str_mv |
TID:202934276 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134558787993600 |