NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
Main author: | Lopes, Carolina Magalhães |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/163255 |
Abstract: | News articles are one of the main sources of information available to people all around the world. They are highly diverse documents that cover a wide range of topics and refer to various entities, relationships, and concepts spanning a multitude of domains. The illustrations accompanying these articles play a crucial role in capturing readers' attention and adding to the storytelling. In complex domains such as news, question answering can be an effective way to meet precise information-seeking needs. Due to the open-domain and multimodal nature of news, where images complement the textual medium, QA systems are required to dynamically multiplex both visual and linguistic sources to ground answers. To create such a system, we need to combine both natural language and visual information to find an answer. Moreover, the model needs to answer open-ended questions and learn multimodal representations, correlating both images and text elements. In this thesis, we propose NewsMQA, a novel dataset and benchmark for the task of Multimodal Question Answering for News. NewsMQA differs from existing datasets by fully encompassing the multimodal facets of news and improving on the quality vs. scale trade-off. We adopt a two-part approach that combines human annotation with synthetic question-answer generation through answer roundtrip consistency. We comprehensively study the created dataset, highlighting its unique characteristics, features, quality, and the research challenges of the task that it supports. To benchmark the dataset, we leverage pre-trained Transformers and propose different strategies to extend these with visual information extracted from the corresponding images. We conduct an extensive evaluation of the intricacies and challenges of the dataset and provide insights regarding the impact of enriching the input of these models with image-related information.
Finally, we provide a critical discussion of the best-performing approaches and discuss the task's open challenges. |
id |
RCAP_cdd066090da0c9a9904aeb904ed54320 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/163255 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
NewsMQA: A Multimodal Question Answering Benchmark over News Pieces; News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática.
Abstract (translated from Portuguese): News articles are one of the main sources of information available to people around the world. They are documents that cover a wide range of topics and refer to various entities, relationships, and concepts spanning a diversity of domains. The illustrations that accompany these articles play a crucial role in capturing readers' attention and contribute to enriching the narrative. In complex domains such as news, question-answering methods can be quite effective for providing information to the user. Due to the multimodal nature of news, these systems should take into account both the visual and the linguistic sources in order to answer correctly. Creating such a system requires addressing tasks such as natural language processing and vision. In addition, it must answer open-ended questions and learn multimodal representations, correlating images and text. Existing datasets for news are quite limited and hide the complexity of the problem. We propose NewsMQA, a dataset for the task of Multimodal Question Answering for News. It differs from existing datasets by covering the multimodal facets of news and seeking a good quality-to-size trade-off. Regarding the latter, we suggest an approach that combines human annotation with synthetic generation of questions and answers. We provide a comprehensive analysis of the introduced dataset, highlighting its characteristics and the challenges it supports. To evaluate our dataset, we use pre-trained Transformers and propose extending these models to support multimodality by incorporating information extracted from the images into the input sequence provided to them. We carry out a set of analyses and studies with which we evaluate and discuss the complexity and challenges of the dataset, and we offer our view on the best information that can be used to enrich the models and improve their performance on the formulated task.
Semedo, David; RUN; Lopes, Carolina Magalhães; 2024-02-08T15:02:48Z; 2023-05; 2023-05-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/163255; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:46:24Z; oai:run.unl.pt:10362/163255; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:59:19.863993; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
NewsMQA: A Multimodal Question Answering Benchmark over News Pieces |
title |
NewsMQA: A Multimodal Question Answering Benchmark over News Pieces |
spellingShingle |
NewsMQA: A Multimodal Question Answering Benchmark over News Pieces; Lopes, Carolina Magalhães; News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
author |
Lopes, Carolina Magalhães |
author_facet |
Lopes, Carolina Magalhães |
author_role |
author |
dc.contributor.none.fl_str_mv |
Semedo, David RUN |
dc.contributor.author.fl_str_mv |
Lopes, Carolina Magalhães |
dc.subject.por.fl_str_mv |
News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
topic |
News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-05 2023-05-01T00:00:00Z 2024-02-08T15:02:48Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/163255 |
url |
http://hdl.handle.net/10362/163255 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138173001924608 |