NewsMQA: A Multimodal Question Answering Benchmark over News Pieces

Bibliographic details
Main author: Lopes, Carolina Magalhães
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/163255
Abstract: News articles are one of the main sources of information available to people around the world. They are highly diverse documents that cover a wide range of topics and refer to entities, relationships and concepts spanning many domains. The illustrations accompanying these articles play a crucial role in capturing readers' attention and supporting the storytelling. In complex domains such as news, question answering (QA) can be an effective way to serve precise information-seeking needs. Because news is open-domain and multimodal, with images complementing the text, QA systems must dynamically draw on both visual and linguistic sources to ground their answers. Building such a system therefore requires combining natural-language and visual information to find an answer; the model must also answer open-ended questions and learn multimodal representations that correlate image and text elements. In this thesis, we propose NewsMQA, a novel dataset and benchmark for the task of multimodal question answering over news. NewsMQA differs from existing datasets by fully covering the multimodal facets of news and by improving on the quality vs. scale trade-off. We adopt a two-part approach that combines human annotation with synthetic question-answer generation through answer roundtrip consistency. We study the resulting dataset comprehensively, highlighting its characteristics, features and quality, as well as the research challenges of the task it supports. To benchmark the dataset, we use pre-trained Transformers and propose different strategies to extend them with visual information extracted from the corresponding images. We evaluate the intricacies and challenges of the dataset extensively and provide insights into the impact of enriching the input of these models with image-related information. Finally, we critically discuss the best-performing approaches and the task's open challenges.
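The abstract names answer roundtrip consistency as part of the synthetic question-answer generation but does not spell out the procedure. As a rough illustration only: a common form of this check generates a question from a context and a chosen answer, asks a QA model to answer that question, and keeps the pair only when the predicted answer matches the original. The sketch below assumes hypothetical `generate_question` and `answer_question` callables standing in for real models and uses SQuAD-style normalized exact match; these names and the toy stubs are illustrative and do not come from the thesis.

```python
import re
import string
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class QAPair:
    context: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def roundtrip_filter(
    candidates: Iterable[Tuple[str, str]],
    generate_question: Callable[[str, str], str],
    answer_question: Callable[[str, str], str],
) -> List[QAPair]:
    """Keep (context, answer) pairs whose generated question is answered back correctly."""
    kept = []
    for context, answer in candidates:
        question = generate_question(context, answer)   # answer -> question
        predicted = answer_question(context, question)  # question -> answer
        if normalize(predicted) == normalize(answer):   # the roundtrip must agree
            kept.append(QAPair(context, question, answer))
    return kept


# Hypothetical toy stubs standing in for real question-generation and QA models.
def toy_generate_question(context: str, answer: str) -> str:
    words = context.split()
    i = words.index(answer)
    previous = words[i - 1] if i > 0 else ""
    return f"What word follows '{previous}'?"


def toy_answer_question(context: str, question: str) -> str:
    match = re.search(r"'(.*)'", question)
    words = context.split()
    if match is None or match.group(1) not in words:
        return ""
    i = words.index(match.group(1))  # resolves to the first occurrence only
    return words[i + 1] if i + 1 < len(words) else ""


ctx = "The minister met the press in Lisbon and the mayor on Monday"
kept = roundtrip_filter(
    [(ctx, "Lisbon"), (ctx, "mayor"), (ctx, "Monday")],
    toy_generate_question,
    toy_answer_question,
)
for pair in kept:
    print(pair.answer, "<-", pair.question)
```

With these stubs, the pairs for "Lisbon" and "Monday" survive, while the pair for "mayor" is discarded: its generated question asks what follows "the", which the toy QA stub resolves to the first occurrence and answers with "press". Exact match is a strict criterion; a real pipeline might choose a softer matching rule.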
oai_identifier_str oai:run.unl.pt:10362/163255
Abstract (Portuguese version, translated): News articles are one of the main sources of information available to people around the world. They cover a wide range of topics and refer to various entities, relationships and concepts spanning a diversity of domains. The illustrations accompanying these articles play a crucial role in capturing readers' attention and enriching the narrative. In complex domains such as news, question-answering methods can be quite effective at providing information to the user. Given the multimodal nature of news, these systems should take both visual and linguistic sources into account in order to answer correctly. Building such a system requires addressing both natural language processing and vision tasks; it must also answer open-ended questions and learn multimodal representations that correlate images and text. Existing news datasets are quite limited and hide the complexity of the problem. We propose NewsMQA, a dataset for the task of Multimodal Question Answering for News. It differs from existing datasets by covering the multimodal facets of news and by seeking a good quality-to-size ratio. To achieve the latter, we propose an approach that combines human annotation with synthetic question and answer generation. We provide a comprehensive analysis of the introduced dataset, highlighting its characteristics and the challenges it poses. To evaluate our dataset, we use pre-trained Transformers and propose extending these models to support multimodality by incorporating information extracted from the images into the input sequence they receive. We carry out a set of analyses and studies through which we evaluate and discuss the complexity and challenges of the dataset, and we offer our perspective on which information is best suited to enrich the models and improve their performance on the formulated task.
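The abstract describes extending pre-trained Transformers by incorporating information extracted from the images into the input sequence, without specifying the format. A minimal sketch, assuming the visual information is already available as a text caption plus a list of detected object labels (both hypothetical fields here), is to serialize it between the question and the article and truncate only the article to fit a length budget. Whitespace tokens stand in for a real subword tokenizer, and the `[SEP]` separator and `max_tokens` budget are illustrative choices, not the thesis's configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsItem:
    """A news piece with text-serialized visual information (hypothetical fields)."""
    article: str
    image_caption: str                                      # assumption: a caption of the image
    image_labels: List[str] = field(default_factory=list)  # assumption: detected object labels


def build_enriched_input(question: str, item: NewsItem,
                         max_tokens: int = 64, sep: str = "[SEP]") -> str:
    """Return 'question [SEP] visual text [SEP] article', truncating only the article.

    Whitespace-separated tokens approximate a real tokenizer; the question and
    visual prefix are kept whole even if they alone exceed max_tokens.
    """
    visual = f"image: {item.image_caption}"
    if item.image_labels:
        visual += " objects: " + ", ".join(item.image_labels)
    prefix = f"{question} {sep} {visual} {sep}".split()
    budget = max(max_tokens - len(prefix), 0)  # tokens left for the article
    article_tokens = item.article.split()[:budget]
    return " ".join(prefix + article_tokens)


item = NewsItem(
    article="The mayor opened the new bridge over the Tagus on Friday",
    image_caption="a crowd on a bridge",
    image_labels=["person", "bridge"],
)
print(build_enriched_input("Who opened the bridge?", item, max_tokens=20))
```

Placing the visual text before the article means truncation never drops it, which matters because news articles often exceed a model's input window; the trade-off is that less of the article fits.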
dc.contributor.none.fl_str_mv Semedo, David
RUN
dc.subject.por.fl_str_mv News Media
Dataset Creation
Transformers
Multimodal Question Answering
Natural Language Processing
Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Information Engineering
dc.date.none.fl_str_mv 2023-05
2023-05-01T00:00:00Z
2024-02-08T15:02:48Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP