NewsMQA: A Multimodal Question Answering Benchmark over News Pieces

Lopes, Carolina Magalhães

Bibliographic details
Main author:	Lopes, Carolina Magalhães
Publication date:	2023
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10362/163255
Abstract:	News articles are one of the main sources of information available to people all around the world. They are highly diverse documents that cover a wide range of topics and refer to various entities, relationships, and concepts spanning a multitude of domains. The illustrations accompanying these articles play a crucial role in capturing readers’ attention and enriching the storytelling. In complex domains such as news, question answering (QA) can be an effective way to serve precise information needs. Because news is open-domain and multimodal, with images complementing the text, QA systems must dynamically draw on both visual and linguistic sources to ground their answers. Building such a system requires combining natural language and visual information to find an answer. Moreover, the model needs to answer open-ended questions and learn multimodal representations that correlate image and text elements. In this thesis, we propose NewsMQA, a novel dataset and benchmark for the task of multimodal question answering for news. NewsMQA differs from existing datasets by fully covering the multimodal facets of news and improving on the quality vs. scale trade-off. We adopt a two-part approach that combines human annotation with synthetic question-answer generation through answer roundtrip consistency. We comprehensively study the created dataset, highlighting its unique characteristics, features, quality, and the research challenges of the task it supports. To benchmark the dataset, we leverage pre-trained Transformers and propose different strategies to extend them with visual information extracted from the corresponding images. We conduct an extensive evaluation of the intricacies and challenges of the dataset and provide insights into the impact of enriching the input of these models with image-related information. Finally, we provide a critical discussion of the best-performing approaches and of the task’s open challenges.

Item metadata

id	RCAP_cdd066090da0c9a9904aeb904ed54320
oai_identifier_str	oai:run.unl.pt:10362/163255
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title_short	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title_full	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title_fullStr	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title_full_unstemmed	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
title_sort	NewsMQA: A Multimodal Question Answering Benchmark over News Pieces
author	Lopes, Carolina Magalhães
author_facet	Lopes, Carolina Magalhães
author_role	author
dc.contributor.none.fl_str_mv	Semedo, David; RUN
dc.contributor.author.fl_str_mv	Lopes, Carolina Magalhães
dc.subject.por.fl_str_mv	News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
topic	News Media; Dataset Creation; Transformers; Multimodal Question Answering; Natural Language Processing; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate	2023
dc.date.none.fl_str_mv	2023-05 2023-05-01T00:00:00Z 2024-02-08T15:02:48Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10362/163255
url	http://hdl.handle.net/10362/163255
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799138173001924608
