An investigation of linguistic problems in automatic multi-document summaries

Dias, Márcio de Souza; Di Felippo, Ariani; Rassi, Amanda Pontes; Cardoso, Paula Christina Figueira; Nóbrega, Fernando Antônio Asevedo; Pardo, Thiago Alexandre Salgueiro

Bibliographic details
Main author:	Dias, Márcio de Souza
Publication date:	2021
Other authors:	Di Felippo, Ariani; Rassi, Amanda Pontes; Cardoso, Paula Christina Figueira; Nóbrega, Fernando Antônio Asevedo; Pardo, Thiago Alexandre Salgueiro
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UFLA
Full text:	http://repositorio.ufla.br/jspui/handle/1/50347
Abstract:	Automatic summaries commonly present diverse linguistic problems that affect textual quality and thus their understanding by users. Few studies have tried to characterize such problems and their relation to the performance of summarization systems. In this paper, we investigated the problems in multi-document extracts (i.e., summaries produced by concatenating several sentences taken exactly as they appear in the source texts) generated by systems for Brazilian Portuguese that differ in approach (i.e., superficial and deep) and performance (i.e., baseline and state-of-the-art methods). To that end, we first reviewed the main characterization studies, resulting in a typology of linguistic problems more suitable for multi-document summarization. Then, we manually annotated a corpus of automatic multi-document extracts in Portuguese based on the typology, which showed that some linguistic problems are significantly more frequent than others. Thus, this corpus annotation may support research on the detection and correction of linguistic problems for summary improvement, allowing the production of automatic summaries that are not only informative (i.e., they convey the content of the source material), but also linguistically well structured.

Item metadata

id	UFLA_64be6458626eee6da9d56d1af9cdd4b4
oai_identifier_str	oai:localhost:1/50347
network_acronym_str	UFLA
network_name_str	Repositório Institucional da UFLA
repository_id_str
dc.title.none.fl_str_mv	An investigation of linguistic problems in automatic multi-document summaries Uma investigação de problemas linguísticos em sumários automáticos multidocumento
title	An investigation of linguistic problems in automatic multi-document summaries
spellingShingle	An investigation of linguistic problems in automatic multi-document summaries Dias, Márcio de Souza Automatic summarization Multi-document summary Linguistic problem Corpus annotation Sumarização automática Sumário multidocumento Problema linguístico Anotação de corpus
title_short	An investigation of linguistic problems in automatic multi-document summaries
title_full	An investigation of linguistic problems in automatic multi-document summaries
title_fullStr	An investigation of linguistic problems in automatic multi-document summaries
title_full_unstemmed	An investigation of linguistic problems in automatic multi-document summaries
title_sort	An investigation of linguistic problems in automatic multi-document summaries
author	Dias, Márcio de Souza
author_facet	Dias, Márcio de Souza Di Felippo, Ariani Rassi, Amanda Pontes Cardoso, Paula Christina Figueira Nóbrega, Fernando Antônio Asevedo Pardo, Thiago Alexandre Salgueiro
author_role	author
author2	Di Felippo, Ariani Rassi, Amanda Pontes Cardoso, Paula Christina Figueira Nóbrega, Fernando Antônio Asevedo Pardo, Thiago Alexandre Salgueiro
author2_role	author author author author author
dc.contributor.author.fl_str_mv	Dias, Márcio de Souza Di Felippo, Ariani Rassi, Amanda Pontes Cardoso, Paula Christina Figueira Nóbrega, Fernando Antônio Asevedo Pardo, Thiago Alexandre Salgueiro
dc.subject.por.fl_str_mv	Automatic summarization Multi-document summary Linguistic problem Corpus annotation Sumarização automática Sumário multidocumento Problema linguístico Anotação de corpus
topic	Automatic summarization Multi-document summary Linguistic problem Corpus annotation Sumarização automática Sumário multidocumento Problema linguístico Anotação de corpus
description	Automatic summaries commonly present diverse linguistic problems that affect textual quality and thus their understanding by users. Few studies have tried to characterize such problems and their relation to the performance of summarization systems. In this paper, we investigated the problems in multi-document extracts (i.e., summaries produced by concatenating several sentences taken exactly as they appear in the source texts) generated by systems for Brazilian Portuguese that differ in approach (i.e., superficial and deep) and performance (i.e., baseline and state-of-the-art methods). To that end, we first reviewed the main characterization studies, resulting in a typology of linguistic problems more suitable for multi-document summarization. Then, we manually annotated a corpus of automatic multi-document extracts in Portuguese based on the typology, which showed that some linguistic problems are significantly more frequent than others. Thus, this corpus annotation may support research on the detection and correction of linguistic problems for summary improvement, allowing the production of automatic summaries that are not only informative (i.e., they convey the content of the source material), but also linguistically well structured.
publishDate	2021
dc.date.none.fl_str_mv	2021 2022-06-27T12:44:35Z 2022-06-27T12:44:35Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	DIAS, M. de S. et al. An investigation of linguistic problems in automatic multi-document summaries. Revista de Estudos da Linguagem, Belo Horizonte, v. 29, n. 2, p. 859-907, 2021. DOI: 10.17851/2237-2083.29.2.859-907. http://repositorio.ufla.br/jspui/handle/1/50347
identifier_str_mv	DIAS, M. de S. et al. An investigation of linguistic problems in automatic multi-document summaries. Revista de Estudos da Linguagem, Belo Horizonte, v. 29, n. 2, p. 859-907, 2021. DOI: 10.17851/2237-2083.29.2.859-907.
url	http://repositorio.ufla.br/jspui/handle/1/50347
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais (UFMG), Faculdade de Letras (FALE)
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais (UFMG), Faculdade de Letras (FALE)
dc.source.none.fl_str_mv	Revista de Estudos da Linguagem reponame:Repositório Institucional da UFLA instname:Universidade Federal de Lavras (UFLA) instacron:UFLA
instname_str	Universidade Federal de Lavras (UFLA)
instacron_str	UFLA
institution	UFLA
reponame_str	Repositório Institucional da UFLA
collection	Repositório Institucional da UFLA
repository.name.fl_str_mv	Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv	nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_	1815439299363995648
