O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática

Tomazela, Élen Cátia

Bibliographic details
Main author:	Tomazela, Élen Cátia
Publication date:	2010
Document type:	Master's thesis
Language:	por
Source title:	Repositório Institucional da UFSCAR
Full text:	https://repositorio.ufscar.br/handle/ufscar/5700
Abstract:	This dissertation presents a theoretical heuristic model that draws not only on Veins Theory but also on semantic information from the PALAVRAS parser to improve the selection of coreferential textual units for inclusion in automatic summaries. An analysis of the problems exhibited by VeinSum, an automatic summarizer, raised two main issues: the need to improve the salience of its summaries and the need to reduce their size so that they come closer to the ideal compression rate. Both can be addressed by eliminating textual units of secondary importance to referential clarity, without damaging that clarity. To this end, heuristics based on the semantic information from PALAVRAS are proposed. Although the parser's semantic annotation is inconsistent, the annotation of all noun phrases in the 50 source texts of the Summ-it corpus was manually post-edited, which makes the heuristics more reliable. Eleven texts from the corpus were analysed and the results are satisfactory, although a broader study is needed to better evaluate the proposal.

Item metadata

id	SCAR_35740623e1f0323206808bc5f0122b1c
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/5700
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str	4322
spelling	Tomazela, Élen Cátia | Rino, Lúcia Helena Machado | http://lattes.cnpq.br/0315640846525832 | http://lattes.cnpq.br/5260837297000438 | 5dc1808e-f635-4118-826f-767e55e2aa1d | 2016-06-02T20:25:07Z | 2011-02-11 | 2016-06-02T20:25:07Z | 2010-06-21 | TOMAZELA, Élen Cátia. O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na Sumarização Automática. 2010. 149 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2010. | https://repositorio.ufscar.br/handle/ufscar/5700 | This dissertation presents a theoretical heuristic model that draws not only on Veins Theory but also on semantic information from the PALAVRAS parser to improve the selection of coreferential textual units for inclusion in automatic summaries. An analysis of the problems exhibited by VeinSum, an automatic summarizer, raised two main issues: the need to improve the salience of its summaries and the need to reduce their size so that they come closer to the ideal compression rate. Both can be addressed by eliminating textual units of secondary importance to referential clarity, without damaging that clarity. To this end, heuristics based on the semantic information from PALAVRAS are proposed. Although the parser's semantic annotation is inconsistent, the annotation of all noun phrases in the 50 source texts of the Summ-it corpus was manually post-edited, which makes the heuristics more reliable. Eleven texts from the corpus were analysed and the results are satisfactory, although a broader study is needed to better evaluate the proposal. | (Portuguese abstract, translated) This dissertation focuses on proposing a theoretical heuristic model that uses, in addition to Veins Theory, semantic information from the PALAVRAS parser to improve the selection of coreferential units for inclusion in automatic summaries. An analysis of the problems exhibited by the automatic summarizer VeinSum identified the need to improve the salience of the summaries it produces and to reduce their size so that they come closer to the ideal compression rate. The proposal is therefore to eliminate textual units of secondary importance with respect to referential clarity, without damaging that clarity. To this end, heuristics based on the semantic information from PALAVRAS were proposed. Although the parser shows inconsistencies in semantic tagging, the annotation of all noun phrases in the 50 source texts that make up the Summ-it corpus was manually post-edited to improve the reliability of the resulting heuristics. Eleven texts from the corpus were analysed and the results are satisfactory, although it is acknowledged that a broader study is needed to better evaluate the results of this proposal. | Universidade Federal de Minas Gerais | application/pdf | por | Universidade Federal de São Carlos | Programa de Pós-Graduação em Linguística - PPGL | UFSCar | BR | Linguística - processamento de dados | Sumarização automática | Textualidade | Correferência | LINGUISTICA, LETRAS E ARTES::LINGUISTICA | O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | -1 | -1 | 629307e8-d9f0-4e50-b2e4-e495b4d8b0fb | info:eu-repo/semantics/openAccess | reponame:Repositório Institucional da UFSCAR | instname:Universidade Federal de São Carlos (UFSCAR) | instacron:UFSCAR | ORIGINAL | 3413.pdf | application/pdf | 1158214 | https://repositorio.ufscar.br/bitstream/ufscar/5700/1/3413.pdf | 96b742071a87c5d34f6d705e6fa72237 | MD5 | 1 | THUMBNAIL | 3413.pdf.jpg | 3413.pdf.jpg | IM Thumbnail | image/jpeg | 7836 | https://repositorio.ufscar.br/bitstream/ufscar/5700/2/3413.pdf.jpg | dcaa6de93f89b66858fae204259078dc | MD5 | 2 | ufscar/5700 | 2023-09-18 18:31:18.245 | oai:repositorio.ufscar.br:ufscar/5700 | Repositório Institucional | PUB | https://repositorio.ufscar.br/oai/request | opendoar:4322 | 2023-09-18T18:31:18 | Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) | false
dc.title.por.fl_str_mv	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
title	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
spellingShingle	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática Tomazela, Élen Cátia Linguística - processamento de dados Sumarização automática Textualidade Correferência LINGUISTICA, LETRAS E ARTES::LINGUISTICA
title_short	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
title_full	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
title_fullStr	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
title_full_unstemmed	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
title_sort	O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na sumarização automática
author	Tomazela, Élen Cátia
author_facet	Tomazela, Élen Cátia
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/5260837297000438
dc.contributor.author.fl_str_mv	Tomazela, Élen Cátia
dc.contributor.advisor1.fl_str_mv	Rino, Lúcia Helena Machado
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/0315640846525832
dc.contributor.authorID.fl_str_mv	5dc1808e-f635-4118-826f-767e55e2aa1d
contributor_str_mv	Rino, Lúcia Helena Machado
dc.subject.por.fl_str_mv	Linguística - processamento de dados Sumarização automática Textualidade Correferência
topic	Linguística - processamento de dados Sumarização automática Textualidade Correferência LINGUISTICA, LETRAS E ARTES::LINGUISTICA
dc.subject.cnpq.fl_str_mv	LINGUISTICA, LETRAS E ARTES::LINGUISTICA
description	This dissertation presents a theoretical heuristic model that draws not only on Veins Theory but also on semantic information from the PALAVRAS parser to improve the selection of coreferential textual units for inclusion in automatic summaries. An analysis of the problems exhibited by VeinSum, an automatic summarizer, raised two main issues: the need to improve the salience of its summaries and the need to reduce their size so that they come closer to the ideal compression rate. Both can be addressed by eliminating textual units of secondary importance to referential clarity, without damaging that clarity. To this end, heuristics based on the semantic information from PALAVRAS are proposed. Although the parser's semantic annotation is inconsistent, the annotation of all noun phrases in the 50 source texts of the Summ-it corpus was manually post-edited, which makes the heuristics more reliable. Eleven texts from the corpus were analysed and the results are satisfactory, although a broader study is needed to better evaluate the proposal.
publishDate	2010
dc.date.issued.fl_str_mv	2010-06-21
dc.date.available.fl_str_mv	2011-02-11 2016-06-02T20:25:07Z
dc.date.accessioned.fl_str_mv	2016-06-02T20:25:07Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	TOMAZELA, Élen Cátia. O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na Sumarização Automática. 2010. 149 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2010.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/5700
identifier_str_mv	TOMAZELA, Élen Cátia. O uso de informações semânticas do PALAVRAS : em busca do aprimoramento da seleção de unidades textuais correferentes na Sumarização Automática. 2010. 149 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2010.
url	https://repositorio.ufscar.br/handle/ufscar/5700
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	-1 -1
dc.relation.authority.fl_str_mv	629307e8-d9f0-4e50-b2e4-e495b4d8b0fb
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Linguística - PPGL
dc.publisher.initials.fl_str_mv	UFSCar
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/5700/1/3413.pdf https://repositorio.ufscar.br/bitstream/ufscar/5700/2/3413.pdf.jpg
bitstream.checksum.fl_str_mv	96b742071a87c5d34f6d705e6fa72237 dcaa6de93f89b66858fae204259078dc
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_	1813715545763086336
