Descrição linguística da complementaridade para a sumarização automática multidocumento
Main author: | Souza, Jackson Wilke da Cruz |
---|---|
Publication date: | 2015 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/8311 |
Abstract: | Automatic Multidocument Summarization (AMS) is a computational alternative for processing the large quantity of information available online. AMS aims to automatically generate a single coherent and cohesive summary from a collection of documents on the same subject, each of which comes from a different source. Furthermore, some AMS methods select the most important information in the collection to compose the summary. Selecting the main content sometimes requires identifying redundancy, complementarity, and contradiction, which are the multidocument phenomena. Identifying complementarity is particularly relevant because a piece of information may be selected for the summary when it complements another that has already been selected, which makes the summary more coherent and more informative. Some AMS methods condense the content of the source documents by identifying relations from Cross-document Structure Theory (CST), which hold between sentences of different documents. Some of these relations (for example, Historical background) capture the phenomenon of complementarity. These relations are usually detected automatically on the basis of lexical similarity between a pair of sentences, since AMS research lacks studies that characterize the phenomenon and point to other linguistic strategies relevant for detecting complementarity automatically. In this work, we present a corpus-based linguistic description of complementarity and translate the characteristics of this phenomenon into attributes that support its automatic identification. As a result, we obtained sets of rules that show the most relevant attributes for discriminating the CST relations of complementarity (Historical background, Follow-up, and Elaboration) and the types of complementarity (temporal and timeless). With this, we hope to contribute both to Descriptive Linguistics, with a corpus-based survey of the linguistic characteristics of this phenomenon, and to Natural Language Processing, by means of rules that can support the automatic identification of CST relations and types of complementarity. |
id |
SCAR_7f4aa3a2ef6c31edda1f03c32d15c849 |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/8311 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Souza, Jackson Wilke da Cruz; Di Felippo, Ariani; http://lattes.cnpq.br/8648412103197455; http://lattes.cnpq.br/0019187301069627; f06aa711-84de-43a8-a5ff-f75bcf3bac0d; 2016-11-08T19:05:06Z; 2016-11-08T19:05:06Z; 2015-11-11; SOUZA, Jackson Wilke da Cruz. Descrição linguística da complementaridade para a sumarização automática multidocumento. 2015. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2015. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8311.; https://repositorio.ufscar.br/handle/ufscar/8311; No funding received; por; Universidade Federal de São Carlos; Câmpus São Carlos; Programa de Pós-Graduação em Linguística - PPGL; UFSCar; Complementaridade; Relações CST; Linguística textual; Descrição linguística; Sumarização automática multidocumento; LINGUISTICA, LETRAS E ARTES::LINGUISTICA; Descrição linguística da complementaridade para a sumarização automática multidocumento; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; Online; 600; 600; 26c5db60-6612-41e6-a8f9-f94fb475ca58; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFSCAR; instname:Universidade Federal de São Carlos (UFSCAR); instacron:UFSCAR; ORIGINAL DissJWCS.pdf application/pdf 1378387 https://repositorio.ufscar.br/bitstream/ufscar/8311/1/DissJWCS.pdf 8f4432b0959dda94e372b6cbb7dd8e7e MD5 1; LICENSE license.txt text/plain; charset=utf-8 1957 https://repositorio.ufscar.br/bitstream/ufscar/8311/2/license.txt ae0398b6f8b235e40ad82cba6c50031d MD5 2; TEXT DissJWCS.pdf.txt Extracted text text/plain 230573 https://repositorio.ufscar.br/bitstream/ufscar/8311/3/DissJWCS.pdf.txt 6c3ca115af46f4bf9da1d2f3652a0ef8 MD5 3; THUMBNAIL DissJWCS.pdf.jpg IM Thumbnail image/jpeg 10691 https://repositorio.ufscar.br/bitstream/ufscar/8311/4/DissJWCS.pdf.jpg 6261b1cae8491149a6c26ce4fa4747f1 MD5 4; ufscar/8311; 2023-09-18 18:31:04.446; oai:repositorio.ufscar.br:ufscar/8311; Repositório Institucional; PUB; https://repositorio.ufscar.br/oai/request; opendoar:4322; 2023-09-18T18:31:04; Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR); false |
dc.title.por.fl_str_mv |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
title |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
spellingShingle |
Descrição linguística da complementaridade para a sumarização automática multidocumento Souza, Jackson Wilke da Cruz Complementaridade Relações CST Linguística textual Descrição linguística Sumarização automática multidocumento LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
title_short |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
title_full |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
title_fullStr |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
title_full_unstemmed |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
title_sort |
Descrição linguística da complementaridade para a sumarização automática multidocumento |
author |
Souza, Jackson Wilke da Cruz |
author_facet |
Souza, Jackson Wilke da Cruz |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/0019187301069627 |
dc.contributor.author.fl_str_mv |
Souza, Jackson Wilke da Cruz |
dc.contributor.advisor1.fl_str_mv |
Di Felippo, Ariani |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/8648412103197455 |
dc.contributor.authorID.fl_str_mv |
f06aa711-84de-43a8-a5ff-f75bcf3bac0d |
contributor_str_mv |
Di Felippo, Ariani |
dc.subject.por.fl_str_mv |
Complementaridade Relações CST Linguística textual Descrição linguística Sumarização automática multidocumento |
topic |
Complementaridade Relações CST Linguística textual Descrição linguística Sumarização automática multidocumento LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
dc.subject.cnpq.fl_str_mv |
LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
description |
Automatic Multidocument Summarization (AMS) is a computational alternative for processing the large quantity of information available online. AMS aims to automatically generate a single coherent and cohesive summary from a collection of documents on the same subject, each of which comes from a different source. Furthermore, some AMS methods select the most important information in the collection to compose the summary. Selecting the main content sometimes requires identifying redundancy, complementarity, and contradiction, which are the multidocument phenomena. Identifying complementarity is particularly relevant because a piece of information may be selected for the summary when it complements another that has already been selected, which makes the summary more coherent and more informative. Some AMS methods condense the content of the source documents by identifying relations from Cross-document Structure Theory (CST), which hold between sentences of different documents. Some of these relations (for example, Historical background) capture the phenomenon of complementarity. These relations are usually detected automatically on the basis of lexical similarity between a pair of sentences, since AMS research lacks studies that characterize the phenomenon and point to other linguistic strategies relevant for detecting complementarity automatically. In this work, we present a corpus-based linguistic description of complementarity and translate the characteristics of this phenomenon into attributes that support its automatic identification. As a result, we obtained sets of rules that show the most relevant attributes for discriminating the CST relations of complementarity (Historical background, Follow-up, and Elaboration) and the types of complementarity (temporal and timeless). With this, we hope to contribute both to Descriptive Linguistics, with a corpus-based survey of the linguistic characteristics of this phenomenon, and to Natural Language Processing, by means of rules that can support the automatic identification of CST relations and types of complementarity. |
publishDate |
2015 |
dc.date.issued.fl_str_mv |
2015-11-11 |
dc.date.accessioned.fl_str_mv |
2016-11-08T19:05:06Z |
dc.date.available.fl_str_mv |
2016-11-08T19:05:06Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
SOUZA, Jackson Wilke da Cruz. Descrição linguística da complementaridade para a sumarização automática multidocumento. 2015. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2015. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8311. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/8311 |
identifier_str_mv |
SOUZA, Jackson Wilke da Cruz. Descrição linguística da complementaridade para a sumarização automática multidocumento. 2015. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2015. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8311. |
url |
https://repositorio.ufscar.br/handle/ufscar/8311 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
600 600 |
dc.relation.authority.fl_str_mv |
26c5db60-6612-41e6-a8f9-f94fb475ca58 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Linguística - PPGL |
dc.publisher.initials.fl_str_mv |
UFSCar |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/8311/1/DissJWCS.pdf https://repositorio.ufscar.br/bitstream/ufscar/8311/2/license.txt https://repositorio.ufscar.br/bitstream/ufscar/8311/3/DissJWCS.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/8311/4/DissJWCS.pdf.jpg |
bitstream.checksum.fl_str_mv |
8f4432b0959dda94e372b6cbb7dd8e7e ae0398b6f8b235e40ad82cba6c50031d 6c3ca115af46f4bf9da1d2f3652a0ef8 6261b1cae8491149a6c26ce4fa4747f1 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1813715570137235456 |