Estudo da correlação entre propriedades estatísticas de verbetes

Bibliographic details
Main author: FONCECA JUNIOR, José Ilberto
Publication date: 2017
Document type: Dissertation (master's thesis)
Language: Portuguese (por)
Source title: Biblioteca Digital de Teses e Dissertações da UFRPE
Full text: http://www.tede2.ufrpe.br:8080/tede2/handle/tede2/7799
Abstract: The application of mathematical and statistical methods to explore properties of natural languages has a recent and prolific history. These methods, together with the quantitative techniques adapted or created for the study of languages, belong to an area usually called quantitative linguistics. The first work in this area was carried out by George Zipf between 1930 and 1950, who studied the distribution of word frequencies. His work was followed by Claude Shannon's analysis of entropy and letter prediction as a measure of redundancy in written English. In this work, we first present a study of correlation and cross-correlation in time series extracted from texts, using common approaches for investigating non-stationary time series. The analysis uses a corpus of 250 literary texts in 10 different languages. The properties that emerge from these correlations are also discussed and explained. Second, we turn to the description of the distance distribution responsible for the long-range structure observed in written language. We construct these distributions from the distances between consecutive prime numbers and from distances drawn from a Weibull-distributed process. The results of these models are then examined with the techniques presented in the work and compared with the properties that emerge in natural language.
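The second part of the abstract builds texts in which the distance between successive occurrences of a word follows either the gaps between consecutive primes or a Weibull law. The Python sketch below illustrates that idea under our own assumptions: the function names, the rounding of Weibull draws up to integer gaps of at least 1, and the use of σ/μ of the distances as a burstiness measure are illustrative choices, not the dissertation's code or method.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical helpers for illustration only: names, the rounding rule and
# the sigma/mu measure are assumptions, not taken from the dissertation.


def primes_up_to(n: int) -> list[int]:
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def prime_gaps(n: int) -> list[int]:
    """Distances between consecutive primes <= n, used as word-recurrence distances."""
    ps = primes_up_to(n)
    return [b - a for a, b in zip(ps, ps[1:])]


def weibull_gaps(count: int, scale: float, shape: float, seed: int = 0) -> list[int]:
    """Integer distances from a Weibull law; each draw is rounded up so gaps are >= 1."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return [max(1, math.ceil(rng.weibullvariate(scale, shape))) for _ in range(count)]


def positions_from_gaps(gaps: list[int], start: int = 0) -> list[int]:
    """Positions of a word in a synthetic text: running sum of the distances."""
    positions = [start]
    for gap in gaps:
        positions.append(positions[-1] + gap)
    return positions


def occurrence_series(positions: list[int], length: int) -> list[int]:
    """Binary time series over `length` word slots: 1 where the word occurs."""
    series = [0] * length
    for p in positions:
        if 0 <= p < length:
            series[p] = 1
    return series


def burstiness(gaps: list[int]) -> float:
    """sigma/mu of the distances: about 1 for a memoryless process, > 1 when clustered."""
    return pstdev(gaps) / mean(gaps)
```

For example, `prime_gaps(30)` gives `[1, 2, 2, 4, 2, 4, 2, 4, 6]`, and `positions_from_gaps` turns these into the positions `[0, 1, 3, 5, 9, 11, 15, 17, 21, 27]`. A Weibull shape parameter below 1 yields σ/μ above 1 (clustered occurrences), while a shape of 1 reduces to an exponential law with σ/μ close to 1.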
id URPE_162f42b523c8307c0dd6ed9a94ac4ac4
oai_identifier_str oai:tede2:tede2/7799
network_acronym_str URPE
network_name_str Biblioteca Digital de Teses e Dissertações da UFRPE
repository_id_str
Abstract (translated from the Portuguese original): The investigation of natural languages through mathematical and statistical methods that seek to characterize properties of literary texts has been the subject of intense research in recent decades, forming an area called quantitative linguistics. The first works in this area appeared between the 1930s and the 1950s, with George Zipf's study of frequency distributions and Claude Shannon's work on letter and word prediction and on entropy as a measure of redundancy in the English language. This dissertation investigates the autocorrelation and cross-correlations of time series extracted from texts, using techniques common to the study of non-stationary time series. We also discuss which properties emerge from these correlations and their implications for the writing process. Throughout this analysis, all results were obtained for a set of 250 literary texts written in 10 different languages. In the final part of this work, we analyze the properties of generic texts obtained from two distance-distribution models: one based on the distances between consecutive prime numbers and another based on the Weibull distribution. We explore the characteristics that emerge in each model and compare them with their counterparts in natural-language texts.
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.title.por.fl_str_mv Estudo da correlação entre propriedades estatísticas de verbetes
author_role author
dc.contributor.advisor1.fl_str_mv FIGUEIRÊDO, Pedro Hugo de
dc.contributor.referee1.fl_str_mv FIGUEIRÊDO, Pedro Hugo de
dc.contributor.referee2.fl_str_mv SOUZA, Adauto José Ferreira de
dc.contributor.referee3.fl_str_mv GONZÁLEZ, Ramón Enrique Ramayo
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/9496477807186101
dc.contributor.author.fl_str_mv FONCECA JUNIOR, José Ilberto
dc.subject.por.fl_str_mv Linguística quantitativa (quantitative linguistics)
Entropia (entropy)
Método estatístico (statistical method)
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::FISICA (Exact and Earth Sciences :: Physics)
publishDate 2017
dc.date.issued.fl_str_mv 2017-04-19
dc.date.accessioned.fl_str_mv 2018-12-21T14:12:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv FONCECA JUNIOR, José Ilberto. Estudo da correlação entre propriedades estatísticas de verbetes. 2017. 118 f. Dissertação (Programa de Pós-Graduação em Física Aplicada) - Universidade Federal Rural de Pernambuco, Recife.
dc.identifier.uri.fl_str_mv http://www.tede2.ufrpe.br:8080/tede2/handle/tede2/7799
dc.language.iso.fl_str_mv por
dc.relation.program.fl_str_mv 2948194971945047520
dc.relation.confidence.fl_str_mv 600
600
600
600
dc.relation.department.fl_str_mv -748177341945315287
dc.relation.cnpq.fl_str_mv -8327146296503745929
dc.relation.sponsorship.fl_str_mv 2075167498588264571
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal Rural de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Física Aplicada
dc.publisher.initials.fl_str_mv UFRPE
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Departamento de Física
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da UFRPE
instname:Universidade Federal Rural de Pernambuco (UFRPE)
instacron:UFRPE
bitstream.url.fl_str_mv http://www.tede2.ufrpe.br:8080/tede2/bitstream/tede2/7799/2/Jose+Ilberto+Fonseca+Junior.pdf
http://www.tede2.ufrpe.br:8080/tede2/bitstream/tede2/7799/1/license.txt
bitstream.checksum.fl_str_mv 54e06f9b34ec16b3e6c466b3ecc773ac
bd3efa91386c1718a7f26a329fdcb468
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da UFRPE - Universidade Federal Rural de Pernambuco (UFRPE)
repository.mail.fl_str_mv bdtd@ufrpe.br
_version_ 1810102255442984960