Formação de gentílicos a partir de topônimos : proposta de geração automática

Detalhes bibliográficos
Autor(a) principal: Antunes, Roger Alfredo de Marci Rodrigues
Data de Publicação: 2017
Tipo de documento: Dissertação
Idioma: por
Título da fonte: Repositório Institucional da UFSCAR
Texto Completo: https://repositorio.ufscar.br/handle/ufscar/9035
Resumo: It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms.
id SCAR_5207d8815f9210a5ff8babf051fbab8c
oai_identifier_str oai:repositorio.ufscar.br:ufscar/9035
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
spelling Antunes, Roger Alfredo de Marci RodriguesAlmeida, Gladis Maria de Barcelloshttp://lattes.cnpq.br/4046789388750478http://lattes.cnpq.br/7929075503955558d67f270b-2aa9-457e-b131-58fbaa0e4d2c2017-08-21T18:50:41Z2017-08-21T18:50:41Z2017-02-17ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035.https://repositorio.ufscar.br/handle/ufscar/9035It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms.Utilizam-se diariamente nomes de cidades e adjetivos que indicam as pessoas que nasceram ou vivem nessas cidades, mas raramente se reflete sobre as regras de formação dessas palavras. O presente trabalho tem como objetivo descrever os adjetivos pátrios, ou gentílicos, que advêm dos nomes dos lugares - topônimos -, por meio de regras de combinação morfológicas específicas e propor a representação formal das suas regularidades com intuito de servir de base para um sistema computacional capaz de gerar automaticamente os gentílicos a partir dos seus topônimos. Tomou-se como orientação os princípios metodológicos de Dias-da-Silva (1996) - no que concerne à metodologia trifásica do PLN -, e os pressupostos teóricos nos trabalhos de Borba (1998), Biderman (2001), Dick (2007), Jurafsky (2009) e Sandmann (1992, 1997). O corpus da pesquisa consiste na lista dos topônimos de 5.570 municípios e seus respectivos gentílicos, extraídos do banco de dados do Instituto Brasileiro de Geografia e Estatística (IBGE). Com esta pesquisa, foi possível observar que somente a partir das menores unidades recorrentes, como os sufixos e as extremidades finais das unidades léxicas, podem-se extrair padrões para a formulação de regras de combinação para um processamento automático. Além disso, a problemática da representação computacional evidencia a complexidade das línguas naturais, que embora sejam passíveis de processamento automático, são opacas e, desta maneira, sempre haverá questões inerentes a elas que dificultam essa tarefa. Ainda assim, os resultados mostraram que é possível automatizar a geração de gentílicos a partir de topônimos em 52% do total, o que já é um número razoável, considerando a opacidade inerente à língua natural mencionada.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)porUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Linguística - PPGLUFSCarGentílicoToponímiaMorfologia lexicalProcessos de formação de palavrasLinguística computacionalProcessamento de línguas naturaisGentileToponymyLexical morphologyComputational linguisticsNatural language processingLINGUISTICA, LETRAS E ARTES::LINGUISTICAFormação de gentílicos a partir de topônimos : proposta de geração automáticainfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisOnline6006008eac6ac4-a936-48dd-b9d7-997dc0548cbcinfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALDissRARMA.pdfDissRARMA.pdfapplication/pdf5470332https://repositorio.ufscar.br/bitstream/ufscar/9035/1/DissRARMA.pdfe56022e54a0fe99cc8ca45fc74f7e424MD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstream/ufscar/9035/2/license.txtae0398b6f8b235e40ad82cba6c50031dMD52TEXTDissRARMA.pdf.txtDissRARMA.pdf.txtExtracted texttext/plain413817https://repositorio.ufscar.br/bitstream/ufscar/9035/3/DissRARMA.pdf.txt206f6d8b331fe65cba523a29263b5e84MD53THUMBNAILDissRARMA.pdf.jpgDissRARMA.pdf.jpgIM Thumbnailimage/jpeg10043https://repositorio.ufscar.br/bitstream/ufscar/9035/4/DissRARMA.pdf.jpg8306f6d74fa73a5044ac3cc6887f001bMD54ufscar/90352023-09-18 18:31:25.792oai:repositorio.ufscar.br:ufscar/9035TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==Repositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestopendoar:43222023-09-18T18:31:25Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)false
dc.title.por.fl_str_mv Formação de gentílicos a partir de topônimos : proposta de geração automática
title Formação de gentílicos a partir de topônimos : proposta de geração automática
spellingShingle Formação de gentílicos a partir de topônimos : proposta de geração automática
Antunes, Roger Alfredo de Marci Rodrigues
Gentílico
Toponímia
Morfologia lexical
Processos de formação de palavras
Linguística computacional
Processamento de línguas naturais
Gentile
Toponymy
Lexical morphology
Computational linguistics
Natural language processing
LINGUISTICA, LETRAS E ARTES::LINGUISTICA
title_short Formação de gentílicos a partir de topônimos : proposta de geração automática
title_full Formação de gentílicos a partir de topônimos : proposta de geração automática
title_fullStr Formação de gentílicos a partir de topônimos : proposta de geração automática
title_full_unstemmed Formação de gentílicos a partir de topônimos : proposta de geração automática
title_sort Formação de gentílicos a partir de topônimos : proposta de geração automática
author Antunes, Roger Alfredo de Marci Rodrigues
author_facet Antunes, Roger Alfredo de Marci Rodrigues
author_role author
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/7929075503955558
dc.contributor.author.fl_str_mv Antunes, Roger Alfredo de Marci Rodrigues
dc.contributor.advisor1.fl_str_mv Almeida, Gladis Maria de Barcellos
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/4046789388750478
dc.contributor.authorID.fl_str_mv d67f270b-2aa9-457e-b131-58fbaa0e4d2c
contributor_str_mv Almeida, Gladis Maria de Barcellos
dc.subject.por.fl_str_mv Gentílico
Toponímia
Morfologia lexical
Processos de formação de palavras
Linguística computacional
Processamento de línguas naturais
topic Gentílico
Toponímia
Morfologia lexical
Processos de formação de palavras
Linguística computacional
Processamento de línguas naturais
Gentile
Toponymy
Lexical morphology
Computational linguistics
Natural language processing
LINGUISTICA, LETRAS E ARTES::LINGUISTICA
dc.subject.eng.fl_str_mv Gentile
Toponymy
Lexical morphology
Computational linguistics
Natural language processing
dc.subject.cnpq.fl_str_mv LINGUISTICA, LETRAS E ARTES::LINGUISTICA
description It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms.
publishDate 2017
dc.date.accessioned.fl_str_mv 2017-08-21T18:50:41Z
dc.date.available.fl_str_mv 2017-08-21T18:50:41Z
dc.date.issued.fl_str_mv 2017-02-17
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/ufscar/9035
identifier_str_mv ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035.
url https://repositorio.ufscar.br/handle/ufscar/9035
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 8eac6ac4-a936-48dd-b9d7-997dc0548cbc
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Linguística - PPGL
dc.publisher.initials.fl_str_mv UFSCar
publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstream/ufscar/9035/1/DissRARMA.pdf
https://repositorio.ufscar.br/bitstream/ufscar/9035/2/license.txt
https://repositorio.ufscar.br/bitstream/ufscar/9035/3/DissRARMA.pdf.txt
https://repositorio.ufscar.br/bitstream/ufscar/9035/4/DissRARMA.pdf.jpg
bitstream.checksum.fl_str_mv e56022e54a0fe99cc8ca45fc74f7e424
ae0398b6f8b235e40ad82cba6c50031d
206f6d8b331fe65cba523a29263b5e84
8306f6d74fa73a5044ac3cc6887f001b
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_ 1813715578560446464