Formação de gentílicos a partir de topônimos : proposta de geração automática
Autor(a) principal: | |
---|---|
Data de Publicação: | 2017 |
Tipo de documento: | Dissertação |
Idioma: | por |
Título da fonte: | Repositório Institucional da UFSCAR |
Texto Completo: | https://repositorio.ufscar.br/handle/ufscar/9035 |
Resumo: | It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms. |
id |
SCAR_5207d8815f9210a5ff8babf051fbab8c |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/9035 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Antunes, Roger Alfredo de Marci RodriguesAlmeida, Gladis Maria de Barcelloshttp://lattes.cnpq.br/4046789388750478http://lattes.cnpq.br/7929075503955558d67f270b-2aa9-457e-b131-58fbaa0e4d2c2017-08-21T18:50:41Z2017-08-21T18:50:41Z2017-02-17ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035.https://repositorio.ufscar.br/handle/ufscar/9035It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms.Utilizam-se diariamente nomes de cidades e adjetivos que indicam as pessoas que nasceram ou vivem nessas cidades, mas raramente se reflete sobre as regras de formação dessas palavras. O presente trabalho tem como objetivo descrever os adjetivos pátrios, ou gentílicos, que advêm dos nomes dos lugares - topônimos -, por meio de regras de combinação morfológicas específicas e propor a representação formal das suas regularidades com intuito de servir de base para um sistema computacional capaz de gerar automaticamente os gentílicos a partir dos seus topônimos. Tomou-se como orientação os princípios metodológicos de Dias-da-Silva (1996) - no que concerne à metodologia trifásica do PLN -, e os pressupostos teóricos nos trabalhos de Borba (1998), Biderman (2001), Dick (2007), Jurafsky (2009) e Sandmann (1992, 1997). O corpus da pesquisa consiste na lista dos topônimos de 5.570 municípios e seus respectivos gentílicos, extraídos do banco de dados do Instituto Brasileiro de Geografia e Estatística (IBGE). Com esta pesquisa, foi possível observar que somente a partir das menores unidades recorrentes, como os sufixos e as extremidades finais das unidades léxicas, podem-se extrair padrões para a formulação de regras de combinação para um processamento automático. Além disso, a problemática da representação computacional evidencia a complexidade das línguas naturais, que embora sejam passíveis de processamento automático, são opacas e, desta maneira, sempre haverá questões inerentes a elas que dificultam essa tarefa. Ainda assim, os resultados mostraram que é possível automatizar a geração de gentílicos a partir de topônimos em 52% do total, o que já é um número razoável, considerando a opacidade inerente à língua natural mencionada.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)porUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Linguística - PPGLUFSCarGentílicoToponímiaMorfologia lexicalProcessos de formação de palavrasLinguística computacionalProcessamento de línguas naturaisGentileToponymyLexical morphologyComputational linguisticsNatural language processingLINGUISTICA, LETRAS E ARTES::LINGUISTICAFormação de gentílicos a partir de topônimos : proposta de geração automáticainfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisOnline6006008eac6ac4-a936-48dd-b9d7-997dc0548cbcinfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALDissRARMA.pdfDissRARMA.pdfapplication/pdf5470332https://repositorio.ufscar.br/bitstream/ufscar/9035/1/DissRARMA.pdfe56022e54a0fe99cc8ca45fc74f7e424MD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstream/ufscar/9035/2/license.txtae0398b6f8b235e40ad82cba6c50031dMD52TEXTDissRARMA.pdf.txtDissRARMA.pdf.txtExtracted texttext/plain413817https://repositorio.ufscar.br/bitstream/ufscar/9035/3/DissRARMA.pdf.txt206f6d8b331fe65cba523a29263b5e84MD53THUMBNAILDissRARMA.pdf.jpgDissRARMA.pdf.jpgIM Thumbnailimage/jpeg10043https://repositorio.ufscar.br/bitstream/ufscar/9035/4/DissRARMA.pdf.jpg8306f6d74fa73a5044ac3cc6887f001bMD54ufscar/90352023-09-18 18:31:25.792oai:repositorio.ufscar.br:ufscar/9035TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==Repositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestopendoar:43222023-09-18T18:31:25Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)false |
dc.title.por.fl_str_mv |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
title |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
spellingShingle |
Formação de gentílicos a partir de topônimos : proposta de geração automática Antunes, Roger Alfredo de Marci Rodrigues Gentílico Toponímia Morfologia lexical Processos de formação de palavras Linguística computacional Processamento de línguas naturais Gentile Toponymy Lexical morphology Computational linguistics Natural language processing LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
title_short |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
title_full |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
title_fullStr |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
title_full_unstemmed |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
title_sort |
Formação de gentílicos a partir de topônimos : proposta de geração automática |
author |
Antunes, Roger Alfredo de Marci Rodrigues |
author_facet |
Antunes, Roger Alfredo de Marci Rodrigues |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/7929075503955558 |
dc.contributor.author.fl_str_mv |
Antunes, Roger Alfredo de Marci Rodrigues |
dc.contributor.advisor1.fl_str_mv |
Almeida, Gladis Maria de Barcellos |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/4046789388750478 |
dc.contributor.authorID.fl_str_mv |
d67f270b-2aa9-457e-b131-58fbaa0e4d2c |
contributor_str_mv |
Almeida, Gladis Maria de Barcellos |
dc.subject.por.fl_str_mv |
Gentílico Toponímia Morfologia lexical Processos de formação de palavras Linguística computacional Processamento de línguas naturais |
topic |
Gentílico Toponímia Morfologia lexical Processos de formação de palavras Linguística computacional Processamento de línguas naturais Gentile Toponymy Lexical morphology Computational linguistics Natural language processing LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
dc.subject.eng.fl_str_mv |
Gentile Toponymy Lexical morphology Computational linguistics Natural language processing |
dc.subject.cnpq.fl_str_mv |
LINGUISTICA, LETRAS E ARTES::LINGUISTICA |
description |
It is a common habit to use the adjective of the city name to indicate people’s origin, however the formulating rules of the adjective has been rarely discussed in the literature. The main objective of this work is to describe the gentile adjectives, which originate from the place names called toponyms. Using specific morphological rules of combination and proposing the formal representation of their regularities we can formulate the basis for a computational system, which can automatically generate the gentiles from their place names. The system proposed here is founded on the methodological principles of Dias-da-Silva (1996) - with respect to the three-phase methodology of the Natural language processing (NLP) - and the theoretical assumptions in the works of Borba (1998), Biderman (2001), Dick (2007) Jurafsky (2009) and Sandmann (1992, 1997). The corpus consists of 5,570 municipalities’ names (toponyms) and their respective gentiles, extracted in a form of a list from the database of the Instituto Brasileiro de Geografia e Estatística (IBGE). It was observed that only from a small set of recurrent unities, such as suffixes and ends of lexical entities, it is possible to extract patterns which can be subsequently used to formulate combination rules for automatic word processing. During this work, the issue of computational representation stands out and proves natural language complexity. Although natural languages can be in principle automatically processed using computers, their inherent features may deviate from the formulated rules and make the processing more intricate. Nonetheless, the results show that it is possible to automatize 52% of the generation of gentiles from the municipal toponyms. Conclusively the inherent opacity of the Portuguese does not allow direct processing of all of the language toponyms. |
publishDate |
2017 |
dc.date.accessioned.fl_str_mv |
2017-08-21T18:50:41Z |
dc.date.available.fl_str_mv |
2017-08-21T18:50:41Z |
dc.date.issued.fl_str_mv |
2017-02-17 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/9035 |
identifier_str_mv |
ANTUNES, Roger Alfredo de Marci Rodrigues. Formação de gentílicos a partir de topônimos : proposta de geração automática. 2017. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9035. |
url |
https://repositorio.ufscar.br/handle/ufscar/9035 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
600 600 |
dc.relation.authority.fl_str_mv |
8eac6ac4-a936-48dd-b9d7-997dc0548cbc |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Linguística - PPGL |
dc.publisher.initials.fl_str_mv |
UFSCar |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/9035/1/DissRARMA.pdf https://repositorio.ufscar.br/bitstream/ufscar/9035/2/license.txt https://repositorio.ufscar.br/bitstream/ufscar/9035/3/DissRARMA.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/9035/4/DissRARMA.pdf.jpg |
bitstream.checksum.fl_str_mv |
e56022e54a0fe99cc8ca45fc74f7e424 ae0398b6f8b235e40ad82cba6c50031d 206f6d8b331fe65cba523a29263b5e84 8306f6d74fa73a5044ac3cc6887f001b |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1813715578560446464 |