The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP)
Main author: | Santos Junior, Jorge Luiz Nunes dos |
---|---|
Publication date: | 2022 |
Other authors: | Isquerdo, Aparecida Negri |
Document type: | Article |
Language: | por |
Source title: | Domínios de Lingu@gem |
Full text: | https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444 |
Abstract: | This paper is situated at the interface between Lexicography (PORTO DAPENA, 2002; HARTMANN, 2016), Dialectology (CARDOSO, 2010; CHAMBERS; TRUDGILL, 1994) and Computational Linguistics (HABERT, 2004; PÉREZ HERNÁNDEZ; MORENO ORTIZ, 2009; HAUSSER, 2014; KURDI, 2016). Its objective is to discuss a proposal for building a database in XML (Extensible Markup Language) and to explore the results obtained with NLP (Natural Language Processing). The XML file is also based on parameters of Dialectal Lexicography (EZQUERRA, 1997; NAVARRO CARRASCO, 1993) and is being populated with dialectal data from the Atlas Linguístico do Brasil (ALiB) project, documented in the country's Northern region. To this end, jEdit was used as the text editor and BaseX to manage the database. Linguistic information was extracted in BaseX from a sample of the data, with the support of XQuery expressions. The following data manipulations were performed: i) locating a specific lexical unit; ii) viewing any microstructure data filtered by the variables gender, age, education and location; iii) selecting information from one of the 14 semantic areas into which the questions of the ALiB semantic-lexical questionnaire are organized. In summary, building an XML database is understood to speed up information extraction and to make the data compatible with interfaces to other applications, for example the development of a lexicographic product to be published online. |
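
The three manipulations listed in the abstract can be sketched roughly as follows. The paper performs them with XQuery expressions in BaseX; this sketch swaps in Python's standard `xml.etree.ElementTree` purely for illustration. Every element name, attribute name and value below (`entry`, `lemma`, `informant`, `area`, `unit-a`, and so on) is a hypothetical placeholder, not the authors' schema and not ALiB data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: names and values are placeholders only,
# not the schema or data described in the paper.
SAMPLE = """
<database>
  <entry area="area-01">
    <lemma>unit-a</lemma>
    <informant gender="F" age="younger" education="primary" location="locality-1"/>
    <informant gender="M" age="older" education="university" location="locality-2"/>
  </entry>
  <entry area="area-01">
    <lemma>unit-b</lemma>
    <informant gender="M" age="younger" education="primary" location="locality-1"/>
  </entry>
  <entry area="area-02">
    <lemma>unit-c</lemma>
    <informant gender="F" age="older" education="university" location="locality-2"/>
  </entry>
</database>
"""

root = ET.fromstring(SAMPLE)


def find_unit(root, lemma):
    """i) Locate every entry whose lemma matches a specific lexical unit."""
    return [e for e in root.iter("entry") if e.findtext("lemma") == lemma]


def filter_by_informant(root, **criteria):
    """ii) Return (lemma, informant attributes) pairs for informants that
    match all given variables, e.g. gender, age, education, location."""
    hits = []
    for entry in root.iter("entry"):
        for informant in entry.iter("informant"):
            if all(informant.get(k) == v for k, v in criteria.items()):
                hits.append((entry.findtext("lemma"), dict(informant.attrib)))
    return hits


def by_semantic_area(root, area):
    """iii) List the lemmas recorded under one semantic area."""
    return [e.findtext("lemma") for e in root.findall(f"./entry[@area='{area}']")]
```

A call such as `filter_by_informant(root, gender="M", location="locality-1")` combines two of the variables. In BaseX the same selections would be written as XQuery path expressions with predicates over whatever element names the real database uses.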
id |
UFU-12_e4468729a0df1d331917c7c0a95a300d |
---|---|
oai_identifier_str |
oai:ojs.www.seer.ufu.br:article/63444 |
network_acronym_str |
UFU-12 |
network_name_str |
Domínios de Lingu@gem |
repository_id_str |
|
dc.title.none.fl_str_mv |
The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP) / A construção de um banco de dados lexicográfico em XML a partir de dados dialetais: o Processamento Automático de Linguagem Natural (PLN) |
title |
The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP) |
author |
Santos Junior, Jorge Luiz Nunes dos |
author_facet |
Santos Junior, Jorge Luiz Nunes dos; Isquerdo, Aparecida Negri |
author_role |
author |
author2 |
Isquerdo, Aparecida Negri |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Santos Junior, Jorge Luiz Nunes dos; Isquerdo, Aparecida Negri |
dc.subject.por.fl_str_mv |
Lexicografia Dialetal; Linguística Computacional; Banco de dados em XML; PLN; Dialectal Lexicography; Computational Linguistics; XML database; NLP |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-09-12 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444 10.14393/DL52-v16n4a2022-11 |
url |
https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444 |
identifier_str_mv |
10.14393/DL52-v16n4a2022-11 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.none.fl_str_mv |
https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444/33948 https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444/35236 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2022 Jorge Luiz Nunes dos Santos Junior, Aparecida Negri Isquerdo http://creativecommons.org/licenses/by-nc-nd/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2022 Jorge Luiz Nunes dos Santos Junior, Aparecida Negri Isquerdo http://creativecommons.org/licenses/by-nc-nd/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf text/xml |
dc.publisher.none.fl_str_mv |
PP/UFU |
publisher.none.fl_str_mv |
PP/UFU |
dc.source.none.fl_str_mv |
Domínios de Lingu@gem; Vol. 16 No. 4 (2022): The computational treatment of Brazilian Portuguese; 1544-1570; 1980-5799; reponame:Domínios de Lingu@gem; instname:Universidade Federal de Uberlândia (UFU); instacron:UFU |
instname_str |
Universidade Federal de Uberlândia (UFU) |
instacron_str |
UFU |
institution |
UFU |
reponame_str |
Domínios de Lingu@gem |
collection |
Domínios de Lingu@gem |
repository.name.fl_str_mv |
Domínios de Lingu@gem - Universidade Federal de Uberlândia (UFU) |
repository.mail.fl_str_mv |
revistadominios@ileel.ufu.br |
_version_ |
1797067717695504384 |