The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP)

Bibliographic details
Main author: Santos Junior, Jorge Luiz Nunes dos
Publication date: 2022
Other authors: Isquerdo, Aparecida Negri
Document type: Article
Language: por (Portuguese)
Source title: Domínios de Lingu@gem
Full text: https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444
Abstract: This paper is situated at the interface between Lexicography (PORTO DAPENA, 2002; HARTMANN, 2016), Dialectology (CARDOSO, 2010; CHAMBERS; TRUDGILL, 1994) and Computational Linguistics (HABERT, 2004; PÉREZ HERNÁNDEZ; MORENO ORTIZ, 2009; HAUSSER, 2014; KURDI, 2016). Its objective is to discuss a proposal for building a database in XML (Extensible Markup Language) and to explore the results obtained with NLP (Natural Language Processing). The XML file also follows parameters of Dialectal Lexicography (EZQUERRA, 1997; NAVARRO CARRASCO, 1993) and is being populated with dialectal data from the Atlas Linguístico do Brasil (ALiB) project documented in the country's Northern region. To this end, jEdit was used as the text editor and BaseX to manage the database. Linguistic information was extracted in BaseX from a data sample with the aid of XQuery expressions. The following data manipulations were performed: i) locating a specific lexical unit; ii) viewing any microstructure data filtered by the variables gender, age, education and location; iii) selecting information from one of the 14 semantic areas into which the questions of the ALiB semantic-lexical questionnaire are organized. In summary, building an XML database is understood to speed up information extraction and to make the data compatible with interfaces to other applications, for example the development of a lexicographic product to be published online.
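The three manipulations listed in the abstract can be illustrated with a minimal sketch. The authors worked in BaseX with XQuery; the example below swaps in Python's standard `xml.etree.ElementTree` only to show the idea. Everything in it is an assumption made for illustration: the element and attribute names (`entry`, `headword`, `record`, `area`, `gender`, `age`, `education`, `location`), the helper function names, and the placeholder data are hypothetical and are not taken from the article or from the ALiB database.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The schema and every value below are placeholders,
# not the structure or data used by the authors.
SAMPLE = """
<database>
  <entry area="area_1">
    <headword>word_a</headword>
    <record gender="F" age="18-30" education="primary" location="Locality 1"/>
    <record gender="M" age="50-65" education="university" location="Locality 2"/>
  </entry>
  <entry area="area_2">
    <headword>word_b</headword>
    <record gender="M" age="18-30" education="primary" location="Locality 2"/>
  </entry>
</database>
"""


def find_entry(root, headword):
    """i) Locate a specific lexical unit by its headword (None if absent)."""
    for entry in root.iter("entry"):
        if entry.findtext("headword") == headword:
            return entry
    return None


def filter_records(root, **criteria):
    """ii) Return (headword, attributes) pairs whose record matches every
    given variable, e.g. gender="M", location="Locality 2"."""
    hits = []
    for entry in root.iter("entry"):
        for rec in entry.iter("record"):
            if all(rec.get(key) == value for key, value in criteria.items()):
                hits.append((entry.findtext("headword"), dict(rec.attrib)))
    return hits


def entries_in_area(root, area):
    """iii) Return the headwords that belong to one semantic area."""
    return [e.findtext("headword") for e in root.findall(f"entry[@area='{area}']")]


root = ET.fromstring(SAMPLE)
print(find_entry(root, "word_a") is not None)
print(filter_records(root, gender="M", location="Locality 2"))
print(entries_in_area(root, "area_2"))
```

In this sketch the social variables are stored as attributes so that any combination of them can be passed as keyword filters; a real schema might equally well use child elements, in which case only the matching condition would change.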
id UFU-12_e4468729a0df1d331917c7c0a95a300d
oai_identifier_str oai:ojs.www.seer.ufu.br:article/63444
network_acronym_str UFU-12
network_name_str Domínios de Lingu@gem
repository_id_str
spelling 2022-12-09T18:25:24Z
Journal
https://seer.ufu.br/index.php/dominiosdelinguagem
PUB
https://seer.ufu.br/index.php/dominiosdelinguagem/oai
opendoar:2022-12-09T18:25:24
false
dc.title.none.fl_str_mv The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP)
A construção de um banco de dados lexicográfico em XML a partir de dados dialetais: o Processamento Automático de Linguagem Natural (PLN)
title The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP)
spellingShingle The construction of a lexicographic database in XML from dialectal data: the Natural Language Processing (NLP)
Santos Junior, Jorge Luiz Nunes dos
Lexicografia Dialetal
Linguística Computacional
Banco de dados em XML
PLN
Dialectal Lexicography
Computational Linguistics
XML database
NLP
author Santos Junior, Jorge Luiz Nunes dos
author_facet Santos Junior, Jorge Luiz Nunes dos
Isquerdo, Aparecida Negri
author_role author
author2 Isquerdo, Aparecida Negri
author2_role author
dc.contributor.author.fl_str_mv Santos Junior, Jorge Luiz Nunes dos
Isquerdo, Aparecida Negri
dc.subject.por.fl_str_mv Lexicografia Dialetal
Linguística Computacional
Banco de dados em XML
PLN
Dialectal Lexicography
Computational Linguistics
XML database
NLP
topic Lexicografia Dialetal
Linguística Computacional
Banco de dados em XML
PLN
Dialectal Lexicography
Computational Linguistics
XML database
NLP
description This paper is situated at the interface between Lexicography (PORTO DAPENA, 2002; HARTMANN, 2016), Dialectology (CARDOSO, 2010; CHAMBERS; TRUDGILL, 1994) and Computational Linguistics (HABERT, 2004; PÉREZ HERNÁNDEZ; MORENO ORTIZ, 2009; HAUSSER, 2014; KURDI, 2016). Its objective is to discuss a proposal for building a database in XML (Extensible Markup Language) and to explore the results obtained with NLP (Natural Language Processing). The XML file also follows parameters of Dialectal Lexicography (EZQUERRA, 1997; NAVARRO CARRASCO, 1993) and is being populated with dialectal data from the Atlas Linguístico do Brasil (ALiB) project documented in the country's Northern region. To this end, jEdit was used as the text editor and BaseX to manage the database. Linguistic information was extracted in BaseX from a data sample with the aid of XQuery expressions. The following data manipulations were performed: i) locating a specific lexical unit; ii) viewing any microstructure data filtered by the variables gender, age, education and location; iii) selecting information from one of the 14 semantic areas into which the questions of the ALiB semantic-lexical questionnaire are organized. In summary, building an XML database is understood to speed up information extraction and to make the data compatible with interfaces to other applications, for example the development of a lexicographic product to be published online.
publishDate 2022
dc.date.none.fl_str_mv 2022-09-12
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444
10.14393/DL52-v16n4a2022-11
url https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444
identifier_str_mv 10.14393/DL52-v16n4a2022-11
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444/33948
https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63444/35236
dc.rights.driver.fl_str_mv Copyright (c) 2022 Jorge Luiz Nunes dos Santos Junior, Aparecida Negri Isquerdo
http://creativecommons.org/licenses/by-nc-nd/4.0
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2022 Jorge Luiz Nunes dos Santos Junior, Aparecida Negri Isquerdo
http://creativecommons.org/licenses/by-nc-nd/4.0
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
text/xml
dc.publisher.none.fl_str_mv PP/UFU
publisher.none.fl_str_mv PP/UFU
dc.source.none.fl_str_mv Domínios de Lingu@gem; Vol. 16 No. 4 (2022): The computational treatment of Brazilian Portuguese; 1544-1570
Domínios de Lingu@gem; Vol. 16 Núm. 4 (2022): El tratamiento computacional del portugués brasileño; 1544-1570
Domínios de Lingu@gem; v. 16 n. 4 (2022): Tratamento Computacional do Português Brasileiro; 1544-1570
1980-5799
reponame:Domínios de Lingu@gem
instname:Universidade Federal de Uberlândia (UFU)
instacron:UFU
instname_str Universidade Federal de Uberlândia (UFU)
instacron_str UFU
institution UFU
reponame_str Domínios de Lingu@gem
collection Domínios de Lingu@gem
repository.name.fl_str_mv Domínios de Lingu@gem - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv revistadominios@ileel.ufu.br
_version_ 1797067717695504384