Syntactic parser for Brazilian Portuguese: challenges and solutions

Bibliographic details
Main author: Pacheco, Willian Emerson Afonso
Publication date: 2022
Other authors: Guaranha, Manoel Francisco
Document type: Article
Language: Portuguese (por)
Source title: Texto livre
Full text: https://periodicos.ufmg.br/index.php/textolivre/article/view/37569
Abstract: This article presents Parsero, a syntactic parser for Brazilian Portuguese developed on the basis of Generative Grammar (CHOMSKY, 2015) as refined by X-bar Theory (CHOMSKY, 2014). To that end, the rules developed by Othero (2009) specifically for Brazilian Portuguese were adopted and adapted to meet the needs of the parser. As its lexical collection, used to populate a Structured Query Language (SQL) database, the research drew on the Dictionary of Simple Inflected Words for Brazilian Portuguese (DELAF_PB), made available by the Unitex-PB Project, which was developed by the Núcleo Interinstitucional de Linguística Computacional (NILC) and the Instituto de Ciências Matemáticas e de Computação (ICMC). That resource, in turn, was built on the French formalism Dictionnaire Electronique du LADL (DELA) (MUNIZ, 2004). As a result of the project, we have made available to interested researchers the SQL database with 1,193,295 classified lexical units, the address of Parsero's open source code, and a link to run the application. Developing the Natural Language Processor (NLP) required putting into practice interdisciplinary studies in the language sciences and computer science, a practice necessary for building intelligent programs that can interact with writers and speakers of Brazilian Portuguese.
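The abstract mentions populating an SQL database from a DELAF-style dictionary. The sketch below illustrates how such a step might look. It is not Parsero's actual code. It assumes the usual DELA entry layout `inflected,lemma.POS+traits:features` and uses an in-memory SQLite database. The table name `lexicon`, the function names, and the sample entries and feature codes are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Iterable, NamedTuple


class LexicalEntry(NamedTuple):
    inflected: str  # surface form as it appears in text
    lemma: str      # canonical (dictionary) form
    pos: str        # grammatical category code, e.g. N, V, A
    features: str   # one inflectional feature bundle, e.g. "fp"


def parse_delaf_line(line: str) -> list[LexicalEntry]:
    """Parse one entry of the assumed form inflected,lemma.POS[+traits][:feat]...

    Simplification: escaped commas or dots inside forms are not handled.
    """
    inflected, rest = line.strip().split(",", 1)
    lemma, codes = rest.rsplit(".", 1)
    pos_part, *feature_sets = codes.split(":")
    pos = pos_part.split("+", 1)[0]  # drop optional semantic traits
    if not lemma:                    # an empty lemma means "same as inflected form"
        lemma = inflected
    # One row per feature bundle, so an ambiguous form yields several rows.
    return [LexicalEntry(inflected, lemma, pos, f) for f in (feature_sets or [""])]


def build_lexicon(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Create the (hypothetical) lexicon table and load parsed entries into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lexicon ("
        "inflected TEXT NOT NULL, lemma TEXT NOT NULL, "
        "pos TEXT NOT NULL, features TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_inflected ON lexicon(inflected)")
    rows = [e for line in lines if line.strip() for e in parse_delaf_line(line)]
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup(conn: sqlite3.Connection, word: str) -> list[tuple[str, str, str]]:
    """Return every (lemma, pos, features) analysis stored for a surface form."""
    cur = conn.execute(
        "SELECT lemma, pos, features FROM lexicon WHERE inflected = ? ORDER BY rowid",
        (word,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Illustrative entries only; they are not taken from DELAF_PB itself.
    sample = ["casas,casa.N:fp", "cantava,cantar.V:I1s:I3s", "abacate,.N:ms"]
    connection = sqlite3.connect(":memory:")
    print(build_lexicon(connection, sample), "rows loaded")
    print(lookup(connection, "cantava"))
```

Storing one row per feature bundle keeps lookups simple: a parser can fetch every candidate analysis of a token with a single indexed query and let the grammar rules choose among them.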
id UFMG-9_12ea958ce12d2190722698243b5a32bd
oai_identifier_str oai:periodicos.ufmg.br:article/37569
network_acronym_str UFMG-9
network_name_str Texto livre
repository_id_str
dc.title.none.fl_str_mv Syntactic parser for Brazilian Portuguese: challenges and solutions
Parser sintático para o português brasileiro: desafios e soluções
title Syntactic parser for Brazilian Portuguese: challenges and solutions
spellingShingle Syntactic parser for Brazilian Portuguese: challenges and solutions
Pacheco, Willian Emerson Afonso
Linguística computacional
Processamento de Linguagem Natural
Gramática gerativa
Parser sintático
Português brasileiro
Computational linguistics
Natural Language Processing
Generative Grammar
Syntactic parser
Brazilian Portuguese
author Pacheco, Willian Emerson Afonso
author_facet Pacheco, Willian Emerson Afonso
Guaranha, Manoel Francisco
author_role author
author2 Guaranha, Manoel Francisco
author2_role author
dc.contributor.author.fl_str_mv Pacheco, Willian Emerson Afonso
Guaranha, Manoel Francisco
dc.subject.por.fl_str_mv Linguística computacional
Processamento de Linguagem Natural
Gramática gerativa
Parser sintático
Português brasileiro
Computational linguistics
Natural Language Processing
Generative Grammar
Syntactic parser
Brazilian Portuguese
topic Linguística computacional
Processamento de Linguagem Natural
Gramática gerativa
Parser sintático
Português brasileiro
Computational linguistics
Natural Language Processing
Generative Grammar
Syntactic parser
Brazilian Portuguese
publishDate 2022
dc.date.none.fl_str_mv 2022-05-14
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://periodicos.ufmg.br/index.php/textolivre/article/view/37569
10.35699/1983-3652.2022.37569
url https://periodicos.ufmg.br/index.php/textolivre/article/view/37569
identifier_str_mv 10.35699/1983-3652.2022.37569
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://periodicos.ufmg.br/index.php/textolivre/article/view/37569/30502
dc.rights.driver.fl_str_mv Copyright (c) 2022 Willian Emerson Afonso Pacheco, Manoel Francisco Guaranha
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2022 Willian Emerson Afonso Pacheco, Manoel Francisco Guaranha
https://creativecommons.org/licenses/by/4.0
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv Texto Livre; Vol. 15 (2022): Texto Livre: Linguagem e Tecnologia ; e37569
Texto Livre; v. 15 (2022): Texto Livre: Linguagem e Tecnologia ; e37569
1983-3652
reponame:Texto livre
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Texto livre
collection Texto livre
repository.name.fl_str_mv Texto livre - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv revistatextolivre@letras.ufmg.br
_version_ 1799711143584858112