Linguistic description and machine learning: analysis of Spanish locative verbs

Rodrigues, Roana; Souza, Jackson Wilke da Cruz; Santos, Roney Lira de Sales

Bibliographic details
Main author:	Rodrigues, Roana
Publication date:	2022
Other authors:	Souza, Jackson Wilke da Cruz; Santos, Roney Lira de Sales
Document type:	Article
Language:	Portuguese (por)
Source title:	Cadernos de Estudos Linguísticos
Full text:	https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8666995
Abstract:	To clarify the relationship between descriptive linguistics and machine learning, this article presents the results of a study that analyzes an algorithm generated from a human classification of locative verbal constructions in Spanish. The data come from Rodrigues (2019), which presents a manual analysis and description of 318 instances of verbs that obligatorily select an argument interpreted as a place (poner, salir, entrar, enjaular, etc.), distributed across 10 distinct classes according to 49 structural, distributional and transformational attributes. Working within the symbolic paradigm and using the Weka software, two rule sets were generated with the JRip algorithm: one without and one with attribute selection. Both procedures yielded 10 composite rules, and the resulting models were evaluated by precision, recall, F-measure and confusion matrix. The model without attribute selection reached 100% accuracy, showing that the linguistic data constitute a coherent description and classification. The model with attribute selection, at 96.54% accuracy, exposed the linguistic properties most relevant for classification and also made it possible to analyze the cases most sensitive for class distinction, resulting in a list of six descriptive aspects whose data should be reviewed and/or refined in future linguistic studies.
This investigation thus contributed, more specifically, to improving the description of Spanish locative verbal constructions. It also showed that the relationship between human description and machine learning is not only about description as input for the machine, but mainly about how algorithms (and their evaluation measures) can be used to validate and improve the description of different phenomena of natural languages.
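The abstract evaluates JRip models by accuracy, precision, recall, F-measure and the confusion matrix. The Python sketch below illustrates, under stated assumptions, how an ordered list of if-then rules closed by a default rule (the form in which Weka prints JRip models) assigns classes, and how those measures follow from the resulting confusion matrix. The attribute names (`movement`, `origin`), the classes C1, C2 and C3, and the four instances are hypothetical toy values, not the article's 318 instances, attributes or rules. Attributes are assumed to be coded as present/absent, and the confusion matrix uses gold classes as rows and predicted classes as columns.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL example: attribute names, class labels and instances are
# illustrative only, not the properties, classes or data of the article.
# Each attribute is coded as present (True) or absent (False).
Instance = Dict[str, bool]
Rule = Tuple[Dict[str, bool], str]  # (conjunction of attribute tests, class)


def classify(instance: Instance, rules: List[Rule], default: str) -> str:
    """Apply an ordered rule list: the first rule whose tests all hold wins."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return default  # default rule closing the list


def confusion_matrix(gold: List[str], predicted: List[str], labels: List[str]) -> List[List[int]]:
    """matrix[i][j] counts instances of gold class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, predicted):
        matrix[index[g]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F-measure (harmonic mean of the two) per class."""
    results: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column total
        gold_k = sum(matrix[k])  # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / gold_k if gold_k else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f_measure": f}
    return results


if __name__ == "__main__":
    labels = ["C1", "C2", "C3"]
    rules: List[Rule] = [
        ({"movement": True, "origin": True}, "C1"),
        ({"movement": True}, "C2"),
    ]
    data: List[Tuple[Instance, str]] = [
        ({"movement": True, "origin": True}, "C1"),
        ({"movement": True, "origin": False}, "C2"),
        ({"movement": False, "origin": False}, "C3"),
        ({"movement": True, "origin": False}, "C3"),  # misclassified as C2
    ]
    gold = [label for _, label in data]
    predicted = [classify(inst, rules, default="C3") for inst, _ in data]
    matrix = confusion_matrix(gold, predicted, labels)
    print("confusion matrix:", matrix)
    print(f"accuracy = {accuracy(matrix):.2f}")
    for label, scores in per_class_metrics(matrix, labels).items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
```

Running it prints the matrix `[[1, 0, 0], [0, 1, 0], [0, 1, 1]]` and an accuracy of 0.75: the one C3 instance that satisfies the second rule is absorbed by C2, which lowers C2's precision and C3's recall. Off-diagonal cells of this kind are what an error analysis of class boundaries would inspect.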

Item metadata

id	UNICAMP-13_d7e3deb1e0df6695fa898d186bb165c3
oai_identifier_str	oai:ojs.periodicos.sbu.unicamp.br:article/8666995
network_acronym_str	UNICAMP-13
network_name_str	Cadernos de Estudos Linguísticos
repository_id_str
dc.title.none.fl_str_mv	Linguistic description and machine learning: analysis of Spanish locative verbs | Descripción lingüística y aprendizaje automático: análisis de verbos locativos del español | Descrição linguística e aprendizado de máquina: análise de verbos locativos do espanhol
title	Linguistic description and machine learning: analysis of Spanish locative verbs
author	Rodrigues, Roana
author_facet	Rodrigues, Roana; Souza, Jackson Wilke da Cruz; Santos, Roney Lira de Sales
author_role	author
author2	Souza, Jackson Wilke da Cruz; Santos, Roney Lira de Sales
author2_role	author; author
dc.contributor.author.fl_str_mv	Rodrigues, Roana; Souza, Jackson Wilke da Cruz; Santos, Roney Lira de Sales
dc.subject.por.fl_str_mv	Machine learning; Syntax; Lexicon-grammar | Aprendizaje de máquina; Sintaxis; Léxico-gramática | Aprendizado de máquina; Sintaxe; Léxico-gramática
publishDate	2022
dc.date.none.fl_str_mv	2022-10-24
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article | info:eu-repo/semantics/publishedVersion | Peer-reviewed | Text | info:eu-repo/semantics/other
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8666995 | 10.20396/cel.v64i00.8666995
url	https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8666995
identifier_str_mv	10.20396/cel.v64i00.8666995
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://periodicos.sbu.unicamp.br/ojs/index.php/cel/article/view/8666995/30327
dc.rights.driver.fl_str_mv	Copyright (c) 2022 Cadernos de Estudos Linguísticos | https://creativecommons.org/licenses/by-nc/4.0 | info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2022 Cadernos de Estudos Linguísticos | https://creativecommons.org/licenses/by-nc/4.0
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv	Estudio Global; Contemporáneo | Estudo Global; Contemporâneo | Global Study; Contemporary
dc.publisher.none.fl_str_mv	Universidade Estadual de Campinas
publisher.none.fl_str_mv	Universidade Estadual de Campinas
dc.source.none.fl_str_mv	Cadernos de Estudos Linguísticos; v. 64 (2022): Publicação Contínua; e022038 | Cadernos de Estudos Linguísticos; Vol. 64 (2022): Continuous Publication; e022038 | Cadernos de Estudos Linguísticos; Vol. 64 (2022): Publicación continua; e022038 | 2447-0686 | reponame:Cadernos de Estudos Linguísticos | instname:Universidade Estadual de Campinas (UNICAMP) | instacron:UNICAMP
instname_str	Universidade Estadual de Campinas (UNICAMP)
instacron_str	UNICAMP
institution	UNICAMP
reponame_str	Cadernos de Estudos Linguísticos
collection	Cadernos de Estudos Linguísticos
repository.name.fl_str_mv	Cadernos de Estudos Linguísticos - Universidade Estadual de Campinas (UNICAMP)
repository.mail.fl_str_mv	spublic@iel.unicamp.br | revistacel@iel.unicamp.br
