An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese

Duran, Magali; Nunes, Maria das Graças Volpe; Lopes, Lucelene; Pardo, Thiago Alexandre Salgueiro

Bibliographic details
Main author:	Duran, Magali
Publication date:	2022
Other authors:	Nunes, Maria das Graças Volpe; Lopes, Lucelene; Pardo, Thiago Alexandre Salgueiro
Document type:	Article
Language:	por
Source title:	Domínios de Lingu@gem
DOI:	10.14393/DL52-v16n4a2022-13
Full text:	https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63632
Abstract:	With advances in Natural Language Processing (NLP), corpora have gained a prominent place as resources. Beyond supporting linguistic studies, they form the basis for training Machine Learning models and developing cutting-edge computational applications. Annotated corpora are in particularly high demand, but producing them requires another essential resource, the annotation manual, which instantiates the annotation model of interest for the language in question and lays out the annotation decisions to be adopted. In this paper, we explore issues in developing manuals for annotating Brazilian Portuguese corpora according to the Universal Dependencies (UD) model, which is widely adopted in the field. We discuss the evolution of NLP and the use of corpora; the fundamental issues, resources, and tools related to syntactic representation; the UD model; and the main decisions made in instantiating the UD guidelines for Brazilian Portuguese. For practical and didactic reasons, we divided the manual into two parts: the PoS Tag Annotation Manual (morphosyntactic annotation) and the Dependency Relations Annotation Manual. Both resulted from the process reported in this paper and are freely available on the POeTiSA project's website.

Item metadata

id	UFU-12_8854a04c9ecfaa71b63dd41a88396c5c
oai_identifier_str	oai:ojs.www.seer.ufu.br:article/63632
network_acronym_str	UFU-12
network_name_str	Domínios de Lingu@gem
dc.title.none.fl_str_mv	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese / Manual de anotação como recurso de Processamento de Linguagem Natural: o modelo Universal Dependencies em língua portuguesa
title	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
spellingShingle	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese; Duran, Magali; Corpora anotados; Manual de anotação; Universal Dependencies; Árvores de dependência; Português brasileiro; Annotated corpora; Annotation manual; Dependency trees; Brazilian Portuguese
title_short	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
title_full	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
title_fullStr	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
title_full_unstemmed	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
title_sort	An annotation manual as a Natural Language Processing resource: the Universal Dependencies model in Portuguese
author	Duran, Magali
author_facet	Duran, Magali; Nunes, Maria das Graças Volpe; Lopes, Lucelene; Pardo, Thiago Alexandre Salgueiro
author_role	author
author2	Nunes, Maria das Graças Volpe; Lopes, Lucelene; Pardo, Thiago Alexandre Salgueiro
author2_role	author author author
dc.contributor.author.fl_str_mv	Duran, Magali; Nunes, Maria das Graças Volpe; Lopes, Lucelene; Pardo, Thiago Alexandre Salgueiro
dc.subject.por.fl_str_mv	Corpora anotados; Manual de anotação; Universal Dependencies; Árvores de dependência; Português brasileiro; Annotated corpora; Annotation manual; Universal Dependencies; Dependency trees; Brazilian Portuguese
topic	Corpora anotados; Manual de anotação; Universal Dependencies; Árvores de dependência; Português brasileiro; Annotated corpora; Annotation manual; Universal Dependencies; Dependency trees; Brazilian Portuguese
description	With advances in Natural Language Processing (NLP), corpora have gained a prominent place as resources. Beyond supporting linguistic studies, they form the basis for training Machine Learning models and developing cutting-edge computational applications. Annotated corpora are in particularly high demand, but producing them requires another essential resource, the annotation manual, which instantiates the annotation model of interest for the language in question and lays out the annotation decisions to be adopted. In this paper, we explore issues in developing manuals for annotating Brazilian Portuguese corpora according to the Universal Dependencies (UD) model, which is widely adopted in the field. We discuss the evolution of NLP and the use of corpora; the fundamental issues, resources, and tools related to syntactic representation; the UD model; and the main decisions made in instantiating the UD guidelines for Brazilian Portuguese. For practical and didactic reasons, we divided the manual into two parts: the PoS Tag Annotation Manual (morphosyntactic annotation) and the Dependency Relations Annotation Manual. Both resulted from the process reported in this paper and are freely available on the POeTiSA project's website.
publishDate	2022
dc.date.none.fl_str_mv	2022-09-12
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63632 10.14393/DL52-v16n4a2022-13
url	https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63632
identifier_str_mv	10.14393/DL52-v16n4a2022-13
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63632/34631 https://seer.ufu.br/index.php/dominiosdelinguagem/article/view/63632/35238
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nc-nd/4.0 info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-nd/4.0
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf text/xml
dc.publisher.none.fl_str_mv	PP/UFU
publisher.none.fl_str_mv	PP/UFU
dc.source.none.fl_str_mv	Domínios de Lingu@gem; Vol. 16 No. 4 (2022): The computational treatment of Brazilian Portuguese; 1608-1643 Domínios de Lingu@gem; Vol. 16 Núm. 4 (2022): El tratamiento computacional del portugués brasileño; 1608-1643 Domínios de Lingu@gem; v. 16 n. 4 (2022): Tratamento Computacional do Português Brasileiro; 1608-1643 1980-5799 reponame:Domínios de Lingu@gem instname:Universidade Federal de Uberlândia (UFU) instacron:UFU
instname_str	Universidade Federal de Uberlândia (UFU)
instacron_str	UFU
institution	UFU
reponame_str	Domínios de Lingu@gem
collection	Domínios de Lingu@gem
repository.name.fl_str_mv	Domínios de Lingu@gem - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	revistadominios@ileel.ufu.br
_version_	1822178852097490944
dc.identifier.doi.none.fl_str_mv	10.14393/DL52-v16n4a2022-13