Morphological productivity and text technology: Aspects of a lexical transducer of Portuguese capable of analyzing neologisms

Bibliographic details
Main author: Alencar, Leonel Figueiredo de
Publication date: 2021
Document type: Article
Language: por
Source title: Calidoscópio (Online)
Full text: https://revistas.unisinos.br/index.php/calidoscopio/article/view/4874
Abstract: This paper presents LEXPOR, a prototype of a morphological component for Portuguese capable of segmenting and classifying the constituents of complex words formed by suffixation with -ismo, -iano, -ês and -mente, as well as by prefixing the words so derived with Greek or Latin prefixes such as neo-, pseudo-, anti-, or ultra-. We assume that a representation of complex words in terms of morphemes and morphosyntactic categories plays an important role not only in corpus linguistics, but also in other subfields of text technology, such as Information Extraction and Information Retrieval. The prototype consists of a lexical transducer modeling the set of words that can potentially be built using these derivational affixes. The transducer was compiled from a description of the morphotactics of this lexicon fragment and of its morphophonological and orthographic alternation rules, formalized in the xfst and lexc finite-state programming languages. Its main feature is the ability to analyze neologisms built from non-lexicalized bases borrowed from other languages. Since the use of foreign anthroponyms is one of the main causes of the extreme productivity of the derivational affixes we focus on, LEXPOR provides an adequate architecture for developing an automatic tagger for Portuguese capable of overcoming the shortcomings of the CETENFolha corpus and of the parser of the VISL project. In both cases, morphological analyses of complex words formed with the derivational affixes mentioned above are often either insufficiently detailed or simply incorrect.
Keywords: derivation, suffixation, prefixation, automata, lexical transducers, finite-state morphology, automatic corpus annotation, corpus linguistics, computational linguistics.
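To make the kind of analysis described in the abstract concrete, the following is a minimal Python sketch of affix-based segmentation with an open (unlisted) base. It is not LEXPOR and does not use xfst or lexc: the function name `segment`, the tag labels (`+N`, `+ADJ`, `+ADV`) and the minimum base length are assumptions made for illustration, and no morphophonological or orthographic alternation rules are modeled. Only the affix inventory is taken from the abstract.

```python
# Illustrative sketch only: a plain-Python stand-in for the idea of
# segmenting prefix(es) + unknown base + suffix. Not the author's transducer.

# Affixes named in the abstract.
PREFIXES = ("neo", "pseudo", "anti", "ultra")

# Suffix -> category tag. The tag names are hypothetical.
SUFFIXES = {"ismo": "+N", "iano": "+ADJ", "ês": "+ADJ", "mente": "+ADV"}


def segment(word):
    """Return (morphemes, tag) for a derived word, or None if no suffix matches.

    The base is not looked up in any lexicon, so bases borrowed from
    other languages (for example foreign surnames) are accepted as-is.
    """
    rest = word.lower()
    prefixes = []

    # Strip any number of known prefixes, with an optional hyphen after each.
    stripped = True
    while stripped:
        stripped = False
        for prefix in PREFIXES:
            if rest.startswith(prefix) and len(rest) > len(prefix) + 1:
                rest = rest[len(prefix):].lstrip("-")
                prefixes.append(prefix)
                stripped = True
                break

    # Require one of the derivational suffixes and a base of at least
    # two characters (an arbitrary threshold assumed for this sketch).
    for suffix, tag in SUFFIXES.items():
        if rest.endswith(suffix) and len(rest) - len(suffix) >= 2:
            base = rest[: -len(suffix)]
            return prefixes + [base, suffix], tag

    return None


if __name__ == "__main__":
    for example in ("neodarwinismo", "pseudo-kantiano", "casa"):
        print(example, "->", segment(example))
```

A real finite-state implementation would additionally constrain which bases and affix combinations are well formed and would handle spelling alternations at morpheme boundaries, which this sketch deliberately leaves out.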
id Unisinos-3_df7a5cf07105322bacd99de1c21574d6
oai_identifier_str oai:ojs2.revistas.unisinos.br:article/4874
network_acronym_str Unisinos-3
network_name_str Calidoscópio (Online)
repository_id_str
dc.title.none.fl_str_mv Morphological productivity and text technology: Aspects of a lexical transducer of Portuguese capable of analyzing neologisms
Produtividade morfológica e tecnologia do texto: aspectos da construção de um transdutor lexical do português capaz de analisar neologismos
author Alencar, Leonel Figueiredo de
author_facet Alencar, Leonel Figueiredo de
author_role author
dc.contributor.author.fl_str_mv Alencar, Leonel Figueiredo de
publishDate 2021
dc.date.none.fl_str_mv 2021-05-27
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/4874
url https://revistas.unisinos.br/index.php/calidoscopio/article/view/4874
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/4874/2129
dc.rights.driver.fl_str_mv Copyright (c) 2021 Calidoscópio
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2021 Calidoscópio
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Unisinos
publisher.none.fl_str_mv Unisinos
dc.source.none.fl_str_mv Calidoscópio; Vol. 7 No. 3 (2009): September/December; 199-220
Calidoscópio; v. 7 n. 3 (2009): Setembro/Dezembro; 199-220
2177-6202
reponame:Calidoscópio (Online)
instname:Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron:Unisinos
instname_str Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron_str Unisinos
institution Unisinos
reponame_str Calidoscópio (Online)
collection Calidoscópio (Online)
repository.name.fl_str_mv Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS)
repository.mail.fl_str_mv cmira@unisinos.br
_version_ 1792203885668990976