A new world in Floresta Sintá(c)tica – the Portuguese treebank

Bibliographic details
Main author: Freitas, Claudia
Publication date: 2021
Other authors: Rocha, Paulo; Bick, Eckhard
Document type: Article
Language: por
Source title: Calidoscópio (Online)
Full text: https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256
Abstract: Floresta Sintá(c)tica is a publicly available treebank for Portuguese, created as a collaboration between Linguateca and the VISL project. It consists of Brazilian and European Portuguese texts automatically annotated by the parser PALAVRAS (Bick, 2000) and manually revised. In this paper, we present two new corpora, Selva (composed of literary, scientific and transcribed spoken texts, partially revised) and Amazonia (a huge corpus of 3.8 million words, unrevised), and a user-friendly web-based corpus tool, Milhafre. We also present how we manage to balance (a) our users, who may have different linguistic backgrounds; (b) the need for a grammar that is rich and complex enough to process real language (our corpora); and (c) the absence of a consensus syntactic model. Keywords: Portuguese treebank, annotated corpus, revised corpus, user-friendly corpus tool.
id Unisinos-3_e6b8e5ace39f322c43335cf49cf83da1
oai_identifier_str oai:ojs2.revistas.unisinos.br:article/5256
network_acronym_str Unisinos-3
network_name_str Calidoscópio (Online)
repository_id_str
spelling A new world in Floresta Sintá(c)tica – the Portuguese treebank
Um mundo novo na Floresta Sintá(c)tica – o treebank do Português
Floresta Sintá(c)tica aims to create and make available a syntactically annotated corpus. This article presents two new resources from the project: Selva (300 thousand words, partially revised) and Amazônia (3.8 million words, unrevised). The Milhafre interface was built to handle such large and varied material. The article also shows how the project has addressed the challenge of reconciling, on the one hand, linguist users, whose profiles can be very heterogeneous and who generally have little familiarity with certain formalisms common in computing, and, on the other, a single syntactic annotation model, often little known on the "non-computational linguistics" side, together with a corpus access and manipulation interface able to handle an object as complex as language.
Keywords: syntactic trees, annotated corpus, revised corpus, corpus search.
Unisinos
2021-05-27
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
application/pdf
https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256
Calidoscópio; Vol. 6 No. 3 (2008): September/December; 142-148
Calidoscópio; v. 6 n. 3 (2008): Setembro/Dezembro; 142-148
2177-6202
reponame:Calidoscópio (Online)
instname:Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron:Unisinos
por
https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256/2510
Copyright (c) 2021 Calidoscópio
info:eu-repo/semantics/openAccess
Freitas, Claudia
Rocha, Paulo
Bick, Eckhard
2021-05-27T19:17:40Z
oai:ojs2.revistas.unisinos.br:article/5256
Journal
https://revistas.unisinos.br/index.php/calidoscopio
PUB
https://revistas.unisinos.br/index.php/calidoscopio/oai
cmira@unisinos.br || cmira@unisinos.br
2177-6202
2177-6202
opendoar:
2021-05-27T19:17:40
Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS)
false
dc.title.none.fl_str_mv A new world in Floresta Sintá(c)tica – the Portuguese treebank
Um mundo novo na Floresta Sintá(c)tica – o treebank do Português
title A new world in Floresta Sintá(c)tica – the Portuguese treebank
author Freitas, Claudia
author_facet Freitas, Claudia
Rocha, Paulo
Bick, Eckhard
author_role author
author2 Rocha, Paulo
Bick, Eckhard
author2_role author
author
dc.contributor.author.fl_str_mv Freitas, Claudia
Rocha, Paulo
Bick, Eckhard
description Floresta Sintá(c)tica is a publicly available treebank for Portuguese, created as a collaboration between Linguateca and the VISL project. It consists of Brazilian and European Portuguese texts automatically annotated by the parser PALAVRAS (Bick, 2000) and manually revised. In this paper, we present two new corpora, Selva (composed of literary, scientific and transcribed spoken texts, partially revised) and Amazonia (a huge corpus of 3.8 million words, unrevised), and a user-friendly web-based corpus tool, Milhafre. We also present how we manage to balance (a) our users, who may have different linguistic backgrounds; (b) the need for a grammar that is rich and complex enough to process real language (our corpora); and (c) the absence of a consensus syntactic model. Keywords: Portuguese treebank, annotated corpus, revised corpus, user-friendly corpus tool.
publishDate 2021
dc.date.none.fl_str_mv 2021-05-27
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256
url https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256/2510
dc.rights.driver.fl_str_mv Copyright (c) 2021 Calidoscópio
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2021 Calidoscópio
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Unisinos
publisher.none.fl_str_mv Unisinos
dc.source.none.fl_str_mv Calidoscópio; Vol. 6 No. 3 (2008): September/December; 142-148
Calidoscópio; v. 6 n. 3 (2008): Setembro/Dezembro; 142-148
2177-6202
reponame:Calidoscópio (Online)
instname:Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron:Unisinos
instname_str Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron_str Unisinos
institution Unisinos
reponame_str Calidoscópio (Online)
collection Calidoscópio (Online)
repository.name.fl_str_mv Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS)
repository.mail.fl_str_mv cmira@unisinos.br || cmira@unisinos.br
_version_ 1792203885708836864