A new world in Floresta Sintá(c)tica – the Portuguese treebank
Main author: | Freitas, Claudia |
---|---|
Publication date: | 2021 |
Other authors: | Rocha, Paulo; Bick, Eckhard |
Document type: | Article |
Language: | por |
Source title: | Calidoscópio (Online) |
Full text: | https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256 |
Abstract: | Floresta Sintá(c)tica is a publicly available treebank for Portuguese, created as a collaboration between Linguateca and the VISL project. It consists of Brazilian and European Portuguese texts automatically annotated by the parser PALAVRAS (Bick, 2000) and manually revised. In this paper, we present two new corpora, Selva (composed of literary, scientific and transcribed spoken texts, partially revised) and Amazonia (a large corpus of 3.8 million words, unrevised), and a user-friendly web-based corpus tool, Milhafre. We also present how we manage to balance (a) our users, who may have different linguistic backgrounds; (b) the need for a grammar that is rich and complex enough to process real language (our corpora); and (c) the absence of a consensual syntactic model. Key words: Portuguese treebank, annotated corpus, revised corpus, user-friendly corpus tool. |
id |
Unisinos-3_e6b8e5ace39f322c43335cf49cf83da1 |
---|---|
oai_identifier_str |
oai:ojs2.revistas.unisinos.br:article/5256 |
network_acronym_str |
Unisinos-3 |
network_name_str |
Calidoscópio (Online) |
repository_id_str |
|
spelling |
A new world in Floresta Sintá(c)tica – the Portuguese treebank / Um mundo novo na Floresta Sintá(c)tica – o treebank do Português. Floresta Sintá(c)tica aims to create and make available a syntactically annotated corpus. This article presents two new resources from the project: Selva (300 thousand words, partially revised) and Amazônia (3.8 million words, unrevised). The Milhafre interface was built to handle such a large and varied body of material. The article also shows how the project has been addressing the challenge of reconciling, on the one hand, linguist users, who may have very heterogeneous profiles and, in general, little familiarity with certain formalizations widely used in computing, and, on the other hand, a single syntactic annotation model, often little known on the "non-computational linguistics" side, together with an interface for accessing and manipulating corpora that can handle an object as complex as language.
Keywords: syntactic trees, annotated corpus, revised corpus, corpus search. Unisinos 2021-05-27 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256 Calidoscópio; Vol. 6 No. 3 (2008): September/December; 142-148 Calidoscópio; v. 6 n. 3 (2008): Setembro/Dezembro; 142-148 2177-6202 reponame:Calidoscópio (Online) instname:Universidade do Vale do Rio dos Sinos (UNISINOS) instacron:Unisinos por https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256/2510 Copyright (c) 2021 Calidoscópio info:eu-repo/semantics/openAccess Freitas, Claudia Rocha, Paulo Bick, Eckhard 2021-05-27T19:17:40Z oai:ojs2.revistas.unisinos.br:article/5256 Revista https://revistas.unisinos.br/index.php/calidoscopio PUB https://revistas.unisinos.br/index.php/calidoscopio/oai cmira@unisinos.br || cmira@unisinos.br 2177-6202 2177-6202 opendoar:2021-05-27T19:17:40 Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS) false |
dc.title.none.fl_str_mv |
A new world in Floresta Sintá(c)tica – the Portuguese treebank Um mundo novo na Floresta Sintá(c)tica – o treebank do Português |
title |
A new world in Floresta Sintá(c)tica – the Portuguese treebank |
spellingShingle |
A new world in Floresta Sintá(c)tica – the Portuguese treebank Freitas, Claudia |
author |
Freitas, Claudia |
author_facet |
Freitas, Claudia Rocha, Paulo Bick, Eckhard |
author_role |
author |
author2 |
Rocha, Paulo Bick, Eckhard |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Freitas, Claudia Rocha, Paulo Bick, Eckhard |
description |
Floresta Sintá(c)tica is a publicly available treebank for Portuguese, created as a collaboration between Linguateca and the VISL project. It consists of Brazilian and European Portuguese texts automatically annotated by the parser PALAVRAS (Bick, 2000) and manually revised. In this paper, we present two new corpora, Selva (composed of literary, scientific and transcribed spoken texts, partially revised) and Amazonia (a large corpus of 3.8 million words, unrevised), and a user-friendly web-based corpus tool, Milhafre. We also present how we manage to balance (a) our users, who may have different linguistic backgrounds; (b) the need for a grammar that is rich and complex enough to process real language (our corpora); and (c) the absence of a consensual syntactic model. Key words: Portuguese treebank, annotated corpus, revised corpus, user-friendly corpus tool. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-05-27 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256 |
url |
https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.none.fl_str_mv |
https://revistas.unisinos.br/index.php/calidoscopio/article/view/5256/2510 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2021 Calidoscópio info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2021 Calidoscópio |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Unisinos |
publisher.none.fl_str_mv |
Unisinos |
dc.source.none.fl_str_mv |
Calidoscópio; Vol. 6 No. 3 (2008): September/December; 142-148 Calidoscópio; v. 6 n. 3 (2008): Setembro/Dezembro; 142-148 2177-6202 reponame:Calidoscópio (Online) instname:Universidade do Vale do Rio dos Sinos (UNISINOS) instacron:Unisinos |
instname_str |
Universidade do Vale do Rio dos Sinos (UNISINOS) |
instacron_str |
Unisinos |
institution |
Unisinos |
reponame_str |
Calidoscópio (Online) |
collection |
Calidoscópio (Online) |
repository.name.fl_str_mv |
Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS) |
repository.mail.fl_str_mv |
cmira@unisinos.br || cmira@unisinos.br |
_version_ |
1792203885708836864 |