What is a corpus and how to build it? Lessons learned from developing several linguistic corpora
Main author: | Aluísio, Sandra Maria |
---|---|
Publication date: | 2021 |
Other authors: | Almeida, Gladis Maria de Barcellos |
Document type: | Article |
Language: | por |
Source title: | Calidoscópio (Online) |
Full text: | https://revistas.unisinos.br/index.php/calidoscopio/article/view/6002 |
Abstract: | Corpus-based research has developed considerably in Brazil over the last decade. Its relevance is evident in Linguistics, Applied Linguistics, and Computational Linguistics. Corpus Linguistics has emerged as an approach to systematize procedures and account for this new way of doing research. The development of natural language processing tools for Brazilian Portuguese can help Corpus Linguistics advance substantially in Brazil. However, the advances made in Corpus Linguistics internationally are not yet reflected in much of the research carried out in Brazil. The reason is that internationally accepted procedures and concepts are not yet established here, even though researchers in Brazil are developing extraordinary corpus-based projects. This article therefore discusses several definitions of corpus, the requirements and procedures for building one, and the available corpora and tools, and finally presents four corpus projects whose detailed description can assist other researchers in corpus building and processing. Keywords: corpus; corpus linguistics; corpus processing. |
id |
Unisinos-3_ef9dd103c41ecdd37e8f4fb6daa35044 |
---|---|
oai_identifier_str |
oai:ojs2.revistas.unisinos.br:article/6002 |
network_acronym_str |
Unisinos-3 |
network_name_str |
Calidoscópio (Online) |
repository_id_str |
|
dc.title.none.fl_str_mv |
What is a corpus and how to build it? Lessons learned from developing several linguistic corpora / O que é e como se constrói um corpus? Lições aprendidas na compilação de vários corpora para pesquisa linguística |
title |
What is a corpus and how to build it? Lessons learned from developing several linguistic corpora |
author |
Aluísio, Sandra Maria |
author_role |
author |
author2 |
Almeida, Gladis Maria de Barcellos |
author2_role |
author |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-05-27 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://revistas.unisinos.br/index.php/calidoscopio/article/view/6002 |
dc.language.iso.fl_str_mv |
por |
dc.relation.none.fl_str_mv |
https://revistas.unisinos.br/index.php/calidoscopio/article/view/6002/3178 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2021 Calidoscópio info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2021 Calidoscópio |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Unisinos |
dc.source.none.fl_str_mv |
Calidoscópio; Vol. 4 No. 3 (2006): September/December; 156-178 2177-6202 reponame:Calidoscópio (Online) instname:Universidade do Vale do Rio dos Sinos (UNISINOS) instacron:Unisinos |
instname_str |
Universidade do Vale do Rio dos Sinos (UNISINOS) |
instacron_str |
Unisinos |
institution |
Unisinos |
reponame_str |
Calidoscópio (Online) |
collection |
Calidoscópio (Online) |
repository.name.fl_str_mv |
Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS) |
repository.mail.fl_str_mv |
cmira@unisinos.br |
_version_ |
1792203885794820096 |