Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary
Main author: | Finatto, Maria José Bocorny |
---|---|
Publication date: | 2019 |
Other authors: | Vale, Oto Araújo; Laporte, Éric |
Document type: | Article |
Language: | por eng |
Source title: | Alfa (São José do Rio Preto. Online) |
Full text: | https://periodicos.fclar.unesp.br/alfa/article/view/11234 |
Abstract: | We report an experiment that checks how well two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015, identify a set of words from popular written Portuguese. This computational dictionary is freely available for linguistic analyses of Brazilian Portuguese and for other research, which justifies a critical study. The set of words comes from the PorPopular corpus, composed of popular newspapers: Diário Gaúcho (DG) and the Bahian newspaper Massa! (MA). From DG, we studied a set of texts with 984,465 words (tokens), published in 2008, in the spelling used before the Orthographic Agreement of the Portuguese Language adopted in 2009. From MA, we examined a vocabulary of 215,776 words (tokens), from issues published in 2012, 2014 and 2015, all in the new spelling. The verification involved: a) generating lists of the distinct words used in DG and MA; b) comparing these lists with the entry lists of the two versions of DELAF PB; c) assessing the coverage of this vocabulary; d) proposing ways of including the items not covered. The results showed that, on average, 19% of the types in the DG corpus were unknown to DELAF PB 2004 and 2015. In the MA sample, this average was 13%. The dictionary version had only a slight effect on item recognition performance. |
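
Steps a) to c) of the abstract (list the distinct words, compare them with the dictionary entries, measure coverage) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that dictionary entries follow the DELA-style line layout `inflected_form,lemma.CODES`, that a simple letter-based tokenizer is adequate, and that matching is case-insensitive. The function names and the toy entries (including their grammatical codes) are hypothetical.

```python
import re

# Letters (including accented ones), optionally joined by internal hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_types(text):
    """Step a): return the set of distinct lowercase word forms (types)."""
    return set(WORD_RE.findall(text.lower()))


def delaf_forms(lines):
    """Collect inflected forms from lines assumed to look like 'form,lemma.CODES'.

    Simplification: escaped commas inside a form are not handled.
    """
    forms = set()
    for line in lines:
        line = line.strip()
        if "," in line:
            forms.add(line.split(",", 1)[0].lower())
    return forms


def unknown_rate(types, forms):
    """Steps b) and c): types missing from the dictionary and their share."""
    unknown = types - forms
    rate = len(unknown) / len(types) if types else 0.0
    return unknown, rate


if __name__ == "__main__":
    # Toy data for illustration only; not taken from PorPopular or DELAF PB.
    sample_text = "O guri comprou um sanduíche e um refri."
    toy_dictionary = [
        "o,o.DET:ms",
        "guri,guri.N:ms",
        "comprou,comprar.V:J3s",
        "um,um.DET:ms",
        "sanduíche,sanduíche.N:ms",
        "e,e.CONJ",
    ]
    types = word_types(sample_text)
    unknown, rate = unknown_rate(types, delaf_forms(toy_dictionary))
    print(sorted(unknown), f"{rate:.1%}")
```

The rate is computed over types rather than tokens, which matches how the abstract reports its 19% and 13% figures: a frequent word counts once, however many times it occurs in the corpus.
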
id |
UNESP-4_21e284d52de5588c979815808d41d955 |
---|---|
oai_identifier_str |
oai:ojs.pkp.sfu.ca:article/11234 |
network_acronym_str |
UNESP-4 |
network_name_str |
Alfa (São José do Rio Preto. Online) |
repository_id_str |
|
spelling |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary; Reconhecimento do vocabulário de jornais populares brasileiros por um dicionário computacional de acesso livre; Popular newspapers; Lexic; Vocabulary; Computational dictionary; Lexical coverage; Recognition of words; Brazilian Portuguese; Jornais populares; Léxico; Vocabulário; Dicionário computacional; Cobertura lexical; Reconhecimento de palavras; Português brasileiro; We report an experiment that checks how well two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015, identify a set of words from popular written Portuguese. This computational dictionary is freely available for linguistic analyses of Brazilian Portuguese and for other research, which justifies a critical study. The set of words comes from the PorPopular corpus, composed of popular newspapers: Diário Gaúcho (DG) and the Bahian newspaper Massa! (MA). From DG, we studied a set of texts with 984,465 words (tokens), published in 2008, in the spelling used before the Orthographic Agreement of the Portuguese Language adopted in 2009. From MA, we examined a vocabulary of 215,776 words (tokens), from issues published in 2012, 2014 and 2015, all in the new spelling. The verification involved: a) generating lists of the distinct words used in DG and MA; b) comparing these lists with the entry lists of the two versions of DELAF PB; c) assessing the coverage of this vocabulary; d) proposing ways of including the items not covered. The results showed that, on average, 19% of the types in the DG corpus were unknown to DELAF PB 2004 and 2015. In the MA sample, this average was 13%. The dictionary version had only a slight effect on item recognition performance.; UNESP; 2019-04-15; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; application/pdf; application/pdf; https://periodicos.fclar.unesp.br/alfa/article/view/11234; 10.1590/1981-5794-1904-3; ALFA: Revista de Linguística; v. 63 n. 1 (2019); 1981-5794; reponame:Alfa (São José do Rio Preto. Online); instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; por; eng; https://periodicos.fclar.unesp.br/alfa/article/view/11234/8182; https://periodicos.fclar.unesp.br/alfa/article/view/11234/8178; Copyright (c) 2019 ALFA: Revista de Linguística; info:eu-repo/semantics/openAccess; Finatto, Maria José Bocorny; Vale, Oto Araújo; Laporte, Éric; 2019-04-15T19:45:02Z; oai:ojs.pkp.sfu.ca:article/11234; Revista; http://www.scielo.br/scielo.php?script=sci_serial&pid=1981-5794&lng=pt&nrm=iso; PUB; https://old.scielo.br/oai/scielo-oai.php; alfa@unesp.br; 1981-5794; 0002-5216; opendoar:2019-04-15T19:45:02; Alfa (São José do Rio Preto. Online) - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary; Reconhecimento do vocabulário de jornais populares brasileiros por um dicionário computacional de acesso livre |
title |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
spellingShingle |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary; Finatto, Maria José Bocorny; Popular newspapers; Lexic; Vocabulary; Computational dictionary; Lexical coverage; Recognition of words; Brazilian Portuguese; Jornais populares; Léxico; Vocabulário; Dicionário computacional; Cobertura lexical; Reconhecimento de palavras; Português brasileiro |
title_short |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
title_full |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
title_fullStr |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
title_full_unstemmed |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
title_sort |
Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary |
author |
Finatto, Maria José Bocorny |
author_facet |
Finatto, Maria José Bocorny; Vale, Oto Araújo; Laporte, Éric |
author_role |
author |
author2 |
Vale, Oto Araújo; Laporte, Éric |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Finatto, Maria José Bocorny; Vale, Oto Araújo; Laporte, Éric |
dc.subject.por.fl_str_mv |
Popular newspapers; Lexic; Vocabulary; Computational dictionary; Lexical coverage; Recognition of words; Brazilian Portuguese; Jornais populares; Léxico; Vocabulário; Dicionário computacional; Cobertura lexical; Reconhecimento de palavras; Português brasileiro |
topic |
Popular newspapers; Lexic; Vocabulary; Computational dictionary; Lexical coverage; Recognition of words; Brazilian Portuguese; Jornais populares; Léxico; Vocabulário; Dicionário computacional; Cobertura lexical; Reconhecimento de palavras; Português brasileiro |
description |
We report an experiment that checks how well two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015, identify a set of words from popular written Portuguese. This computational dictionary is freely available for linguistic analyses of Brazilian Portuguese and for other research, which justifies a critical study. The set of words comes from the PorPopular corpus, composed of popular newspapers: Diário Gaúcho (DG) and the Bahian newspaper Massa! (MA). From DG, we studied a set of texts with 984,465 words (tokens), published in 2008, in the spelling used before the Orthographic Agreement of the Portuguese Language adopted in 2009. From MA, we examined a vocabulary of 215,776 words (tokens), from issues published in 2012, 2014 and 2015, all in the new spelling. The verification involved: a) generating lists of the distinct words used in DG and MA; b) comparing these lists with the entry lists of the two versions of DELAF PB; c) assessing the coverage of this vocabulary; d) proposing ways of including the items not covered. The results showed that, on average, 19% of the types in the DG corpus were unknown to DELAF PB 2004 and 2015. In the MA sample, this average was 13%. The dictionary version had only a slight effect on item recognition performance. |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-04-15 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://periodicos.fclar.unesp.br/alfa/article/view/11234; 10.1590/1981-5794-1904-3 |
url |
https://periodicos.fclar.unesp.br/alfa/article/view/11234 |
identifier_str_mv |
10.1590/1981-5794-1904-3 |
dc.language.iso.fl_str_mv |
por eng |
language |
por eng |
dc.relation.none.fl_str_mv |
https://periodicos.fclar.unesp.br/alfa/article/view/11234/8182; https://periodicos.fclar.unesp.br/alfa/article/view/11234/8178 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2019 ALFA: Revista de Linguística; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2019 ALFA: Revista de Linguística |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
UNESP |
publisher.none.fl_str_mv |
UNESP |
dc.source.none.fl_str_mv |
ALFA: Revista de Linguística; v. 63 n. 1 (2019); 1981-5794; reponame:Alfa (São José do Rio Preto. Online); instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Alfa (São José do Rio Preto. Online) |
collection |
Alfa (São José do Rio Preto. Online) |
repository.name.fl_str_mv |
Alfa (São José do Rio Preto. Online) - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
alfa@unesp.br |
_version_ |
1800214377483206656 |