Portuguese corpus-based learning using ETL
Main author: | Milidiú, Ruy Luiz |
---|---|
Publication date: | 2008 |
Other authors: | Santos, Cícero Nogueira dos; Duarte, Julio Cesar |
Document type: | Article |
Language: | eng |
Source title: | Journal of the Brazilian Computer Society |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003 |
Abstract: | We present Entropy Guided Transformation Learning (ETL) models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each of the tasks, the ETL modeling phase is quick and simple. ETL requires only the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain performance competitive with the state of the art in all six corpus-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems. |
id |
UFRGS-28_3bfe12bbba44abac5fd0d76677ff3c2b |
---|---|
oai_identifier_str |
oai:scielo:S0104-65002008000400003 |
network_acronym_str |
UFRGS-28 |
network_name_str |
Journal of the Brazilian Computer Society |
repository_id_str |
|
spelling |
Portuguese corpus-based learning using ETL; Entropy Guided Transformation Learning; transformation-based learning; decision trees; natural language processing; Sociedade Brasileira de Computação; 2008-12-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003; Journal of the Brazilian Computer Society v.14 n.4 2008; reponame:Journal of the Brazilian Computer Society; instname:Sociedade Brasileira de Computação (SBC); instacron:UFRGS; 10.1007/BF03192569; info:eu-repo/semantics/openAccess; Milidiú, Ruy Luiz; Santos, Cícero Nogueira dos; Duarte, Julio Cesar; eng; 2009-03-09T00:00:00Z; oai:scielo:S0104-65002008000400003; Revista; https://journal-bcs.springeropen.com/; PUB; https://old.scielo.br/oai/scielo-oai.php; jbcs@icmc.sc.usp.br; 1678-4804; 0104-6500; opendoar:2009-03-09T00:00; Journal of the Brazilian Computer Society - Sociedade Brasileira de Computação (SBC); false |
dc.title.none.fl_str_mv |
Portuguese corpus-based learning using ETL |
author |
Milidiú, Ruy Luiz |
author_facet |
Milidiú, Ruy Luiz; Santos, Cícero Nogueira dos; Duarte, Julio Cesar |
author_role |
author |
author2 |
Santos, Cícero Nogueira dos; Duarte, Julio Cesar |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Milidiú, Ruy Luiz; Santos, Cícero Nogueira dos; Duarte, Julio Cesar |
dc.subject.por.fl_str_mv |
Entropy Guided Transformation Learning; transformation-based learning; decision trees; natural language processing |
topic |
Entropy Guided Transformation Learning; transformation-based learning; decision trees; natural language processing |
publishDate |
2008 |
dc.date.none.fl_str_mv |
2008-12-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1007/BF03192569 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Sociedade Brasileira de Computação |
dc.source.none.fl_str_mv |
Journal of the Brazilian Computer Society v.14 n.4 2008; reponame:Journal of the Brazilian Computer Society; instname:Sociedade Brasileira de Computação (SBC); instacron:UFRGS |
instname_str |
Sociedade Brasileira de Computação (SBC) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Journal of the Brazilian Computer Society |
collection |
Journal of the Brazilian Computer Society |
repository.name.fl_str_mv |
Journal of the Brazilian Computer Society - Sociedade Brasileira de Computação (SBC) |
repository.mail.fl_str_mv |
jbcs@icmc.sc.usp.br |
_version_ |
1754734669981548544 |