Portuguese corpus-based learning using ETL

Bibliographic details
Main author: Milidiú, Ruy Luiz
Publication date: 2008
Other authors: Santos, Cícero Nogueira dos; Duarte, Julio Cesar
Document type: Article
Language: eng
Source title: Journal of the Brazilian Computer Society
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003
Abstract: We present Entropy Guided Transformation Learning models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each one of the tasks, the ETL modeling phase is quick and simple. ETL only requires the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain state-of-the-art competitive performance in all six corpora-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.
id UFRGS-28_3bfe12bbba44abac5fd0d76677ff3c2b
oai_identifier_str oai:scielo:S0104-65002008000400003
network_acronym_str UFRGS-28
network_name_str Journal of the Brazilian Computer Society
repository_id_str
dc.title.none.fl_str_mv Portuguese corpus-based learning using ETL
title Portuguese corpus-based learning using ETL
spellingShingle Portuguese corpus-based learning using ETL
Milidiú,Ruy Luiz
Entropy Guided Transformation Learning
transformation-based learning
decision trees
natural language processing
author Milidiú,Ruy Luiz
author_facet Milidiú,Ruy Luiz
Santos,Cícero Nogueira dos
Duarte,Julio Cesar
author_role author
author2 Santos,Cícero Nogueira dos
Duarte,Julio Cesar
author2_role author
dc.contributor.author.fl_str_mv Milidiú,Ruy Luiz
Santos,Cícero Nogueira dos
Duarte,Julio Cesar
dc.subject.por.fl_str_mv Entropy Guided Transformation Learning
transformation-based learning
decision trees
natural language processing
topic Entropy Guided Transformation Learning
transformation-based learning
decision trees
natural language processing
description We present Entropy Guided Transformation Learning models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each one of the tasks, the ETL modeling phase is quick and simple. ETL only requires the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain state-of-the-art competitive performance in all six corpora-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.
publishDate 2008
dc.date.none.fl_str_mv 2008-12-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002008000400003
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1007/BF03192569
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Sociedade Brasileira de Computação
publisher.none.fl_str_mv Sociedade Brasileira de Computação
dc.source.none.fl_str_mv Journal of the Brazilian Computer Society v.14 n.4 2008
reponame:Journal of the Brazilian Computer Society
instname:Sociedade Brasileira de Computação (SBC)
instacron:UFRGS
instname_str Sociedade Brasileira de Computação (SBC)
instacron_str UFRGS
institution UFRGS
reponame_str Journal of the Brazilian Computer Society
collection Journal of the Brazilian Computer Society
repository.name.fl_str_mv Journal of the Brazilian Computer Society - Sociedade Brasileira de Computação (SBC)
repository.mail.fl_str_mv jbcs@icmc.sc.usp.br
_version_ 1754734669981548544