Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization

Bibliographic details
Main author: Silva, João
Publication date: 2007
Document type: Dissertation
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10451/14016
Abstract: This dissertation proposes a set of procedures for the computational processing of Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech Tagging, Nominal Featurization and Nominal Lemmatization. These are some of the initial steps producing linguistic information (such as POS categories or lemmas) that is important to most subsequent processing (e.g. syntactic and semantic analysis). I follow a shallow processing approach, where linguistic information is associated with text based on local information (i.e. using the word itself or perhaps a limited window of context containing just a few words). I begin by identifying and describing the key problems raised by each task, with special focus on the problems that are specific to Portuguese. After an overview of existing approaches and tools, I describe the solutions I followed to the issues raised previously. I then report on my implementation of these solutions, which are found either to yield state-of-the-art performance or, in some cases, to advance the state of the art. The major result of this dissertation is thus threefold: a description of the problems found in NLP of Portuguese, a set of algorithms and the corresponding tools to tackle those problems, and their evaluation results.
id RCAP_1090fb7b19def61873eb32343344626a
oai_identifier_str oai:repositorio.ul.pt:10451/14016
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
title Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
spellingShingle Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
Silva, João
Natural language processing
Shallow processing
Sentence segmentation, Tokenization
Morphosyntactic annotation
Morphological analysis
Lemmatization
title_short Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
title_full Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
title_fullStr Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
title_full_unstemmed Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
title_sort Shallow Processing of Portuguese: From Sentence Chunking to Nominal Lemmatization
author Silva, João
author_facet Silva, João
author_role author
dc.contributor.none.fl_str_mv Branco, António
Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Silva, João
dc.subject.por.fl_str_mv Natural language processing
Shallow processing
Sentence segmentation, Tokenization
Morphosyntactic annotation
Morphological analysis
Lemmatization
topic Natural language processing
Shallow processing
Sentence segmentation, Tokenization
Morphosyntactic annotation
Morphological analysis
Lemmatization
description This dissertation proposes a set of procedures for the computational processing of Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech Tagging, Nominal Featurization and Nominal Lemmatization. These are some of the initial steps producing linguistic information (such as POS categories or lemmas) that is important to most subsequent processing (e.g. syntactic and semantic analysis). I follow a shallow processing approach, where linguistic information is associated with text based on local information (i.e. using the word itself or perhaps a limited window of context containing just a few words). I begin by identifying and describing the key problems raised by each task, with special focus on the problems that are specific to Portuguese. After an overview of existing approaches and tools, I describe the solutions I followed to the issues raised previously. I then report on my implementation of these solutions, which are found either to yield state-of-the-art performance or, in some cases, to advance the state of the art. The major result of this dissertation is thus threefold: a description of the problems found in NLP of Portuguese, a set of algorithms and the corresponding tools to tackle those problems, and their evaluation results.
publishDate 2007
dc.date.none.fl_str_mv 2007-06
2007-06-01T00:00:00Z
2009-02-10T13:12:58Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/14016
url http://hdl.handle.net/10451/14016
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Department of Informatics, University of Lisbon
publisher.none.fl_str_mv Department of Informatics, University of Lisbon
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134258037522432