Event extraction and representation: A case study for the Portuguese language

Bibliographic details
Main author: Quaresma, Paulo
Publication date: 2019
Other authors: Nogueira, Vítor; Raiyani, Kashyap; Bayot, Roy
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/27059
https://doi.org/10.3390/info10060205
Abstract: Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we will describe, in detail, our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools; namely, a part-of-speech tagger, a named entities recognizer, a dependency parser, semantic role labeling, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts and the obtained results are presented and analysed. The current limitations and future work are discussed in detail.
id RCAP_0b3362d0f66365087ff32955dc33a5f8
oai_identifier_str oai:dspace.uevora.pt:10174/27059
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.contributor.author.fl_str_mv Quaresma, Paulo
Nogueira, Vítor
Raiyani, Kashyap
Bayot, Roy
dc.subject.por.fl_str_mv Events
Information extraction
Natural language processing
Ontologies population
Text mining
publishDate 2019
dc.date.none.fl_str_mv 2019-06-01T00:00:00Z
2020-02-19T11:57:14Z
2020-02-19
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/27059
http://hdl.handle.net/10174/27059
https://doi.org/10.3390/info10060205
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv pq@uevora.pt
vbn@uevora.pt
nd
nd
283
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv MDPI AG
publisher.none.fl_str_mv MDPI AG
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136654579990528