Event extraction and representation: A case study for the Portuguese language
Main author: | Quaresma, Paulo |
---|---|
Publication date: | 2019 |
Other authors: | Nogueira, Vítor; Raiyani, Kashyap; Bayot, Roy |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/27059 https://doi.org/10.3390/info10060205 |
Abstract: | Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we describe in detail our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools: a part-of-speech tagger, a named entity recognizer, a dependency parser, a semantic role labeler, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using suitable AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated on a corpus of Portuguese texts, and the results obtained are presented and analysed. The current limitations and future work are discussed in detail. |
id | RCAP_0b3362d0f66365087ff32955dc33a5f8 |
---|---|
oai_identifier_str | oai:dspace.uevora.pt:10174/27059 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Event extraction and representation: A case study for the Portuguese language |
author | Quaresma, Paulo |
author_facet | Quaresma, Paulo; Nogueira, Vítor; Raiyani, Kashyap; Bayot, Roy |
author_role | author |
author2 | Nogueira, Vítor; Raiyani, Kashyap; Bayot, Roy |
author2_role | author; author; author |
dc.contributor.author.fl_str_mv | Quaresma, Paulo; Nogueira, Vítor; Raiyani, Kashyap; Bayot, Roy |
dc.subject.por.fl_str_mv | Events; Information extraction; Natural language processing; Ontologies population; Text mining |
topic | Events; Information extraction; Natural language processing; Ontologies population; Text mining |
publishDate | 2019 |
dc.date.none.fl_str_mv | 2019-06-01T00:00:00Z; 2020-02-19T11:57:14Z; 2020-02-19 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10174/27059; http://hdl.handle.net/10174/27059; https://doi.org/10.3390/info10060205 |
url | http://hdl.handle.net/10174/27059; https://doi.org/10.3390/info10060205 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | pq@uevora.pt; vbn@uevora.pt; nd; nd; 283 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.publisher.none.fl_str_mv | MDPI AG |
publisher.none.fl_str_mv | MDPI AG |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136654579990528 |