Relation Extraction using Different Features in Portuguese

Souza, Erick Nilsen Pereira; Claro, Daniela Barreiro

Bibliographic details
Main author:	Souza, Erick Nilsen Pereira
Publication date:	2014
Other authors:	Claro, Daniela Barreiro
Document type:	Article
Language:	Portuguese (por)
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4
Abstract:	Relation Extraction (RE) is an Information Extraction (IE) task responsible for discovering semantic relationships between concepts in unstructured text. When extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is to reduce the proportion of invalid extractions among all the relationships identified. Current methods, which apply machine learning to a set of specific linguistic features, eliminate a large share of invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty of finding the most representative feature set for the Open RE problem, given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification in open relation extraction for Portuguese, aiming to provide a basis for new solutions that can reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese with satisfactory classification performance. Among the classification algorithms evaluated, J48 showed the best results, with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%) and Naive Bayes (79.9%).
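
The abstract frames Open RE filtering as binary classification: each candidate extraction (argument 1, relation phrase, argument 2) is labeled valid or invalid from a set of features, and classifiers are compared by F-measure. The sketch below illustrates that setup with a plain perceptron (one of the four algorithms evaluated) and an F-measure computed from counts. It is a minimal illustration under stated assumptions, not the authors' method: the feature functions, the POS tag names ("V", "PRP", "PROP", ...) and the toy labeled candidates are all hypothetical.

```python
# Minimal sketch: feature-based filtering of Open RE candidates (valid = 1, invalid = 0).
# ASSUMPTION: the feature set, POS tag names and toy data are hypothetical illustrations.

def features(arg1_tags, rel_tags, arg2_tags):
    """Map a candidate (arg1, relation phrase, arg2), given as POS-tag lists, to a binary feature vector."""
    return [
        1.0,                                    # bias term
        1.0 if "V" in rel_tags else 0.0,        # relation phrase contains a verb
        1.0 if len(rel_tags) <= 3 else 0.0,     # relation phrase is short
        1.0 if rel_tags[-1] == "PRP" else 0.0,  # relation phrase ends in a preposition
        1.0 if "PROP" in arg1_tags else 0.0,    # first argument contains a proper noun
        1.0 if "PROP" in arg2_tags else 0.0,    # second argument contains a proper noun
    ]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    return 1 if dot(w, x) > 0 else 0

def train_perceptron(xs, ys, max_epochs=20):
    """Classic perceptron rule: on a mistake, add (valid) or subtract (invalid) the feature vector."""
    w = [0.0] * len(xs[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if predict(w, x) != y:
                sign = 1.0 if y == 1 else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # stop once the training set is separated
            break
    return w

def f_measure(gold, pred):
    """F1 of the 'valid extraction' class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labeled candidates: (arg1 tags, relation tags, arg2 tags, label)
data = [
    (["PROP"], ["V", "PRP"], ["PROP"], 1),
    (["PROP"], ["V"], ["N"], 1),
    (["N"], ["ADV", "PRP"], ["PROP"], 0),
    (["PROP"], ["N", "N", "ADJ", "PRP"], ["N"], 0),
]
xs = [features(a1, r, a2) for a1, r, a2, _ in data]
ys = [label for *_, label in data]
w = train_perceptron(xs, ys)
preds = [predict(w, x) for x in xs]
print(preds, f_measure(ys, preds))  # → [1, 1, 0, 0] 1.0
```

The classifier is deliberately interchangeable: J48 (Weka's C4.5 decision tree), an SVM or Naive Bayes could replace `train_perceptron` over the same feature vectors, which is the kind of comparison the abstract reports. The language dependence discussed in the abstract lives in `features`, since the tag names and cues that work for English need not transfer to Portuguese.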

Item metadata

id	RCAP_a995a9f80b25437a8a66604e035bc0ae
oai_identifier_str	oai:linguamatica.com:article/182
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Relation Extraction using Different Features in Portuguese; Extração de Relações utilizando Features Diferenciadas para Português
title	Relation Extraction using Different Features in Portuguese
author	Souza, Erick Nilsen Pereira
author_role	author
author2	Claro, Daniela Barreiro
author2_role	author
dc.subject.por.fl_str_mv	Open Relation Extraction; Feature Selection
publishDate	2014
dc.date.none.fl_str_mv	2014-12-26
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4
dc.language.iso.fl_str_mv	por
dc.relation.none.fl_str_mv	https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4 https://linguamatica.com/index.php/linguamatica/article/view/v6n2-4/296
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade do Minho e Universidade de Vigo
dc.source.none.fl_str_mv	Linguamática; Vol. 6 No. 2; pp. 57-65; ISSN 1647-0818; reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação