The SpatialCIM methodology for spatial document coverage disambiguation and the entity recognition process aided by linguistic techniques.

Bibliographic details
Main author: VARGAS, R. N. P.
Publication date: 2012
Other authors: MOURA, M. F., SPERANZA, E. A., RODRIGUEZ, E., REZENDE, S. O.
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/948462
Abstract: Users increasingly take the geographical location of documents into account during information retrieval. However, conventional information retrieval systems based on keyword matching do not consider which words may represent geographical entities that are spatially related to other entities in the document. This paper presents the SpatialCIM methodology, which consists of three steps: pre-processing, data expansion and disambiguation. In the pre-processing step, entity recognition is carried out with the support of the Rembrandt tool. The paper also compares the performance of the Rembrandt tool in discovering location entities in texts against that of a controlled vocabulary of Brazilian geographic locations. The comparison uses a set of geographically labeled news items, in Portuguese, covering sugar cane cultivation. For the Rembrandt tool, the F-measure increased from 45% in the non-disambiguated process to 50% after disambiguation; for the controlled vocabulary, it increased from 35% to 38%. The results also show that the Rembrandt tool has a minimal difference between precision and recall, whereas the controlled vocabulary always yields the highest recall values.
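The evaluation described in the abstract compares recognized location entities against labeled ones using precision, recall and F-measure. The following is a minimal sketch of that kind of scoring, not the authors' implementation. It assumes the balanced F-measure (F1), a simple whole-word match against a controlled vocabulary, and a made-up toy sentence and vocabulary; the names `find_locations` and `precision_recall_f1` are hypothetical.

```python
import re


def find_locations(text, vocabulary):
    """Return the vocabulary entries that occur in `text` as whole words.

    This is a naive controlled-vocabulary matcher (an assumption for
    illustration); it performs no disambiguation at all.
    """
    lowered = text.lower()
    found = set()
    for name in vocabulary:
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.add(name)
    return found


def precision_recall_f1(predicted, gold):
    """Compute precision, recall and balanced F-measure over two sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for this sketch (not taken from the paper's corpus).
vocabulary = {"Campinas", "Piracicaba", "Santos", "Goiás"}
text = "A safra de cana em Piracicaba e Campinas cresceu, segundo Santos."
gold = {"Piracicaba", "Campinas"}  # "Santos" is a surname here, not a city

predicted = find_locations(text, vocabulary)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), round(precision, 2), round(recall, 2), round(f1, 2))
```

In this toy case the vocabulary matcher finds every labeled location but also matches the surname "Santos", so recall is high while precision drops. This is the kind of ambiguity that a disambiguation step is meant to reduce.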
id EMBR_99fad0cb563ff045cedb3c70ddb4e363
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/948462
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
spelling GeoDoc 2012, PAKDD 2012.
dc.title.none.fl_str_mv The SpatialCIM methodology for spatial document coverage disambiguation and the entity recognition process aided by linguistic techniques.
title The SpatialCIM methodology for spatial document coverage disambiguation and the entity recognition process aided by linguistic techniques.
author VARGAS, R. N. P.
author_role author
author2 MOURA, M. F.
SPERANZA, E. A.
RODRIGUEZ, E.
REZENDE, S. O.
author2_role author
author
author
author
dc.contributor.none.fl_str_mv ROSA NATHALIE PORTUGAL VARGAS, ICMC/USP; MARIA FERNANDA MOURA, CNPTIA; EDUARDO ANTONIO SPERANZA, CNPTIA; ERCILIA RODRIGUEZ; SOLANGE OLIVEIRA REZENDE, ICMC/USP.
dc.subject.por.fl_str_mv SpatialCIM methodology
Ambiguity Problem
Named Entity Recognition and Classification
Toponym resolution
publishDate 2012
dc.date.none.fl_str_mv 2012
2013-02-06T23:03:12Z
2013-02-06T23:03:12Z
2013-02-06
2020-01-22T11:11:11Z
dc.type.driver.fl_str_mv Paper in conference proceedings
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv In: GEOSPATIAL INFORMATION AND DOCUMENTS; PACIFIC-ASIA CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 16., 2012, Kuala Lumpur. Workshop... [S.l.: s.n.], 2012.
http://www.alice.cnptia.embrapa.br/alice/handle/doc/948462
url http://www.alice.cnptia.embrapa.br/alice/handle/doc/948462
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv Unpaginated.
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
instname_str Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron_str EMBRAPA
institution EMBRAPA
reponame_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
collection Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1817695278213365760