Incremental unsupervised name disambiguation in cleaned digital libraries.

Bibliographic details
Main author: Carvalho, Ana Paula de
Publication date: 2011
Other authors: Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André
Document type: Article
Language: English
Source: Repositório Institucional da UFOP
Full text: http://www.repositorio.ufop.br/handle/123456789/1730
Abstract: Name ambiguity in the context of bibliographic citations is one of the hardest problems currently faced by the Digital Library (DL) community. Here we deal with the problem of disambiguating new citation records inserted into a cleaned DL without the need to process the whole collection, which is usually necessary for unsupervised methods. Although supervised solutions can deal with this situation, they carry the costly burden of generating training data, and they do not handle well the insertion of records of new authors not yet present in the repository. In this article, we propose a new unsupervised method that identifies the correct authors of the new citation records to be inserted in a DL. The method is based on heuristics that are also used to determine whether the new records belong to authors already in the digital library or not, correctly identifying new authors in most cases. Our experimental evaluation, using synthetic and real data sets, shows gains of up to 19% when compared to a state-of-the-art method, without the cost of having to disambiguate the whole DL at each new load (as done by unsupervised methods) or the need for any training (as done by supervised methods).
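The abstract does not describe the heuristics themselves. As a rough illustration only, the following minimal Python sketch shows the general incremental idea it outlines: a new citation record is compared only against existing authors with a compatible name, then either attached to the most similar one or registered as a new author. The name-matching rule, the similarity features (coauthors, title terms, venue), the weights and the threshold are all hypothetical assumptions, not the method proposed in the article.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorCluster:
    """Hypothetical profile of one already-disambiguated author in the cleaned DL."""
    author_id: int
    name: str
    coauthors: set = field(default_factory=set)
    title_terms: set = field(default_factory=set)
    venues: set = field(default_factory=set)


def name_key(name: str) -> tuple:
    """Hypothetical name-compatibility rule: first initial plus last name, lowercased."""
    parts = name.lower().replace(".", " ").split()
    return (parts[0][0], parts[-1])


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_record(record: dict, clusters: list, threshold: float = 0.2) -> int:
    """Attach a new citation record to an existing author or create a new one.

    Only name-compatible clusters are scored, so the rest of the collection is
    never reprocessed. Weights and threshold are illustrative assumptions.
    """
    coauthors = set(record["coauthors"])
    terms = set(record["title"].lower().split())
    venue = {record["venue"]}
    best, best_score = None, 0.0
    for c in clusters:
        if name_key(c.name) != name_key(record["name"]):
            continue  # different name: cannot be the same author
        score = (0.5 * jaccard(coauthors, c.coauthors)
                 + 0.3 * jaccard(terms, c.title_terms)
                 + 0.2 * jaccard(venue, c.venues))
        if score > best_score:
            best, best_score = c, score
    if best is None or best_score < threshold:
        # No sufficiently similar author: treat the record as a new author.
        new_id = max((c.author_id for c in clusters), default=0) + 1
        best = AuthorCluster(author_id=new_id, name=record["name"])
        clusters.append(best)
    # Fold the record's evidence into the chosen author's profile.
    best.coauthors |= coauthors
    best.title_terms |= terms
    best.venues |= venue
    return best.author_id
```

Because only name-compatible authors are scored, the cost of each insertion depends on the number of homonyms rather than on the size of the whole collection, which is the property the abstract emphasizes.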
id UFOP_36573f8c005ffc11f7e8353c88581c27
oai_identifier_str oai:repositorio.ufop.br:123456789/1730
network_acronym_str UFOP
network_name_str Repositório Institucional da UFOP
repository_id_str 3233
Rights: Permission to copy without fee all or part of the material printed in JIDM is granted provided that the copies are not made or distributed for commercial advantage, and that notice is given that copying is by permission of the Sociedade Brasileira de Computação. Source: the article itself.
dc.title.none.fl_str_mv Incremental unsupervised name disambiguation in cleaned digital libraries.
dc.contributor.author.fl_str_mv Carvalho, Ana Paula de
Ferreira, Anderson Almeida
Laender, Alberto Henrique Frade
Gonçalves, Marcos André
dc.subject.por.fl_str_mv Bibliographic citation
Digital library
Name libraries
dc.date.none.fl_str_mv 2011
2012-10-22T19:11:20Z
2012-10-22T19:11:20Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv CARVALHO, A. P. de et al. Incremental unsupervised name disambiguation in cleaned digital libraries. Journal of Information and Data Management, v. 2, n. 3, p. 289-304, 2011. Available at: <http://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/151/88>. Accessed on: 22 Oct. 2012
21666288
http://www.repositorio.ufop.br/handle/123456789/1730
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFOP
instname:Universidade Federal de Ouro Preto (UFOP)
instacron:UFOP
repository.name.fl_str_mv Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP)
repository.mail.fl_str_mv repositorio@ufop.edu.br