Incremental unsupervised name disambiguation in cleaned digital libraries.
Main author: | Carvalho, Ana Paula de |
---|---|
Publication date: | 2011 |
Other authors: | Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFOP |
Full text: | http://www.repositorio.ufop.br/handle/123456789/1730 |
Abstract: | Name ambiguity in the context of bibliographic citations is one of the hardest problems currently faced by the Digital Library (DL) community. Here we deal with the problem of disambiguating new citation records inserted into a cleaned DL without the need to process the whole collection, which is usually necessary for unsupervised methods. Although supervised solutions can deal with this situation, they carry the costly burden of generating training data, and they do not handle well the insertion of records of new authors not yet present in the repository. In this article, we propose a new unsupervised method that identifies the correct authors of the new citation records to be inserted in a DL. The method is based on heuristics that are also used to identify whether the new records belong to authors already in the digital library or not, correctly identifying new authors in most cases. Our experimental evaluation, using synthetic and real data sets, shows gains of up to 19% when compared to a state-of-the-art method, without the cost of having to disambiguate the whole DL at each new load (as done by unsupervised methods) or the need for any training (as done by supervised methods). |
id |
UFOP_36573f8c005ffc11f7e8353c88581c27 |
---|---|
oai_identifier_str |
oai:repositorio.ufop.br:123456789/1730 |
network_acronym_str |
UFOP |
network_name_str |
Repositório Institucional da UFOP |
repository_id_str |
3233 |
rights |
Permission to copy without fee all or part of the material printed in JIDM is granted provided that the copies are not made or distributed for commercial advantage, and that notice is given that copying is by permission of the Sociedade Brasileira de Computação. Source: the article itself. |
last_modified |
2019-03-13T14:57:59Z |
dc.title.none.fl_str_mv |
Incremental unsupervised name disambiguation in cleaned digital libraries. |
author |
Carvalho, Ana Paula de |
author_role |
author |
author2 |
Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Carvalho, Ana Paula de; Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André |
dc.subject.por.fl_str_mv |
Bibliographic citation; Digital library; Name libraries |
publishDate |
2011 |
dc.date.none.fl_str_mv |
2011 2012-10-22T19:11:20Z 2012-10-22T19:11:20Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
CARVALHO, A. P. de et al. Incremental unsupervised name disambiguation in cleaned digital libraries. Journal of Information and Data Management, v. 2, n. 3, p. 289-304, 2011. Available at: <http://seer.lcc.ufmg.br/index.php/jidm/article/viewFile/151/88>. Accessed: 22 Oct. 2012; 21666288; http://www.repositorio.ufop.br/handle/123456789/1730 |
url |
http://www.repositorio.ufop.br/handle/123456789/1730 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFOP instname:Universidade Federal de Ouro Preto (UFOP) instacron:UFOP |
repository.name.fl_str_mv |
Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP) |
repository.mail.fl_str_mv |
repositorio@ufop.edu.br |