On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method.

Bibliographic details
Main author: Santana, Alan Filipe
Publication date: 2015
Other authors: Gonçalves, André Gonçalves; Laender, Alberto Henrique Frade; Ferreira, Anderson Almeida
Document type: Article
Language: por
Source title: Repositório Institucional da UFOP
Full text: http://www.repositorio.ufop.br/handle/123456789/7140
https://link.springer.com/article/10.1007/s00799-015-0158-y
https://doi.org/10.1007/s00799-015-0158-y
Abstract: Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labeled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this article, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize such parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster. It can also run without any training information, using only default parameters, and still be very competitive when compared to these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.
id UFOP_6e9fd3939139ee63282a5659e3144943
oai_identifier_str oai:localhost:123456789/7140
network_acronym_str UFOP
network_name_str Repositório Institucional da UFOP
repository_id_str 3233
dc.title.pt_BR.fl_str_mv On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method.
title On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method.
author Santana, Alan Filipe
author_facet Santana, Alan Filipe
Gonçalves, André Gonçalves
Laender, Alberto Henrique Frade
Ferreira, Anderson Almeida
author_role author
author2 Gonçalves, André Gonçalves
Laender, Alberto Henrique Frade
Ferreira, Anderson Almeida
author2_role author
author
author
dc.contributor.author.fl_str_mv Santana, Alan Filipe
Gonçalves, André Gonçalves
Laender, Alberto Henrique Frade
Ferreira, Anderson Almeida
dc.subject.por.fl_str_mv Supervised methods
topic Supervised methods
description Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labeled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this article, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize such parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster. It can also run without any training information, using only default parameters, and still be very competitive when compared to these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.
publishDate 2015
dc.date.issued.fl_str_mv 2015
dc.date.accessioned.fl_str_mv 2017-01-20T14:18:03Z
dc.date.available.fl_str_mv 2017-01-20T14:18:03Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv SANTANA, A. F. et al. On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method. International Journal on Digital Libraries, n. 16, p. 229-246, 2015. Available at: <https://link.springer.com/article/10.1007/s00799-015-0158-y>. Accessed: 20 Jan. 2017.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufop.br/handle/123456789/7140
dc.identifier.issn.none.fl_str_mv 1432-1300
dc.identifier.uri2.pt_BR.fl_str_mv https://link.springer.com/article/10.1007/s00799-015-0158-y
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1007/s00799-015-0158-y
identifier_str_mv SANTANA, A. F. et al. On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method. International Journal on Digital Libraries, n. 16, p. 229-246, 2015. Available at: <https://link.springer.com/article/10.1007/s00799-015-0158-y>. Accessed: 20 Jan. 2017.
1432-1300
url http://www.repositorio.ufop.br/handle/123456789/7140
https://link.springer.com/article/10.1007/s00799-015-0158-y
https://doi.org/10.1007/s00799-015-0158-y
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFOP
instname:Universidade Federal de Ouro Preto (UFOP)
instacron:UFOP
instname_str Universidade Federal de Ouro Preto (UFOP)
instacron_str UFOP
institution UFOP
reponame_str Repositório Institucional da UFOP
collection Repositório Institucional da UFOP
bitstream.url.fl_str_mv http://www.repositorio.ufop.br/bitstream/123456789/7140/2/license.txt
http://www.repositorio.ufop.br/bitstream/123456789/7140/1/ARTIGO_CombinationDomainSpecific.pdf
bitstream.checksum.fl_str_mv 62604f8d955274beb56c80ce1ee5dcae
e526d08555bcd1cf7df08b42e61ca4ee
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP)
repository.mail.fl_str_mv repositorio@ufop.edu.br
_version_ 1801685792590921728