Cost-effective on-demand associative author name disambiguation.
Main author: | Veloso, Adriano Alonso |
---|---|
Publication date: | 2012 |
Other authors: | Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFOP |
Full text: | http://www.repositorio.ufop.br/handle/123456789/1727 |
Abstract: | Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only few citations), may prevent the full potential of such techniques. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), that extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), that extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical. |
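
The demand-driven idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' LAND implementation: the function name `lazy_disambiguate`, the confidence threshold, the rule-size limit, the confidence-sum scoring, and the toy citations are all assumptions made for illustration. The sketch projects the training citations onto the features of the citation being disambiguated, mines small association rules (feature set to author) only from that projection, and sums rule confidences per candidate author.

```python
from collections import Counter, defaultdict
from itertools import combinations


def lazy_disambiguate(training, test_features, min_confidence=0.5, max_rule_size=2):
    """Score candidate authors for ONE citation with rules mined on demand.

    Hypothetical sketch: parameters and scoring are illustrative assumptions,
    not values or formulas taken from the article.
    """
    test_features = frozenset(test_features)

    # Projection: keep only the features each training citation shares with
    # the test citation, and drop citations that share nothing.
    projected = [(frozenset(feats) & test_features, author) for feats, author in training]
    projected = [(feats, author) for feats, author in projected if feats]

    # Count how often each antecedent (feature subset) occurs, overall and
    # together with each author label.
    antecedent_counts = Counter()
    rule_counts = Counter()
    for feats, author in projected:
        for size in range(1, max_rule_size + 1):
            for antecedent in combinations(sorted(feats), size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, author)] += 1

    # Each rule "antecedent -> author" votes with its confidence.
    scores = defaultdict(float)
    for (antecedent, author), count in rule_counts.items():
        confidence = count / antecedent_counts[antecedent]
        if confidence >= min_confidence:
            scores[author] += confidence

    if not scores:
        return None, {}  # no rule applies to this citation
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Toy data: feature strings and author labels are made up for this example.
training = [
    ({"coauthor:b. lee", "venue:jcdl", "title:digital"}, "author_1"),
    ({"coauthor:b. lee", "venue:sigir", "title:ranking"}, "author_1"),
    ({"coauthor:c. kim", "venue:vldb", "title:index"}, "author_2"),
    ({"coauthor:c. kim", "venue:jcdl", "title:storage"}, "author_2"),
]
best, scores = lazy_disambiguate(training, {"coauthor:b. lee", "venue:jcdl", "title:library"})
print(best, scores)
```

Because rules are generated only from the projected examples, the rule space stays small for each citation, which is the intuition behind the lazy strategy. Returning `None` when no rule applies is only a placeholder; how the article's self-training variant handles previously unseen authors is not specified in this record.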
id |
UFOP_51d48060935bf403fec09a0d4568b3b1 |
---|---|
oai_identifier_str |
oai:repositorio.ufop.br:123456789/1727 |
network_acronym_str |
UFOP |
network_name_str |
Repositório Institucional da UFOP |
repository_id_str |
3233 |
spelling |
Cost-effective on-demand associative author name disambiguation.; Machine learning; Digital libraries; Author name disambiguation; Associative methods; Lazy strategies; Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only few citations), may prevent the full potential of such techniques. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), that extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), that extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.; 2012-10-22T16:46:00Z; 2012-10-22T16:46:00Z; 2012; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; VELOSO, A. A. et al. Cost-effective on-demand associative author name disambiguation. Information Processing and Management, v. 48, n. 4, p. 680-697, 2012. Available at: <https://www.sciencedirect.com/science/article/pii/S0306457311000847>. Accessed: 22 Oct. 2012; http://www.repositorio.ufop.br/handle/123456789/1727; The journal Information Processing and Management grants permission to deposit the article in the Repositório Institucional da UFOP. License number: 3291850076753.; info:eu-repo/semantics/openAccess; Veloso, Adriano Alonso; Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner; eng; reponame:Repositório Institucional da UFOP; instname:Universidade Federal de Ouro Preto (UFOP); instacron:UFOP; 2019-03-13T14:53:38Z; oai:repositorio.ufop.br:123456789/1727; Repositório Institucional; PUB; http://www.repositorio.ufop.br/oai/request; repositorio@ufop.edu.br; opendoar:3233; 2019-03-13T14:53:38; Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP); false |
dc.title.none.fl_str_mv |
Cost-effective on-demand associative author name disambiguation. |
title |
Cost-effective on-demand associative author name disambiguation. |
author |
Veloso, Adriano Alonso |
author_facet |
Veloso, Adriano Alonso; Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner |
author_role |
author |
author2 |
Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner |
author2_role |
author; author; author; author |
dc.contributor.author.fl_str_mv |
Veloso, Adriano Alonso; Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner |
dc.subject.por.fl_str_mv |
Machine learning; Digital libraries; Author name disambiguation; Associative methods; Lazy strategies |
topic |
Machine learning; Digital libraries; Author name disambiguation; Associative methods; Lazy strategies |
publishDate |
2012 |
dc.date.none.fl_str_mv |
2012-10-22T16:46:00Z; 2012-10-22T16:46:00Z; 2012 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
VELOSO, A. A. et al. Cost-effective on-demand associative author name disambiguation. Information Processing and Management, v. 48, n. 4, p. 680-697, 2012. Available at: <https://www.sciencedirect.com/science/article/pii/S0306457311000847>. Accessed: 22 Oct. 2012; http://www.repositorio.ufop.br/handle/123456789/1727 |
identifier_str_mv |
VELOSO, A. A. et al. Cost-effective on-demand associative author name disambiguation. Information Processing and Management, v. 48, n. 4, p. 680-697, 2012. Available at: <https://www.sciencedirect.com/science/article/pii/S0306457311000847>. Accessed: 22 Oct. 2012 |
url |
http://www.repositorio.ufop.br/handle/123456789/1727 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFOP; instname:Universidade Federal de Ouro Preto (UFOP); instacron:UFOP |
instname_str |
Universidade Federal de Ouro Preto (UFOP) |
instacron_str |
UFOP |
institution |
UFOP |
reponame_str |
Repositório Institucional da UFOP |
collection |
Repositório Institucional da UFOP |
repository.name.fl_str_mv |
Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP) |
repository.mail.fl_str_mv |
repositorio@ufop.edu.br |
_version_ |
1813002821943951360 |