An automatic approach for duplicate bibliographic metadata identification using classification
Main author: | Borges, Eduardo Nunes |
---|---|
Publication date: | 2011 |
Other authors: | Becker, Karin; Heuser, Carlos; Galante, Renata |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da FURG (RI FURG) |
Full text: | http://repositorio.furg.br/handle/1/1702 |
Abstract: | References are the main descriptive metadata used by digital libraries of scientific articles. These references can be represented in several formats and styles, and considerable content variations can also occur in some metadata fields such as title, author names and publication venue. Duplicate records affect the quality of digital library services, so they need to be appropriately identified and treated. This paper presents an approach to identifying duplicated bibliographic metadata. We extend our previous work so that, instead of setting thresholds based on the scores returned by similarity functions, we use the scores to train classification algorithms that automatically identify duplicated references. The experiments show that the classifiers improve the quality of results by up to 11% compared to our unsupervised heuristic-based approach. |
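The abstract does not say which similarity functions or classification algorithms the authors used, so the following is only a minimal sketch of the general idea: per-field similarity scores become feature vectors, and a classifier is trained on labelled pairs instead of hand-set thresholds. It assumes `difflib.SequenceMatcher` as the similarity function and a small logistic regression as the classifier. The records in `PAIRS`, the helper names and the training parameters are all hypothetical and invented for illustration.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("title", "authors", "venue")  # fields named in the abstract


def similarity_vector(a, b):
    """Return one similarity score in [0, 1] per metadata field."""
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in FIELDS
    ]


def predict_proba(w, x):
    """Probability that score vector x describes a duplicate pair."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]  # w[-1] is the bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def record(title, authors, venue):
    return {"title": title, "authors": authors, "venue": venue}


# Hypothetical labelled pairs (1 = duplicate, 0 = distinct), invented for illustration.
PAIRS = [
    (record("Duplicate detection in digital libraries",
            "Silva, Joao; Costa, Maria", "Journal of Data Quality"),
     record("Duplicate detection in digital libraries.",
            "Silva, J.; Costa, M.", "J. of Data Quality"), 1),
    (record("Learning to match citations", "Pereira, Ana",
            "Workshop on Information Integration"),
     record("Learning to match citation", "Pereira, A.",
            "Workshop on Information Integration 2010"), 1),
    (record("Duplicate detection in digital libraries",
            "Silva, Joao; Costa, Maria", "Journal of Data Quality"),
     record("Query optimization for graph databases",
            "Nakamura, Ken", "Symposium on Databases"), 0),
    (record("Learning to match citations", "Pereira, Ana",
            "Workshop on Information Integration"),
     record("A survey of stream processing engines",
            "Oliveira, Bruno; Lima, Carla",
            "Conference on Distributed Systems"), 0),
]

# The similarity scores are the features; no per-field threshold is set by hand.
X = [similarity_vector(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
weights = train_logistic(X, y)


def is_duplicate(a, b, w=weights):
    """Classify a pair of records using the trained weights."""
    return predict_proba(w, similarity_vector(a, b)) >= 0.5


for (a, b, label), scores in zip(PAIRS, X):
    print(label, [round(s, 2) for s in scores], is_duplicate(a, b))
```

In this sketch the learned weights play the role that manually tuned thresholds would otherwise play, which is the substitution the abstract describes. A real evaluation would need a held-out labelled set rather than the four training pairs shown here.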
id | FURG_a33c8c1d6c6f71606980d6d957f154ce |
---|---|
oai_identifier_str | oai:repositorio.furg.br:1/1702 |
network_acronym_str | FURG |
network_name_str | Repositório Institucional da FURG (RI FURG) |
repository_id_str | |
dc.title.none.fl_str_mv | An automatic approach for duplicate bibliographic metadata identification using classification |
author | Borges, Eduardo Nunes |
author_facet | Borges, Eduardo Nunes; Becker, Karin; Heuser, Carlos; Galante, Renata |
author_role | author |
author2 | Becker, Karin; Heuser, Carlos; Galante, Renata |
author2_role | author; author; author |
dc.contributor.author.fl_str_mv | Borges, Eduardo Nunes; Becker, Karin; Heuser, Carlos; Galante, Renata |
dc.subject.por.fl_str_mv | Classification algorithms; Information representation; Information management |
topic | Classification algorithms; Information representation; Information management |
publishDate | 2011 |
dc.date.none.fl_str_mv | 2011; 2012-01-07T22:47:43Z; 2012-01-07T22:47:43Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/conferenceObject |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | BORGES, Eduardo et al. An automatic approach for duplicate bibliographic metadata identification using classification. In: INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, 30., 2011, Curicó. Electronic proceedings... Curicó, 2011. Available at: <http://jcc2011.utalca.cl/actas/SCCC/jcc2011_submission_47.pdf>. Accessed: 24 Dec. 2011. http://repositorio.furg.br/handle/1/1702 |
identifier_str_mv | BORGES, Eduardo et al. An automatic approach for duplicate bibliographic metadata identification using classification. In: INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, 30., 2011, Curicó. Electronic proceedings... Curicó, 2011. Available at: <http://jcc2011.utalca.cl/actas/SCCC/jcc2011_submission_47.pdf>. Accessed: 24 Dec. 2011. |
url | http://repositorio.furg.br/handle/1/1702 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Institucional da FURG (RI FURG); instname:Universidade Federal do Rio Grande (FURG); instacron:FURG |
instname_str | Universidade Federal do Rio Grande (FURG) |
instacron_str | FURG |
institution | FURG |
reponame_str | Repositório Institucional da FURG (RI FURG) |
collection | Repositório Institucional da FURG (RI FURG) |
repository.name.fl_str_mv | Repositório Institucional da FURG (RI FURG) - Universidade Federal do Rio Grande (FURG) |
repository.mail.fl_str_mv | |
_version_ | 1813187260286238720 |