An automatic approach for duplicate bibliographic metadata identification using classification

Bibliographic details
Main author: Borges, Eduardo Nunes
Publication date: 2011
Other authors: Becker, Karin, Heuser, Carlos, Galante, Renata
Document type: Conference paper
Language: English
Source: Repositório Institucional da FURG (RI FURG)
Full text: http://repositorio.furg.br/handle/1/1702
Abstract: References are the main descriptive metadata used by digital libraries of scientific articles. These references can be represented in several formats and styles, and considerable content variations can also occur in some metadata fields, such as title, author names, and publication venue. Duplicate records degrade the quality of digital library services, so they need to be appropriately identified and treated. This paper presents an approach to identifying duplicated bibliographic metadata. We extend our previous work so that, instead of setting thresholds based on the scores returned by similarity functions, we use the scores to train classification algorithms that automatically identify duplicated references. The experiments show that the classifiers improve the quality of results by up to 11% compared to our unsupervised heuristic-based approach.
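The abstract contrasts two ways of deciding whether a pair of references is a duplicate: a hand-set threshold on similarity scores versus a classifier trained on those scores. The Python sketch below is illustrative only and is not the authors' implementation. It assumes three hypothetical fields (`title`, `authors`, `venue`), uses the standard library's `difflib.SequenceMatcher` as a stand-in for the paper's similarity functions, treats a threshold on the mean score as a simplified stand-in for the earlier heuristic approach, and swaps in a plain logistic regression, trained on made-up labeled score vectors, for whichever classification algorithms the paper actually evaluates.

```python
from difflib import SequenceMatcher
import math

# Hypothetical field names; the record does not say which fields the paper compares.
FIELDS = ("title", "authors", "venue")


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for the paper's similarity functions)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_vector(r1: dict, r2: dict) -> list[float]:
    """One similarity score per field; these scores become the classifier's features."""
    return [similarity(r1[f], r2[f]) for f in FIELDS]


def threshold_rule(scores: list[float], threshold: float = 0.8) -> bool:
    """Hypothetical baseline: duplicate if the mean score reaches a hand-set threshold."""
    return sum(scores) / len(scores) >= threshold


def train_logistic(X, y, epochs=5000, lr=0.5):
    """Fit logistic regression with batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "duplicate"
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, scores: list[float]) -> bool:
    """Classify a score vector; z >= 0 is equivalent to probability >= 0.5."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, scores)) + b >= 0.0


# Made-up labeled score vectors (title, authors, venue); 1 = duplicate pair.
X = [
    [0.95, 0.90, 0.80],
    [0.90, 0.60, 0.30],  # duplicate whose venue is written very differently
    [0.85, 0.95, 0.20],
    [0.30, 0.40, 0.90],  # different papers at the same venue
    [0.20, 0.10, 0.50],
    [0.50, 0.30, 0.95],
]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic(X, y)
print(threshold_rule(X[1]), predict(model, X[1]))  # False True

# Scoring a new pair of (hypothetical) reference records with the trained model.
r1 = {"title": "An automatic approach for duplicate bibliographic metadata identification using classification",
      "authors": "Borges, E. N.; Becker, K.; Heuser, C.; Galante, R.",
      "venue": "SCCC 2011"}
r2 = {"title": "An Automatic Approach for Duplicate Bibliographic Metadata Identification Using Classification",
      "authors": "Eduardo N. Borges, Karin Becker, Carlos Heuser, Renata Galante",
      "venue": "International Conference of the Chilean Computer Science Society"}
scores = score_vector(r1, r2)
print(scores, predict(model, scores))
```

In this toy data the second duplicate pair has a mean score of 0.6, so the fixed 0.8 threshold rejects it, while the trained model, which can weight each field's score differently instead of averaging them, accepts it. In practice the labeled pairs would come from a manually checked sample of the library's own records.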
Record ID: FURG_a33c8c1d6c6f71606980d6d957f154ce
OAI identifier: oai:repositorio.furg.br:1/1702
Subjects: Classification algorithms; Information representation; Information management
Deposited: 2012-01-07T22:47:43Z
Version: Published version
Citation: BORGES, Eduardo et al. An automatic approach for duplicate bibliographic metadata identification using classification. In: INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, 30., 2011, Curicó. Electronic proceedings... Curicó, 2011. Available at: <http://jcc2011.utalca.cl/actas/SCCC/jcc2011_submission_47.pdf>. Accessed: 24 Dec. 2011.
Access rights: Open access
Format: application/pdf
Institution: Universidade Federal do Rio Grande (FURG)