Automatic detection of missing information in the indexing of scientific publications

Bibliographic details
Main author: Rodrigues, David Miguel Nunes
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10071/30270
Abstract: The number of citations a research paper receives is a key metric for both researchers and institutions. Because different indexing databases record many of the same citations, comparing them makes it possible to identify citations that are missing from one or more databases and therefore do not count toward a paper's total citation count. To address this issue, we developed an automated method that identifies missing citations by cross-referencing multiple indexing databases. We sought to identify missing citations in Web of Science, Scopus, and Google Scholar, using OpenAlex as an additional source. We carried out several experiments. We began with a prototype that used only two databases (Web of Science and OpenAlex) and later extended the approach to include Scopus. Unfortunately, we were unable to incorporate Google Scholar. Comparing the Web of Science results before and after Scopus was added gave us a clearer understanding of the impact of adding a new database. We also repeated the experiment one month later to track how these databases change over time. After analyzing more than 3 000 publications, we identified missing citations in 847 of them, totaling 2 212 missing citations: 1 075 missing from Web of Science and 1 137 missing from Scopus. Adding Scopus increased the number of missing citations detected in Web of Science by 54%, showing the substantial impact of incorporating this database.
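The cross-database comparison described in the abstract can be illustrated with set differences. The following is a minimal sketch, not the thesis's implementation. It assumes that each database has already been queried or scraped for the works citing one paper, and that works are matched by DOI. The helper names `normalize_doi` and `find_missing_citations` are hypothetical. OpenAlex is treated as an auxiliary source that is never itself a target, mirroring the role the abstract gives it.

```python
from collections.abc import Iterable, Mapping, Sequence


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefix forms so that the same work
    matches across databases (a hypothetical normalization rule)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_missing_citations(
    citing_by_db: Mapping[str, Iterable[str]],
    targets: Sequence[str],
) -> dict[str, set[str]]:
    """For each target database, return the citing works (by DOI) that at
    least one other database records for the same cited paper but the
    target does not."""
    normalized = {
        db: {normalize_doi(d) for d in dois} for db, dois in citing_by_db.items()
    }
    missing: dict[str, set[str]] = {}
    for target in targets:
        # Union of the citing works seen by every other database,
        # including auxiliary sources such as OpenAlex.
        seen_elsewhere: set[str] = set()
        for db, dois in normalized.items():
            if db != target:
                seen_elsewhere |= dois
        # Citations present elsewhere but absent from the target.
        missing[target] = seen_elsewhere - normalized.get(target, set())
    return missing


if __name__ == "__main__":
    # Made-up DOIs for a single cited paper; not data from the thesis.
    citing = {
        "Web of Science": ["10.1000/a", "10.1000/b"],
        "Scopus": ["10.1000/A", "https://doi.org/10.1000/c"],
        "OpenAlex": ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"],
    }
    result = find_missing_citations(citing, ["Web of Science", "Scopus"])
    for db, dois in result.items():
        print(db, sorted(dois))
```

With the sample data, Web of Science is reported as missing `10.1000/c` and `10.1000/d`, and Scopus as missing `10.1000/b` and `10.1000/d`. Matching on DOIs alone is a simplifying assumption. Records without a DOI would need fuzzier matching on title, authors, and year.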
id RCAP_0482ac450be45e41f5d3c0646f32729b
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/30270
network_acronym_str RCAP
repository_id_str 7160
Abstract (Portuguese version, translated): The number of citations a scientific publication receives is a crucial metric. A publication can be indexed by several databases of scientific articles, which allows us to find missing citations for that publication. To address this problem, we present a solution that automatically detects missing citations. In this project, we sought to identify missing citations in the Web of Science, Scopus, and Google Scholar indexing databases, and also used OpenAlex to increase the number of missing citations found. We carried out several experiments, starting with a prototype that used only two indexing databases (Web of Science and OpenAlex) and then extending our approach to include Scopus. Unfortunately, we were unable to add Google Scholar. These two experiments allowed us to compare the data obtained from Web of Science before and after Scopus was included, and thus to assess the impact of adding a database to our approach. We later ran another experiment to assess how the indexing databases themselves change over time. After analyzing more than 3 000 publications, we detected missing citations in 874 publications, totaling 2 212 missing citations, of which 1 075 were detected in Web of Science and 1 137 in Scopus. The 1 075 citations detected in Web of Science represent a 54% increase over the number found before Scopus was added to our approach.
Aggregator record: Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-01-14T01:17:20Z; 2024-03-20T01:40:22.881097
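The reported figures can be checked with a little arithmetic. Let B be the number of citations missing from Web of Science that the approach found before Scopus was added; the abstract does not state this value. The check assumes that the 54% is relative to B, as the Portuguese abstract says, and that it is rounded to the nearest whole percent. Under those assumptions the per-database counts sum to the stated total, and B falls in a narrow range:

```latex
\begin{aligned}
1\,075 + 1\,137 &= 2\,212 \\
1\,075 &= (1 + 0.54)\,B \;\Rightarrow\; B = \frac{1\,075}{1.54} \approx 698 \\
0.535 \le \frac{1\,075}{B} - 1 < 0.545 &\;\Rightarrow\; \frac{1\,075}{1.545} < B \le \frac{1\,075}{1.535} \;\Rightarrow\; B \in \{696, \dots, 700\}
\end{aligned}
```

The range for B is inferred from the rounded percentage and is not a figure reported in the thesis.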
dc.subject.por.fl_str_mv Research databases
Citations
Web Scraping
Web of Science
Scopus
OpenAlex
Indexing databases
dc.date.none.fl_str_mv 2023-11-30T00:00:00Z
2023-11-30
2023-10
2024-01-09T10:25:32Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/30270
TID:203435087
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP