Automatic detection of missing information in the indexing of scientific publications
Main author: | Rodrigues, David Miguel Nunes |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10071/30270 |
Abstract: | The number of citations received by a research paper is a vital metric for both researchers and institutions. Because different indexing databases often record the same citations, they can be compared with one another to identify citations that are missing from one or more of them and therefore do not count toward a paper’s total citation count. To address this issue, we developed an automated method that identifies missing citations by leveraging multiple indexing databases. We sought to identify missing citations in Web of Science, Scopus, and Google Scholar, using OpenAlex as an additional source to aid the process. We conducted several experiments. We began with a prototype that used only two databases (Web of Science and OpenAlex) and later expanded the approach to include Scopus; unfortunately, we were unable to incorporate Google Scholar. Comparing the Web of Science results before and after adding Scopus gave us a clearer understanding of the impact of adding a new database. We also repeated the experiment one month later to track how these databases change over time. After analyzing more than 3 000 different publications, we identified missing citations in 847 of them, totaling 2 212 missing citations: 1 075 missing from Web of Science and 1 137 missing from Scopus. Adding Scopus to our approach increased the number of missing citations detected in Web of Science by 54%, highlighting the significant impact of incorporating this database. |
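The cross-database comparison described in the abstract can be illustrated with a minimal sketch. It assumes (hypothetically; the record does not describe the thesis's data model) that each database's citing works for one publication are available as sets of DOIs, and that a citation is flagged as missing from a target database whenever any other database reports it. The names `normalize_doi` and `missing_citations` are illustrative, not the thesis's code; OpenAlex is passed in as an evidence-only source, mirroring its supporting role in the abstract.

```python
"""Sketch: flag citing works that one indexing database is missing.

Hypothetical assumption: citing works are already fetched per database
as DOI strings; retrieval (APIs, scraping) is out of scope here.
"""


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL/prefix forms so that
    'https://doi.org/10.1/ABC' and '10.1/abc' compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def missing_citations(
    citing_by_db: dict[str, set[str]],
    targets: set[str],
) -> dict[str, set[str]]:
    """For each target database, return the citing DOIs reported by at
    least one other database but absent from the target.

    Databases not listed in `targets` (e.g. OpenAlex) only contribute
    evidence; no missing citations are computed for them.
    """
    normalized = {
        db: {normalize_doi(d) for d in dois} for db, dois in citing_by_db.items()
    }
    result: dict[str, set[str]] = {}
    for target in targets:
        # Union of every other database's citing works.
        others = set().union(
            *(dois for db, dois in normalized.items() if db != target)
        )
        result[target] = others - normalized.get(target, set())
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not results from the thesis.
    citing = {
        "Web of Science": {"10.1000/a", "10.1000/b"},
        "Scopus": {"10.1000/A", "10.1000/c"},
        "OpenAlex": {"https://doi.org/10.1000/b", "10.1000/c", "10.1000/d"},
    }
    gaps = missing_citations(citing, targets={"Web of Science", "Scopus"})
    for db in sorted(gaps):
        print(f"{db}: {sorted(gaps[db])}")
    # Scopus: ['10.1000/b', '10.1000/d']
    # Web of Science: ['10.1000/c', '10.1000/d']
```

Normalizing DOIs before comparing matters because the same citing work can be reported with different casing or as a resolver URL; without it, set difference would report spurious gaps.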
id | RCAP_0482ac450be45e41f5d3c0646f32729b |
---|---|
oai_identifier_str | oai:repositorio.iscte-iul.pt:10071/30270 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
Abstract (Portuguese version) | The number of citations a scientific publication receives is a crucial metric. A publication may be indexed by several scientific indexing databases, which allows us to find citations to that publication that are missing from some of them. To address this problem, we present a solution that automatically detects missing citations. In this project, we sought to identify missing citations in the Web of Science, Scopus, and Google Scholar indexing databases, and also used OpenAlex to increase the number of missing citations found. We carried out several experiments, starting with a prototype that used only two indexing databases (Web of Science and OpenAlex) and then expanding our approach to include Scopus. Unfortunately, we were unable to add Google Scholar. These two experiments made it possible to compare the data obtained from Web of Science before and after including Scopus, which allowed us to assess the impact of adding a database to our approach. We later ran another experiment to assess how the indexing databases themselves change over time. After analyzing more than 3 000 publications, we detected missing citations in 874 publications, totaling 2 212 missing citations, of which 1 075 were detected in Web of Science and 1 137 in Scopus. The 1 075 citations detected in Web of Science represent a 54% increase over the number found before Scopus was added to our approach. |
dc.title.none.fl_str_mv | Automatic detection of missing information in the indexing of scientific publications |
author | Rodrigues, David Miguel Nunes |
author_role | author |
dc.subject.por.fl_str_mv | Research databases; Citations; Web Scraping; Web of Science; Scopus; OpenAlex; Bases de indexação (indexing databases); Citações (citations) |
publishDate | 2023 |
dc.date.none.fl_str_mv | 2023-11-30T00:00:00Z; 2023-11-30; 2023-10; 2024-01-09T10:25:32Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10071/30270; TID:203435087 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |