NLP-crowdsourcing hybrid framework for inter-researcher similarity detection

Bibliographic details
Main author: Correia, António
Publication date: 2023
Other authors: Guimarães, Diogo; Paredes, Hugo; Fonseca, Benjamim; Paulino, Dennis; Trigo, Luís; Brazdil, Pavel; Schneider, Daniel; Grover, Andrea; Jameel, Shoaib
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/156929
Abstract: Visualizing and examining the intellectual landscape and evolution of scientific communities to support collaboration is crucial for multiple research purposes. In some cases, measuring similarities and matching patterns between sets of research publications can help to identify people with similar interests for building research collaboration networks and university-industry linkages. This work assesses the feasibility of resolving ambiguous cases in similarity detection, to determine authorship, with natural language processing (NLP) techniques, so that crowdsourcing is applied only in instances that require human judgment. Using an NLP-crowdsourcing convergence strategy, we can reduce the costs of microtask crowdsourcing while saving time and maintaining disambiguation accuracy over large datasets. This article contributes a next-gen crowd-artificial intelligence framework that uses an ensemble of term frequency-inverse document frequency (TF-IDF) and bidirectional encoder representations from transformers (BERT) to obtain similarity rankings for pairs of scientific documents. A sequence of content-based similarity tasks was created using a crowd-powered interface for solving disambiguation problems. Our experimental results suggest that an adaptive NLP-crowdsourcing hybrid framework has advantages for inter-researcher similarity detection tasks where fully automatic algorithms provide unsatisfactory results, with the goal of helping researchers discover potential collaborators using data-driven approaches.
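The routing idea in the abstract (score document pairs automatically, send only ambiguous pairs to the crowd) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the equal blending weight, and the 0.3/0.7 thresholds are all assumptions made for illustration. The BERT-based score is taken as a plain number supplied by the caller, since computing it requires a pretrained model that this sketch does not load.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps a nonzero weight for terms found in every document.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def route_pair(tfidf_score, embedding_score, low=0.3, high=0.7, weight=0.5):
    """Blend two similarity scores and decide whether human judgment is needed.

    `low`, `high`, and `weight` are hypothetical values, not from the article.
    `embedding_score` stands in for a BERT-based similarity computed elsewhere.
    """
    score = weight * tfidf_score + (1 - weight) * embedding_score
    if score >= high:
        return score, "similar"
    if score <= low:
        return score, "dissimilar"
    # Scores between the two thresholds are treated as ambiguous.
    return score, "crowd"
```

A caller would compute `cosine` over the vectors from `tfidf_vectors` for each document pair, obtain the second score from whatever embedding model is available, and create a crowdsourcing microtask only for pairs that `route_pair` labels `"crowd"`. Widening the band between `low` and `high` sends more pairs to human workers, which trades cost for accuracy.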
id RCAP_f580b085df622e74ba91b968a0cf7c6d
oai_identifier_str oai:repositorio-aberto.up.pt:10216/156929
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv ISSN: 2168-2291
DOI: 10.1109/THMS.2023.3319290
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP