On building a tool for finding datasets based on a list of researchers or publications

Bibliographic details
Main author: Carvalho-Segundo, Washington
Publication date: 2021
Other authors: M. R. Dias, Thiago
Document type: Conference paper
Language: eng
Source title: Repositório Institucional do IBICT - RIDI
Full text: http://ridi.ibict.br/handle/123456789/1264
Abstract: This proposal presents a tool, developed in Python, for finding datasets related to a list of researchers or publications. The tool was applied to a list of articles that a specific group of researchers had declared in their CVs. The target group was chosen based on the highest level that these researchers had obtained in a research productivity grant (1A). As a result, from a list of 1,227 researchers and more than 225 thousand deduplicated publications, it was possible to find 12,030 related datasets, where the most frequent access type is OPEN and the five most frequent related areas of research are Zoology, Chemistry, Genetics, Physics, and Agronomy. The proposed tool will be used to help populate the research data repository of the national funding agency in Brazil, but it can also be used in other, more general contexts, extracting information from open databases such as ORCID and Wikidata.
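The abstract mentions deduplicating publications and reporting the most frequent access types and research areas, but the record does not say how the tool does either. The following is only a minimal sketch under stated assumptions: publications are deduplicated by a normalized DOI, and frequencies are tallied with a counter. The function names, the record fields (`doi`, `access`, `area`), and the sample values are all hypothetical and are not taken from the authors' tool.

```python
from collections import Counter

# Hypothetical prefixes to strip so that DOI variants compare equal.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and drop a leading resolver prefix, if any."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def deduplicate(publications):
    """Keep the first publication seen for each normalized DOI."""
    seen = set()
    unique = []
    for pub in publications:
        key = normalize_doi(pub["doi"])
        if key not in seen:
            seen.add(key)
            unique.append(pub)
    return unique


def most_frequent(datasets, field, top=5):
    """Return the `top` most common values of `field` with their counts."""
    return Counter(d[field] for d in datasets).most_common(top)


# Illustrative, made-up input (assumption: each record is a plain dict).
sample_publications = [
    {"doi": "10.1000/abc"},
    {"doi": "https://doi.org/10.1000/ABC"},  # same work, different form
    {"doi": "10.1000/xyz"},
]
sample_datasets = [
    {"access": "OPEN", "area": "Zoology"},
    {"access": "OPEN", "area": "Chemistry"},
    {"access": "RESTRICTED", "area": "Zoology"},
]

unique_publications = deduplicate(sample_publications)
access_ranking = most_frequent(sample_datasets, "access")
```

Publications without a DOI would need a different key, for example a normalized title plus year; that case is left out of this sketch.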
id IBICT_b526d0bdcd30f4fac521a69b51cd727d
oai_identifier_str oai:ridi.ibict.br:123456789/1264
network_acronym_str IBICT
network_name_str Repositório Institucional do IBICT - RIDI
repository_id_str 2404
spelling Submitted by Washington Segundo (washingtonsegundo@ibict.br) on 2023-11-16T15:15:57Z. No. of bitstreams: 1. OR2021_A_tool_for_finding_datasets_based.pdf: 222838 bytes, checksum: 6f79fa8ea2221dbd35d40c6c772bba3a (MD5). Approved for entry into archive by Washington Segundo (washingtonsegundo@ibict.br) on 2023-11-16T15:16:16Z (GMT). Made available in DSpace on 2023-11-16T15:16:16Z (GMT). Previous issue date: 2021-06.
dc.title.pt_BR.fl_str_mv On building a tool for finding datasets based on a list of researchers or publications
title On building a tool for finding datasets based on a list of researchers or publications
spellingShingle On building a tool for finding datasets based on a list of researchers or publications
Carvalho-Segundo, Washington
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
Open Science
Scientific Data Repositories
Scientific Publications
Open Data
title_short On building a tool for finding datasets based on a list of researchers or publications
title_full On building a tool for finding datasets based on a list of researchers or publications
title_fullStr On building a tool for finding datasets based on a list of researchers or publications
title_full_unstemmed On building a tool for finding datasets based on a list of researchers or publications
title_sort On building a tool for finding datasets based on a list of researchers or publications
author Carvalho-Segundo, Washington
author_facet Carvalho-Segundo, Washington
M. R. Dias, Thiago
author_role author
author2 M. R. Dias, Thiago
author2_role author
dc.contributor.author.fl_str_mv Carvalho-Segundo, Washington
M. R. Dias, Thiago
dc.subject.cnpq.fl_str_mv CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
topic CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
Open Science
Scientific Data Repositories
Scientific Publications
Open Data
dc.subject.por.fl_str_mv Open Science
Scientific Data Repositories
Scientific Publications
Open Data
description This proposal presents a tool, developed in Python, for finding datasets related to a list of researchers or publications. The tool was applied to a list of articles that a specific group of researchers had declared in their CVs. The target group was chosen based on the highest level that these researchers had obtained in a research productivity grant (1A). As a result, from a list of 1,227 researchers and more than 225 thousand deduplicated publications, it was possible to find 12,030 related datasets, where the most frequent access type is OPEN and the five most frequent related areas of research are Zoology, Chemistry, Genetics, Physics, and Agronomy. The proposed tool will be used to help populate the research data repository of the national funding agency in Brazil, but it can also be used in other, more general contexts, extracting information from open databases such as ORCID and Wikidata.
publishDate 2021
dc.date.available.fl_str_mv 2021-06
2023-11-16T15:16:16Z
dc.date.issued.fl_str_mv 2021-06
dc.date.accessioned.fl_str_mv 2023-11-16T15:16:16Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://ridi.ibict.br/handle/123456789/1264
url http://ridi.ibict.br/handle/123456789/1264
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartof.pt_BR.fl_str_mv International Open Repositories Conference
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Instituto Brasileiro de Informação em Ciência e Tecnologia
dc.publisher.initials.fl_str_mv IBICT
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Instituto Brasileiro de Informação em Ciência e Tecnologia
dc.source.none.fl_str_mv reponame:Repositório Institucional do IBICT - RIDI
instname:Instituto Brasileiro de Informação em Ciência e Tecnologia (IBICT)
instacron:IBICT
instname_str Instituto Brasileiro de Informação em Ciência e Tecnologia (IBICT)
instacron_str IBICT
institution IBICT
reponame_str Repositório Institucional do IBICT - RIDI
collection Repositório Institucional do IBICT - RIDI
bitstream.url.fl_str_mv https://ridi.ibict.br/bitstream/123456789/1264/3/OR2021_A_tool_for_finding_datasets_based.pdf.txt
https://ridi.ibict.br/bitstream/123456789/1264/2/license.txt
https://ridi.ibict.br/bitstream/123456789/1264/1/OR2021_A_tool_for_finding_datasets_based.pdf
bitstream.checksum.fl_str_mv f15eb776caf6166bdb3f130d8d8fc1d8
6b42f084aa6b52acc41c67281d72287f
6f79fa8ea2221dbd35d40c6c772bba3a
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
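The checksum fields above pair each bitstream URL with an MD5 digest. As a minimal sketch, a downloaded copy could be checked against the listed digest using Python's standard `hashlib` module. The helper names and the local file path in the usage note are hypothetical; only the digest value comes from this record.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the PDF bitstream, assuming it was saved locally under its original name, the call would be `matches_checksum("OR2021_A_tool_for_finding_datasets_based.pdf", "6f79fa8ea2221dbd35d40c6c772bba3a")`. MD5 is adequate here for detecting transfer or storage corruption, not for guarding against deliberate tampering.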
repository.name.fl_str_mv Repositório Institucional do IBICT - RIDI - Instituto Brasileiro de Informação em Ciência e Tecnologia (IBICT)
repository.mail.fl_str_mv rd@ibict.br
_version_ 1797055776628408320