The nature of scientific datasets in South American repositories: a survey of formats and extensions

Bibliographic details
Main author: Rodrigues, Marcello Mundim
Publication date: 2022
Other authors: Lourenco, Cintia de Azevedo; Dias, Guilherme Ataide [UNESP]
Document type: Article
Language: por
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.5007/1518-2924.2022.e85148
http://hdl.handle.net/11449/237571
Abstract: Objective: to identify the scientific data repositories created and managed by higher education institutions and/or South American research and funding agencies, and to identify and describe the formats and extensions of the files that make up the scientific datasets deposited in these repositories. Methods: eight repositories retrieved by RE3DATA were selected for investigation. A population (N) of 1,115 scientific datasets was obtained. Stratified random sampling yielded a sample (n) of 258 datasets, which corresponds to 23.15% of the population (N). The data surveyed from the sample were condensed into tables and charts. Results: the scientific datasets investigated consist mainly of textual and numerical data, saved in text files and tables, respectively. The datasets may be either homogeneous (one or more files saved in a single format and extension, e.g., images in .jpg) or heterogeneous (files saved in different formats and extensions) [...] content of the data, as observed in the .gpx and .gdb extensions, which refer to geospatial data and are therefore alphanumeric data. Conclusions: there is a growing need to describe the nature of the data, as well as the formats and extensions of the files. This kind of descriptive metadata would be valuable to potential users, since it would give them a better understanding of the context of the data and so support data reuse.
id UNSP_344b8409046d931f027109c67f727fa8
oai_identifier_str oai:repositorio.unesp.br:11449/237571
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling Univ Fed Minas Gerais, Doutorando Gestao & Org Conhecimento, Belo Horizonte, MG, Brazil
Univ Fed Minas Gerais, Ciencia Informacao, Belo Horizonte, MG, Brazil
Univ Fed Minas Gerais, Escola Ciencia Informacao, Belo Horizonte, MG, Brazil
Univ Fed Paraiba, Dept Ciencia Informacao, Joao Pessoa, Paraiba, Brazil
Univ Estadual Paulista, Ciencia Informacao, Sao Paulo, Brazil
dc.title.none.fl_str_mv The nature of scientific datasets in South American repositories: a survey of formats and extensions
dc.contributor.none.fl_str_mv Universidade Federal de Minas Gerais (UFMG)
Univ Fed Paraiba
Universidade Estadual Paulista (UNESP)
dc.contributor.author.fl_str_mv Rodrigues, Marcello Mundim
Lourenco, Cintia de Azevedo
Dias, Guilherme Ataide [UNESP]
dc.subject.por.fl_str_mv Scientific data
Datasets
Data repositories
Formats and extensions
Survey
publishDate 2022
dc.date.none.fl_str_mv 2022-11-30T13:38:57Z
2022-11-30T13:38:57Z
2022-01-01
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.5007/1518-2924.2022.e85148
Encontros Bibli-revista Eletronica De Biblioteconomia E Ciencia Da Informacao. Florianopolis: Univ Federal Santa Catarina, v. 27, 26 p., 2022.
1518-2924
http://hdl.handle.net/11449/237571
10.5007/1518-2924.2022.e85148
WOS:000804414500004
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv Encontros Bibli-revista Eletronica De Biblioteconomia E Ciencia Da Informacao
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 26
dc.publisher.none.fl_str_mv Univ Federal Santa Catarina
dc.source.none.fl_str_mv Web of Science
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)