BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Bibliographic details
Main author: Consoli, Bernardo
Publication date: 2022
Other authors: Dias, Henrique; Vieira, Renata; Bordini, Rafael; Ulbrich, Ana
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/32260
Abstract: Computational medicine research requires clinical data for training and testing purposes, so the development of datasets composed of real hospital data is of utmost importance in this field. Most such data collections are in the English language, were collected in anglophone countries, and do not reflect other clinical realities, which increases the importance of national datasets for projects that hope to positively impact public health. This paper presents a new Brazilian Clinical Dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states, composed of a sum total of over 2.5 million free-text clinical notes alongside data pertaining to patient information, prescription information, and exam results. This data was collected, organized, deidentified, and is being distributed via credentialed access for the use of the research community. In the course of presenting the new dataset, this paper will explore the new dataset’s structure, population, and potential benefits of using this dataset in clinical AI tasks.
id RCAP_a24668274a873ebec1f8c69e35df6ae5
oai_identifier_str oai:dspace.uevora.pt:10174/32260
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling FCT UIDB/00057/2020, CEECIND/01997/2017
dc.title.none.fl_str_mv BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language
author Consoli, Bernardo
author_role author
author2 Dias, Henrique
Vieira, Renata
Bordini, Rafael
Ulbrich, Ana
author2_role author
author
author
author
dc.contributor.author.fl_str_mv Consoli, Bernardo
Dias, Henrique
Vieira, Renata
Bordini, Rafael
Ulbrich, Ana
publishDate 2022
dc.date.none.fl_str_mv 2022-07-05T11:06:34Z
2022-07-05
2022-06-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/32260
url http://hdl.handle.net/10174/32260
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Consoli, B., Dias, H., Ulbrich, A., Vieira, R., Bordini, R. (2022). BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 5609–5616, Marseille, 20-25 June 2022. © European Language Resources Association (ELRA)
http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.602.pdf
renatav@uevora.pt
299
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv LREC
publisher.none.fl_str_mv LREC
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136694746742784