Building the National Database of Health Centred on the Individual: Administrative and Epidemiological Record Linkage - Brazil, 2000-2015

Bibliographic details
Main author: Augusto Afonso Guerra Júnior
Publication date: 2018
Other authors: Ramon Gonçalves Pereira, Eli Iola Gurgel Andrade, Mariangela Leal Cherchiglia, Leonardo Vinícius Dias da Silva, Juliano Ávila, Núbia Santos, Afonso Reis, Francisco de Assis Acurcio, Wagner Meira Junior
Document type: Article
Language: eng
Source title: Repositório Institucional da UFMG
Full text: https://doi.org/10.23889/ijpds.v3i1.446
http://hdl.handle.net/1843/50902
https://orcid.org/0000-0001-5256-0577
https://orcid.org/0000-0002-0206-2462
https://orcid.org/0000-0001-5622-567X
https://orcid.org/0000-0003-1956-5100
https://orcid.org/0000-0002-5880-5261
https://orcid.org/0000-0002-2614-2723
Abstract: Objective: To describe the methods and results of the parameter setting needed to execute the probabilistic deduplication of large administrative and epidemiological databases in Brazil and to create a National Health Database centred on the individual. Methods: This paper shows the results of a record linkage model to integrate data from SIH, SIA, SIM, and SINAN, which differ in format and attributes from one another and over time. These data consist of 1.3 billion records from 2000-2015. Probabilistic and deterministic record linkages were used to deduplicate these data. The Kappa statistic and clerical review were used to ensure the quality of the linkage. A graph algorithm with depth-first search was used to generate the identifiers. Results: The deterministic deduplication process resulted in a database with 403,113,527 possible unique individuals. After probabilistic deduplication of that database, 159,703,805 unique individuals were identified. The estimated false positive error rate of this result was 3.3%, and the estimated false negative error rate was 12.3%. Conclusions: The National Health Database centred on the individual was generated and will allow researchers to use real-world evidence to conduct clinical, epidemiological, economic and other studies. This database represents a significant cohort, spanning 15 years of historical data and preserving patient privacy. The success of the process described will allow the linkage to be repeated and data for future years to be appended, enabling important studies that promote SUS efficiency and provide better treatments for patients.
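The abstract states that a graph algorithm with depth-first search generated the individual identifiers. The record does not give the implementation, so the following is only a minimal sketch of one common way to do this, assuming records are graph nodes, accepted links are edges, and every connected component receives one identifier. The function name `assign_person_ids` and the sample record keys are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def assign_person_ids(record_ids, matched_pairs):
    """Give every record in the same connected component the same identifier.

    Hypothetical helper: `matched_pairs` stands for links accepted by the
    deterministic or probabilistic step; it is not the authors' code.
    """
    # Build an undirected graph: records are nodes, accepted links are edges.
    graph = defaultdict(set)
    for a, b in matched_pairs:
        graph[a].add(b)
        graph[b].add(a)

    person_id = {}
    next_id = 0
    for start in record_ids:
        if start in person_id:
            continue  # already labelled as part of an earlier component
        # Iterative depth-first search over one connected component.
        stack = [start]
        person_id[start] = next_id
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in person_id:
                    person_id[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return person_id


# Illustrative input: five records, three accepted links.
records = ["r1", "r2", "r3", "r4", "r5"]
links = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
ids = assign_person_ids(records, links)
```

In this sketch r1, r2 and r3 share one identifier even though r1 and r3 were never compared directly, which is the transitive closure that a component-based identifier provides. An iterative stack is used instead of recursion so that very large components do not hit Python's recursion limit.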
id UFMG_aa1112ad22d7ad743722275e6b2c948a
oai_identifier_str oai:repositorio.ufmg.br:1843/50902
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
dc.title.pt_BR.fl_str_mv Building the National Database of Health Centred on the Individual: Administrative and Epidemiological Record Linkage - Brazil, 2000-2015
dc.title.alternative.pt_BR.fl_str_mv Construindo o Banco Nacional de Dados de Saúde Centrada no Indivíduo: Relacionamento Administrativo e Ficha Epidemiológica - Brasil, 2000-2015
dc.contributor.author.fl_str_mv Augusto Afonso Guerra Júnior
Ramon Gonçalves Pereira
Eli Iola Gurgel Andrade
Mariangela Leal Cherchiglia
Leonardo Vinícius Dias da Silva
Juliano Ávila
Núbia Santos
Afonso Reis
Francisco de Assis Acurcio
Wagner Meira Junior
dc.subject.por.fl_str_mv Data linkage
Record linkage
Brazilian health database
SUS deduplication
dc.subject.other.pt_BR.fl_str_mv Sistema Único de Saúde
Banco de dados - Saúde
Medicina - Processamento de dados
publishDate 2018
dc.date.issued.fl_str_mv 2018
dc.date.accessioned.fl_str_mv 2023-03-14T21:43:02Z
dc.date.available.fl_str_mv 2023-03-14T21:43:02Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/50902
dc.identifier.doi.pt_BR.fl_str_mv https://doi.org/10.23889/ijpds.v3i1.446
dc.identifier.issn.pt_BR.fl_str_mv 2399-4908
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0001-5256-0577
https://orcid.org/0000-0002-0206-2462
https://orcid.org/0000-0001-5622-567X
https://orcid.org/0000-0003-1956-5100
https://orcid.org/0000-0002-5880-5261
https://orcid.org/0000-0002-2614-2723
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartof.none.fl_str_mv The International Journal of Population Data Science (IJPDS)
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv FAR - DEPARTAMENTO DE FARMÁCIA SOCIAL
FARMACIA - FACULDADE DE FARMACIA
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
MED - DEPARTAMENTO DE MEDICINA PREVENTIVA SOCIAL
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/50902/1/License.txt
https://repositorio.ufmg.br/bitstream/1843/50902/2/Building%20the%20National%20Database%20of%20Health%20Centred%20on%20the%20Individual%20Adminis-trative%20and%20Epidemiological%20Record%20Linkage%20-%20Brazil%2c%202000-2015.pdf
bitstream.checksum.fl_str_mv fa505098d172de0bc8864fc1287ffe22
a2d0bb91fee394ecc836ddcdeac7b519
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1801676667346747392