Probabilistic record linkage and an automated procedure to minimize the undecided-matched pair problem
Main author: | Machado, Carla Jorge |
---|---|
Publication date: | 2004 |
Other authors: | Hill, Kenneth |
Document type: | Article |
Language: | eng |
Source title: | Cadernos de Saúde Pública |
Full text: | https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215 |
Abstract: | Probabilistic record linkage allows the assembling of information from different data sources. We present a procedure when a one-to-one relationship between records in different files is expected but not found. Data were births and infant deaths, 1998-birth cohort, city of São Paulo, Brazil. Pairs for which a one-to-one relationship was obtained and a best-link was found with the highest weight were taken as unequivocally matched pairs and provided information to decide on the remaining pairs. For these, an expected relationship between differences in dates of death and birth registration was found; and places of birth and death registration for neonatal deaths were likely to be the same. Such evidence was used to solve for the remaining pairs. We reduced the number of non-uniquely matched records and of uncertain matches, and increased the number of uniquely matched pairs from 2,249 to 2,827. Future research using record linkage should use strategies from first record linkage runs before a full clerical review (the standard procedure under uncertainty) to efficiently retrieve matches. |
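The two-stage idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes candidate pairs have already been scored by a linkage run, and the field names (`birth_id`, `death_id`, `weight`, `same_place`), the sample values, and the plausibility rule are all hypothetical. Stage one keeps pairs in which both records appear only once; stage two discards implausible candidates among the rest and checks again for one-to-one pairs.

```python
from collections import Counter


def split_pairs(pairs):
    """Separate one-to-one candidate pairs from pairs that share a record."""
    births = Counter(p["birth_id"] for p in pairs)
    deaths = Counter(p["death_id"] for p in pairs)
    unique, ambiguous = [], []
    for p in pairs:
        one_to_one = births[p["birth_id"]] == 1 and deaths[p["death_id"]] == 1
        (unique if one_to_one else ambiguous).append(p)
    return unique, ambiguous


def resolve(ambiguous, is_plausible):
    """Drop implausible candidates, then re-check for one-to-one pairs."""
    kept = [p for p in ambiguous if is_plausible(p)]
    return split_pairs(kept)


# Hypothetical candidate pairs: IDs, weights and the same_place flag are made up.
pairs = [
    {"birth_id": "B1", "death_id": "D1", "weight": 21.4, "same_place": True},
    {"birth_id": "B2", "death_id": "D2", "weight": 18.0, "same_place": True},
    {"birth_id": "B2", "death_id": "D3", "weight": 17.5, "same_place": False},
    {"birth_id": "B3", "death_id": "D3", "weight": 16.0, "same_place": True},
]

# Stage 1: pairs that are already one-to-one.
unique, ambiguous = split_pairs(pairs)

# Stage 2: apply an assumed rule (same place of registration) to the rest.
resolved, undecided = resolve(ambiguous, lambda p: p["same_place"])
```

In this sketch the plausibility rule is passed in as a function, so a rule based on the gap between registration dates could be swapped in without changing the rest; pairs left in `undecided` would still go to clerical review.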
id |
FIOCRUZ-5_7870850a3c1891c372352693c921e29d |
---|---|
oai_identifier_str |
oai:ojs.teste-cadernos.ensp.fiocruz.br:article/2215 |
network_acronym_str |
FIOCRUZ-5 |
network_name_str |
Cadernos de Saúde Pública |
repository_id_str |
|
spelling |
Probabilistic record linkage and an automated procedure to minimize the undecided-matched pair problem. Keywords: Probability; Records; Cohort Studies. Abstract (translated from the Portuguese): Probabilistic record linkage allows information sources on the same record held in different databases to be unified. We present a procedure used when a record in one database is expected to correspond to only one record in a second database. The data sources were births and infant deaths from the 1998 birth cohort in the city of São Paulo, Brazil. Pairs linked with the highest score and a one-to-one relationship were used as the gold standard and informed the decision on pairs obtained without a one-to-one relationship. An expected pattern was observed in the uniquely linked data in terms of the difference between the dates of death and birth registration, and also in the places of birth and death registration for neonatal deaths, and this relationship was applied to the remaining data. The number of pairs with a one-to-one relationship increased substantially, from 2,249 to 2,827, and the number of births linked to a death decreased. This procedure should be combined with manual review (the standard procedure under uncertainty) in order to achieve efficient matching. Source: Reports in Public Health; Vol. 20 No. 4 (2004): July/August; Cadernos de Saúde Pública; v. 20 n. 4 (2004): Julho/Agosto; 2004-08-01; ISSN 1678-4464, 0102-311X. Authors: Machado, Carla Jorge; Hill, Kenneth. Types: info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/openAccess. Formats: text/html; application/pdf. URLs: https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215; https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215/4421; https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215/4422. Record: oai:ojs.teste-cadernos.ensp.fiocruz.br:article/2215; 2024-03-06T15:26:54Z. Repository: Cadernos de Saúde Pública - Fundação Oswaldo Cruz (FIOCRUZ); https://cadernos.ensp.fiocruz.br/ojs/index.php/csp; https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/oai; cadernos@ensp.fiocruz.br; opendoar:2024-03-06T13:02:36.691550. |
dc.title.none.fl_str_mv |
Probabilistic record linkage and an automated procedure to minimize the undecided-matched pair problem |
author |
Machado, Carla Jorge |
author_facet |
Machado, Carla Jorge; Hill, Kenneth |
author_role |
author |
author2 |
Hill, Kenneth |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Machado, Carla Jorge; Hill, Kenneth |
dc.subject.por.fl_str_mv |
Probability; Records; Cohort Studies |
topic |
Probability; Records; Cohort Studies |
publishDate |
2004 |
dc.date.none.fl_str_mv |
2004-08-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215 |
url |
https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215/4421 https://cadernos.ensp.fiocruz.br/ojs/index.php/csp/article/view/2215/4422 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html application/pdf |
dc.publisher.none.fl_str_mv |
Reports in Public Health Cadernos de Saúde Pública |
publisher.none.fl_str_mv |
Reports in Public Health Cadernos de Saúde Pública |
dc.source.none.fl_str_mv |
Reports in Public Health; Vol. 20 No. 4 (2004): July/August Cadernos de Saúde Pública; v. 20 n. 4 (2004): Julho/Agosto 1678-4464 0102-311X reponame:Cadernos de Saúde Pública instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ |
instname_str |
Fundação Oswaldo Cruz (FIOCRUZ) |
instacron_str |
FIOCRUZ |
institution |
FIOCRUZ |
reponame_str |
Cadernos de Saúde Pública |
collection |
Cadernos de Saúde Pública |
repository.name.fl_str_mv |
Cadernos de Saúde Pública - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
cadernos@ensp.fiocruz.br||cadernos@ensp.fiocruz.br |
_version_ |
1798943353758285824 |