Bias correction in clustered underreported data

Guilherme Lopes de Oliveira; Raffaele Argiento; Rosangela Helena Loschi; Renato Martins Assunção; Fabrizio Ruggeri; Márcia D’Elia Branco


Bibliographic details
Main author:	Guilherme Lopes de Oliveira
Publication date:	2022
Other authors:	Raffaele Argiento, Rosangela Helena Loschi, Renato Martins Assunção, Fabrizio Ruggeri, Márcia D’Elia Branco
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UFMG
Full text:	https://doi.org/10.1214/20-BA1244 http://hdl.handle.net/1843/56438 https://orcid.org/0000-0003-3220-6356 https://orcid.org/0000-0001-6554-9799 https://orcid.org/0000-0002-7655-6254 https://orcid.org/0000-0002-6724-9367
Abstract:	Data quality in poor and socially deprived regions has given rise to many statistical challenges. One of them is the underreporting of vital events, which leads to biased estimates of the associated risks. To deal with underreported count data, models based on compound Poisson distributions are commonly used. To be identifiable, such models usually require strong additional information about the probability of reporting the event in all areas of interest, which is not always available. We introduce a novel approach for the compound Poisson model, assuming that the areas are clustered according to their data quality. We leverage these clusters to create a hierarchical structure in which the reporting probabilities decrease as we move from the best group to the worst ones. We obtain constraints for model identifiability and prove that prior information is required only about the reporting probability in the areas with the best data quality. Several approaches to modeling the uncertainty about the reporting probabilities are presented, including reference priors. Several features of the proposed methodology are studied through simulation. We apply our model to map the early neonatal mortality risks in Minas Gerais, a Brazilian state with heterogeneous characteristics and marked socioeconomic inequality.
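The abstract describes the mechanism only in words, so the following is a minimal sketch of why underreporting biases risk estimates, not the authors' model or code. It assumes the usual Poisson thinning setup: true counts are Poisson with mean `expected * risk`, and each event is reported independently with a cluster-level probability. All names and numeric values (`risk_estimates`, `report_probs`, the three reporting probabilities, `risk=1.2`) are hypothetical choices made for illustration.

```python
import math
import random


def draw_poisson(mean, rng):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_cluster(n_areas, expected, risk, report_prob, rng):
    """Return (total reported events, total expected events) for one cluster."""
    reported = 0
    for _ in range(n_areas):
        true_events = draw_poisson(expected * risk, rng)
        # Each true event reaches the registry independently with
        # probability report_prob (assumed constant within a cluster).
        reported += sum(rng.random() < report_prob for _ in range(true_events))
    return reported, n_areas * expected


def risk_estimates(report_probs, risk=1.2, n_areas=2000, expected=20.0, seed=0):
    """Naive and corrected relative-risk estimates, one row per cluster."""
    rng = random.Random(seed)
    rows = []
    for eps in report_probs:
        reported, total_expected = simulate_cluster(n_areas, expected, risk, eps, rng)
        naive = reported / total_expected  # estimates risk * eps, not risk
        corrected = naive / eps            # uses eps as if it were known
        rows.append((eps, naive, corrected))
    return rows


if __name__ == "__main__":
    # Hypothetical clusters ordered from best to worst data quality.
    for eps, naive, corrected in risk_estimates([0.95, 0.70, 0.40]):
        print(f"eps={eps:.2f} naive={naive:.3f} corrected={corrected:.3f}")
```

The naive estimate recovers only the product of risk and reporting probability, which is why the two cannot be separated without outside information. The sketch sidesteps that by plugging in the true reporting probability; according to the abstract, the paper instead requires prior information only for the best-quality cluster and orders the remaining clusters below it.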

Item metadata

id	UFMG_110f57374ed777b7ca1bbb9ef3adee2d
oai_identifier_str	oai:repositorio.ufmg.br:1843/56438
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	Bayesian Analysis, volume 17, issue 1, pages 95-126; ISSN 1931-6690; https://projecteuclid.org/journals/bayesian-analysis/volume-17/issue-1/Bias-Correction-in-Clustered-Underreported-Data/10.1214/20-BA1244.full; funding: CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico, FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais, CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, another agency
dc.title.pt_BR.fl_str_mv	Bias correction in clustered underreported data
dc.title.alternative.pt_BR.fl_str_mv	Correção de viés em dados subnotificados agrupados
title	Bias correction in clustered underreported data
spellingShingle	Bias correction in clustered underreported data Guilherme Lopes de Oliveira Compound Poisson model Generalized beta distribution Jeffreys prior Model identifiability Neonatal mortality Underreporting Statistics Poisson distribution Distribution (Probability) Infant mortality
title_short	Bias correction in clustered underreported data
title_full	Bias correction in clustered underreported data
title_fullStr	Bias correction in clustered underreported data
title_full_unstemmed	Bias correction in clustered underreported data
title_sort	Bias correction in clustered underreported data
author	Guilherme Lopes de Oliveira
author_facet	Guilherme Lopes de Oliveira Raffaele Argiento Rosangela Helena Loschi Renato Martins Assunção Fabrizio Ruggeri Márcia D’Elia Branco
author_role	author
author2	Raffaele Argiento Rosangela Helena Loschi Renato Martins Assunção Fabrizio Ruggeri Márcia D’Elia Branco
author2_role	author author author author author
dc.contributor.author.fl_str_mv	Guilherme Lopes de Oliveira Raffaele Argiento Rosangela Helena Loschi Renato Martins Assunção Fabrizio Ruggeri Márcia D’Elia Branco
dc.subject.por.fl_str_mv	Compound Poisson model Generalized beta distribution Jeffreys prior Model identifiability Neonatal mortality Underreporting
topic	Compound Poisson model Generalized beta distribution Jeffreys prior Model identifiability Neonatal mortality Underreporting Statistics Poisson distribution Distribution (Probability) Infant mortality
dc.subject.other.pt_BR.fl_str_mv	Statistics Poisson distribution Distribution (Probability) Infant mortality
publishDate	2022
dc.date.issued.fl_str_mv	2022-03
dc.date.accessioned.fl_str_mv	2023-07-17T18:51:07Z
dc.date.available.fl_str_mv	2023-07-17T18:51:07Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1843/56438
dc.identifier.doi.pt_BR.fl_str_mv	https://doi.org/10.1214/20-BA1244
dc.identifier.issn.pt_BR.fl_str_mv	1931-6690
dc.identifier.orcid.pt_BR.fl_str_mv	https://orcid.org/0000-0003-3220-6356 https://orcid.org/0000-0001-6554-9799 https://orcid.org/0000-0002-7655-6254 https://orcid.org/0000-0002-6724-9367
url	https://doi.org/10.1214/20-BA1244 http://hdl.handle.net/1843/56438 https://orcid.org/0000-0003-3220-6356 https://orcid.org/0000-0001-6554-9799 https://orcid.org/0000-0002-7655-6254 https://orcid.org/0000-0002-6724-9367
identifier_str_mv	1931-6690
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.ispartof.pt_BR.fl_str_mv	Bayesian Analysis
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.publisher.initials.fl_str_mv	UFMG
dc.publisher.country.fl_str_mv	Brazil
dc.publisher.department.fl_str_mv	ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO ICX - DEPARTAMENTO DE ESTATÍSTICA
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
bitstream.url.fl_str_mv	https://repositorio.ufmg.br/bitstream/1843/56438/1/License.txt https://repositorio.ufmg.br/bitstream/1843/56438/2/Bias%20correction%20in%20clustered%20underreported%20data.pdf
bitstream.checksum.fl_str_mv	fa505098d172de0bc8864fc1287ffe22 060f432411f6707e264e711ba742a888
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_	1803589373273309184
