Challenges in modeling count data: Bayesian models for correction of underreporting bias and estimation of mortality schedules

Bibliographic details
Main author: Guilherme Lopes de Oliveira
Publication date: 2020
Document type: Thesis
Language: eng
Source title: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/37509
https://orcid.org/0000-0003-3220-6356
Abstract: In several fields, such as epidemiology and demography, count data are collected to assess or monitor the risks associated with events of interest. In many situations, however, only a fraction of the true total of events is observed. This phenomenon, known as underreporting, is very common in epidemiological studies. If underreporting occurs and is not accounted for, inference based on the observed counts will be biased and, consequently, the risks related to the events of interest will be underestimated. Beyond underreporting, in some studies the observed counts may be highly sparse, as usually occurs in the analysis of mortality patterns in demographic studies. This dissertation addresses these challenging problems, which are commonly faced when analyzing count data. The proposed models include two approaches for correcting underreporting bias, both published in relevant statistics journals, and an alternative methodology for estimating and smoothing mortality curves by age and sex in the presence of sparse data, which is still being improved. The opening chapter gives a broader introduction to the practical problems addressed and a detailed description of the contributions of each proposed model. The subsequent chapters form a collection of papers, each presenting an independent methodology with its own discussion of the problem addressed. In all cases, inference is carried out under the Bayesian paradigm. Some approaches available in the statistical literature are discussed and, in some cases, compared with the proposed models. Simulated and real datasets are used to explore and illustrate the main features of the models. The final chapter summarizes the methods and results obtained throughout the dissertation and highlights some points for future research.
id UFMG_33d68f86051b945dde5f4713e1b368a5
oai_identifier_str oai:repositorio.ufmg.br:1843/37509
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
sponsorship CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.title.pt_BR.fl_str_mv Challenges in modeling count data: Bayesian models for correction of underreporting bias and estimation of mortality schedules
dc.title.alternative.pt_BR.fl_str_mv Desafios na modelagem de dados de contagem: modelos Bayesianos para correção de viés de subnotificação e estimação de curvas de mortalidade
Desafíos en el modelado de datos de recuento: modelos Bayesianos para la corrección del sesgo de subregistro y la estimación de curvas de mortalidad
Défis de la modélisation des données de comptage: modèles Bayésiens pour la correction du biais de sous-déclaration et l'estimation des courbes de mortalité
title Challenges in modeling count data: Bayesian models for correction of underreporting bias and estimation of mortality schedules
author Guilherme Lopes de Oliveira
author_facet Guilherme Lopes de Oliveira
author_role author
dc.contributor.advisor1.fl_str_mv Rosangela Helena Loschi
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/8443300958745785
dc.contributor.advisor-co1.fl_str_mv Renato Martins Assunção
dc.contributor.referee1.fl_str_mv Flávio Bambirra Gonçalves
dc.contributor.referee2.fl_str_mv Leonardo Soares Bastos
dc.contributor.referee3.fl_str_mv Thais Cristina Oliveira da Fonseca
dc.contributor.referee4.fl_str_mv Wagner Barreto de Souza
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/2909498413150072
dc.contributor.author.fl_str_mv Guilherme Lopes de Oliveira
dc.subject.por.fl_str_mv Bayesian inference
Censored Poisson model
Compound Poisson model
Data augmentation
Markov chain Monte Carlo methods
Model identifiability
Mortality schedules
Neonatal mortality
Tuberculosis incidence
Underreporting
dc.subject.other.pt_BR.fl_str_mv Statistics – Theses.
Monte Carlo method – Theses.
Biostatistics – Theses.
Bayesian statistical decision theory – Theses.
Poisson distribution – Theses.
Newborn infants – Mortality – Theses.
publishDate 2020
dc.date.issued.fl_str_mv 2020-11-03
dc.date.accessioned.fl_str_mv 2021-08-16T15:12:10Z
dc.date.available.fl_str_mv 2021-08-16T15:12:10Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/37509
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0003-3220-6356
url http://hdl.handle.net/1843/37509
https://orcid.org/0000-0003-3220-6356
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Estatística
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brazil
dc.publisher.department.fl_str_mv ICX - DEPARTAMENTO DE ESTATÍSTICA
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/37509/1/_Tese_GuilhermeOliveira_FinalVersionBiblioteca.pdf
https://repositorio.ufmg.br/bitstream/1843/37509/2/license.txt
bitstream.checksum.fl_str_mv 523e863802eaaa3a9de376ca30ad9c0a
34badce4be7e31e3adb4575ae96af679
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1803589421006585856