Spatial models with random covariance structure

Bibliographic details
Main author: Danna Lesley Cruz Reyes
Publication date: 2021
Document type: Thesis
Language: eng
Source title: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/38393
https://orcid.org/0000-0002-5977-8162
Abstract: The conditional autoregressive model (CAR model) is the most popular prior distribution for jointly modeling uncertainty about spatially correlated data. It is generally used in hierarchical spatial models, where it describes the uncertainty about the spatial random effects. A limitation of the CAR model is its inability to produce high correlations between neighboring areas. We propose a robust model for areal data that alleviates this problem. We represent the map by an undirected graph in which nodes represent areas and edges connect neighboring areas on the map. We assign distinct, random weights to the edges. The model is based on a spatially structured multivariate Student-t distribution, in which the precision matrix is constructed indirectly by assuming a multivariate distribution for the random edge weights. This Student-t distribution spatially correlates the edge weights and induces another Student-t model for the spatial effects of the areas, which correlates them and can accommodate outliers and heavy-tailed behavior in these effects. More importantly, the proposed model can produce a higher marginal correlation between spatial effects than the CAR model, overcoming one of that model's main limitations. We fit the proposed model to map the incidence of some types of cancer in southern Brazil and compared its performance with several alternative models proposed in the literature. The results show that the proposed model is competitive and provides similar and, in some cases, better results than those obtained by fitting models commonly used to analyze this type of data.
In the second proposal, we address the problem of dimensionality reduction in regression models. Penalized regression is one of the most widely used methods for avoiding overfitting and selecting relevant variables in regression models with many predictors. Under such approaches, variable selection is performed in a non-probabilistic way using some optimization criterion. Bayesian approaches to penalized regression have been proposed that assume a prior distribution for the regression coefficients playing a role similar to the penalty term in classical statistics: shrinking non-significant coefficients toward zero and placing substantial probability mass on coefficients that can be grouped. Such prior distributions, called shrinkage priors, generally assume independence between the effects of the covariates, which may not be an appropriate assumption in many cases. In this work, we focus on the dimensionality reduction of categorical variables with many levels. These variables enter the model through dummy variables, which induce sparsity in the design matrix and can lead to overfitting and difficulties in interpreting the results. The effects of the levels of these categorical variables are naturally correlated. To deal with this problem, we propose two shrinkage prior distributions for the coefficients associated with the levels of categorical variables, correlating them. The proposed distributions are proper and, in addition to sparsity, they have the property of grouping similar effects. We illustrate the use of these distributions by applying them to dimensionality reduction in a linear regression. Their performance is analyzed and compared with pre-existing methods through simulation studies and an application to housing price data available on Airbnb.
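The graph representation described in the abstract can be illustrated with a minimal sketch. It builds the precision matrix of a classical proper CAR prior, Q = tau * (D_w - rho * W), from a weighted undirected graph and reads off the implied marginal correlations between areas. This is not the thesis's Student-t construction: the function names, the four-area map, the parameters `rho` and `tau`, and the gamma draw for the edge weights are all assumptions made for illustration only.

```python
import numpy as np

def car_precision(n_nodes, edges, weights, rho=0.9, tau=1.0):
    """Proper-CAR-style precision Q = tau * (D_w - rho * W) for a weighted graph.

    Hypothetical helper: rho (spatial dependence) and tau (overall precision)
    are illustrative parameters, not values taken from the thesis.
    """
    W = np.zeros((n_nodes, n_nodes))
    for (i, j), w in zip(edges, weights):
        W[i, j] = W[j, i] = w          # undirected edge: symmetric weight
    D = np.diag(W.sum(axis=1))          # weighted degree of each area
    return tau * (D - rho * W)

def correlation_from_precision(Q):
    """Marginal correlation matrix implied by a precision matrix Q."""
    cov = np.linalg.inv(Q)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical map: four areas in a line, 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]

# Equal edge weights, as in the usual CAR specification.
fixed_weights = np.ones(len(edges))

# Placeholder random edge weights (independent gamma draws, an assumption;
# the thesis instead places a spatially structured multivariate
# distribution on the weights).
rng = np.random.default_rng(0)
random_weights = rng.gamma(shape=2.0, scale=0.5, size=len(edges))

corr_fixed = correlation_from_precision(car_precision(4, edges, fixed_weights))
corr_random = correlation_from_precision(car_precision(4, edges, random_weights))

print("neighbor correlation (0,1), equal weights: ", round(corr_fixed[0, 1], 3))
print("neighbor correlation (0,1), random weights:", round(corr_random[0, 1], 3))
```

Comparing `corr_fixed` with `corr_random` shows how changing the edge weights changes the marginal correlation between neighboring areas, which is the quantity the abstract identifies as limited under the CAR model.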
id UFMG_4d9d1883fdfa49311b0d122e85bca773
oai_identifier_str oai:repositorio.ufmg.br:1843/38393
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
spelling FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
dc.title.pt_BR.fl_str_mv Spatial models with random covariance structure
title Spatial models with random covariance structure
spellingShingle Spatial models with random covariance structure
Danna Lesley Cruz Reyes
Spatial statistic
CAR model
Graph of edges
Shrinkage prior
Robust spatial models
Estatística - Teses
Análise espacial (Estatística) – teses
Autoregressão (Estatística) – Teses
Câncer – Região Sul – Métodos estatísticos. -Teses
author Danna Lesley Cruz Reyes
author_facet Danna Lesley Cruz Reyes
author_role author
dc.contributor.advisor1.fl_str_mv Rosângela Helena Loschi
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/8443300958745785
dc.contributor.advisor2.fl_str_mv Renato Martins Assunção
dc.contributor.advisor2Lattes.fl_str_mv http://lattes.cnpq.br/3575559872183767
dc.contributor.referee1.fl_str_mv Vinícius Diniz Mayrink
dc.contributor.referee2.fl_str_mv Guilherme Vieira Nunes Ludwig
dc.contributor.referee3.fl_str_mv João Batista de Morais Pereira
dc.contributor.referee4.fl_str_mv Giovani Loiola da Silva
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/5747821924060340
dc.contributor.author.fl_str_mv Danna Lesley Cruz Reyes
contributor_str_mv Rosângela Helena Loschi
Renato Martins Assunção
Vinícius Diniz Mayrink
Guilherme Vieira Nunes Ludwig
João Batista de Morais Pereira
Giovani Loiola da Silva
dc.subject.por.fl_str_mv Spatial statistic
CAR model
Graph of edges
Shrinkage prior
Robust spatial models
topic Spatial statistic
CAR model
Graph of edges
Shrinkage prior
Robust spatial models
Estatística - Teses
Análise espacial (Estatística) – teses
Autoregressão (Estatística) – Teses
Câncer – Região Sul – Métodos estatísticos. -Teses
dc.subject.other.pt_BR.fl_str_mv Estatística - Teses
Análise espacial (Estatística) – teses
Autoregressão (Estatística) – Teses
Câncer – Região Sul – Métodos estatísticos. -Teses
publishDate 2021
dc.date.accessioned.fl_str_mv 2021-10-18T00:05:40Z
dc.date.available.fl_str_mv 2021-10-18T00:05:40Z
dc.date.issued.fl_str_mv 2021-07-28
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/38393
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0002-5977-8162
url http://hdl.handle.net/1843/38393
https://orcid.org/0000-0002-5977-8162
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Estatística
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv ICX - DEPARTAMENTO DE ESTATÍSTICA
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/38393/1/Disertac%cc%a7a%cc%83o_Spatial_random_structure.pdf
https://repositorio.ufmg.br/bitstream/1843/38393/2/license.txt
bitstream.checksum.fl_str_mv 1e9ccb5a327604d9d5bbfa762b4a5339
cda590c95a0b51b4d15f60c9642ca272
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1803589400459739136