Residuals and diagnostic methods in models for polytomous data

Bibliographic details
Main author: Araripe, Patricia Peres
Publication date: 2022
Document type: Doctoral thesis
Language: English
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/11/11134/tde-14092022-153054/
Abstract: Experiments and observational studies that produce polytomous data, whether nominal or ordinal, are frequently conducted in many areas of knowledge, especially in the agricultural and biological sciences. The generalized logit model is a common choice for analyzing this type of data, and conclusions and decisions are drawn from it. In statistical inference, it is essential to validate a fitted model using diagnostic methods based on appropriate residuals. However, residual analysis and diagnostics for models with a polytomous response are still emerging and remain an open research topic in Statistics. Because the polytomous categorical response is multivariate, the ordinary, Pearson, and deviance residuals are vectors for each individual, with unknown distribution, which makes graphical visualization and interpretation difficult. Randomized quantile residuals can be used to circumvent these problems. However, there has been little investigation of their performance for polytomous regression through simulation studies. As an alternative for reducing the dimension of the residuals and studying outliers, this work proposes the Euclidean and Mahalanobis distance measures, for which there are no records of use in the multinomial case. In this context, the methodological contributions of this work are: a review of existing residuals for the class of models for polytomous data; a study of the normality of randomized quantile residuals; and the proposal of Euclidean and Mahalanobis distances to reduce the dimension of the ordinary residuals, yielding a diagnostic procedure for generalized logit models that allows outliers to be identified. Two applications illustrate the usefulness of randomized quantile residuals and distance measures. The performance of the proposed methods was evaluated through simulation studies.
In these studies, we evaluated the performance of randomized quantile residuals for individual nominal data, as well as the use of Euclidean and Mahalanobis distances for grouped data. Graphical techniques such as the half-normal plot were used to assess the model, and the Shapiro-Wilk test was used to check the normality of the residuals. Under different scenarios, the simulation studies showed that these approaches are suitable for assessing the goodness of fit of the generalized logit model to the data. Additionally, these studies are only the beginning of a research area with many gaps still to be filled.
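The abstract names two diagnostic tools without giving formulas. The following Python sketch (assuming NumPy) illustrates one common way to compute each; it is not the thesis's implementation, and both function names are hypothetical. `randomized_quantile_residuals` draws u uniformly between F(y − 1) and F(y) and returns Φ⁻¹(u); for nominal data it assumes the category codes 0, …, J − 1 define the (arbitrary) ordering of the CDF. `residual_distances` works on grouped data: it takes the Euclidean norm of each ordinary residual vector y_i − n_i π_i and a Mahalanobis distance that assumes the model-based multinomial covariance n_i(diag(π_i) − π_i π_iᵀ), with the last (reference) category dropped so the matrix is invertible. The thesis may use a different covariance.

```python
import numpy as np
from statistics import NormalDist


def randomized_quantile_residuals(y, probs, rng=None):
    """Randomized quantile residuals for a categorical response (sketch).

    y     : integer category codes 0..J-1, shape (n,)
    probs : fitted category probabilities, shape (n, J), rows summing to 1
    Assumption: the CDF orders categories by their codes, which is an
    arbitrary choice for nominal data.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    probs = np.asarray(probs, dtype=float)
    cdf = np.cumsum(probs, axis=1)            # F(j) for j = 0..J-1
    idx = np.arange(len(y))
    upper = cdf[idx, y]                       # F(y)
    lower = upper - probs[idx, y]             # F(y - 1)
    u = lower + rng.uniform(size=len(y)) * (upper - lower)
    u = np.clip(u, 1e-12, 1 - 1e-12)          # keep the normal quantile finite
    nd = NormalDist()
    return np.array([nd.inv_cdf(p) for p in u])


def residual_distances(counts, probs):
    """Euclidean and Mahalanobis distances of ordinary residual vectors (sketch).

    counts : observed counts per group, shape (g, J)
    probs  : fitted probabilities per group, shape (g, J)
    Assumption: the Mahalanobis distance uses the multinomial covariance
    n_i (diag(pi_i) - pi_i pi_i^T) restricted to the first J-1 categories.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum(axis=1, keepdims=True)
    resid = counts - n * probs                # ordinary residuals y_i - n_i pi_i
    euclid = np.linalg.norm(resid, axis=1)
    maha = np.empty(len(counts))
    for i in range(len(counts)):
        p = probs[i, :-1]                     # drop the reference category
        cov = n[i, 0] * (np.diag(p) - np.outer(p, p))
        r = resid[i, :-1]
        maha[i] = np.sqrt(r @ np.linalg.solve(cov, r))
    return euclid, maha
```

With this covariance, the squared Mahalanobis distance of a group equals that group's Pearson X² statistic. For counts (3, 5, 2) and fitted probabilities (0.2, 0.5, 0.3), the ordinary residuals are (1, 0, −1), the Euclidean distance is √2, and the squared Mahalanobis distance is 1/2 + 0 + 1/3 = 5/6. When the fitted probabilities are correct, the randomized quantile residuals are exactly standard normal, which is the property that a half-normal plot or a Shapiro-Wilk test can check.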
OAI identifier: oai:teses.usp.br:tde-14092022-153054
Alternative title (Portuguese): Resíduos e métodos de diagnósticos em modelos para dados politômicos
Contributor: Lara, Idemauro Antonio Rodrigues de
Keywords: Distances; Generalized logit models; Half-normal plot; Randomized quantile residual
Date: 2022-07-08
Version: Published version
Rights: Content released for public access (open access)
Format: PDF
Publisher: Biblioteca Digital de Teses e Dissertações da USP
Institution: Universidade de São Paulo (USP)