Modelagem de eventos raros: um estudo comparativo

Bibliographic details
Main author: Scacabarozi, Fernanda Nanci
Publication date: 2012
Document type: Master's thesis (Dissertação)
Language: Portuguese (por)
Source title: Repositório Institucional da UFSCAR
Full text: https://repositorio.ufscar.br/handle/ufscar/4552
Abstract: In many areas of knowledge, the binary response variable of interest has an extremely unbalanced distribution. In the financial market, it is common to want to determine the probability that each customer will commit a fraudulent action, while the proportion of fraudulent customers is extremely small. In health care, there is interest in determining the probability that a given person will present an epidemiological infection that affects only a small fraction of the population. However, studies show that the usual logistic regression model, widely used for modeling binary data, does not produce good results when it is fitted to extremely unbalanced databases. The literature offers some proposals for fitting models that take this characteristic into account, such as the KZ estimators suggested by King and Zeng (2001) for the logistic regression model applied to databases with rare events. We present this methodology and a simulation study to verify the quality of these estimators. Other proposals in the literature are the limited logit model suggested by Cramer (2004), which places an upper bound on the probability of success, and the generalized logit model suggested by Stukel (1988), which has two shape parameters and works better than the usual logit model when the probability curve is not symmetric around the point 1/2. In this work we present some simulations to verify the advantages of using these models. Keywords: logit model, limited logit model, generalized logit model, logit model with response of origin, KZ estimators, forecast measures.
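The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the symbol `omega` for the upper bound, and the choice of the prior-correction formula (one of the corrections described by King and Zeng, 2001) are assumptions made for illustration, and the thesis may parameterize its models differently.

```python
import math


def logistic(eta):
    """Standard logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def limited_logit(eta, omega):
    """Success probability bounded above by omega (assumed 0 < omega <= 1).

    With omega = 1 this reduces to the usual logit model.
    """
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega must lie in (0, 1]")
    return omega * logistic(eta)


def prior_corrected_intercept(b0_hat, ybar, tau):
    """Adjust a fitted intercept for a sample whose event fraction (ybar)
    differs from the population event fraction (tau).

    Illustrative form of the prior correction:
    b0_hat - ln[((1 - tau) / tau) * (ybar / (1 - ybar))].
    """
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


if __name__ == "__main__":
    # Hypothetical numbers: a balanced sample drawn from a population
    # in which only 1% of cases are events.
    b0 = prior_corrected_intercept(b0_hat=0.0, ybar=0.5, tau=0.01)
    print(b0, limited_logit(b0, omega=0.2))
```

When the sample event fraction equals the population fraction, the correction term is zero and the intercept is left unchanged, which is a quick sanity check on the formula.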
id SCAR_bafffd153e3910d4d962d70e69792464
oai_identifier_str oai:repositorio.ufscar.br:ufscar/4552
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
Funding agency: Financiadora de Estudos e Projetos
dc.title.por.fl_str_mv Modelagem de eventos raros: um estudo comparativo
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/3609264817897147
dc.contributor.author.fl_str_mv Scacabarozi, Fernanda Nanci
dc.contributor.advisor1.fl_str_mv Diniz, Carlos Alberto Ribeiro
dc.contributor.advisor1Lattes.fl_str_mv http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781846J4&dataRevisao=null
dc.contributor.authorID.fl_str_mv 33a81a4a-e225-4b47-978e-1f0f829870be
contributor_str_mv Diniz, Carlos Alberto Ribeiro
dc.subject.por.fl_str_mv Probabilidades
Modelo logito
Modelo logito limitado
Modelo logito generalizado
Modelo logito com resposta de origem
Estimadores KZ
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA
publishDate 2012
dc.date.available.fl_str_mv 2012-03-22
2016-06-02T20:06:05Z
dc.date.issued.fl_str_mv 2012-01-16
dc.date.accessioned.fl_str_mv 2016-06-02T20:06:05Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv SCACABAROZI, Fernanda Nanci. Modelagem de eventos raros: um estudo comparativo. 2012. 133 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2012.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/ufscar/4552
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv -1
-1
dc.relation.authority.fl_str_mv 84611362-11c0-4efd-b118-a7df9999df87
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Estatística - PPGEs
dc.publisher.initials.fl_str_mv UFSCar
dc.publisher.country.fl_str_mv BR
publisher.none.fl_str_mv Universidade Federal de São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstream/ufscar/4552/1/4139.pdf
https://repositorio.ufscar.br/bitstream/ufscar/4552/2/4139.pdf.jpg
bitstream.checksum.fl_str_mv d478498a0d367106a7ad8dfe2a681cf3
850d027c79bea1b01890be383ffd4509
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
_version_ 1802136276757905408