Modelagem de eventos raros: um estudo comparativo
Main author: | Scacabarozi, Fernanda Nanci |
---|---|
Publication date: | 2012 |
Document type: | Master's thesis (Dissertação) |
Language: | por |
Source title: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/4552 |
Abstract: | In many fields, the binary response variable of interest has an extremely unbalanced distribution. In the financial market, it is common to want to estimate the probability that each customer will commit fraud, while the proportion of fraudulent customers is extremely small. In health, there is interest in estimating the probability that a given person will contract an epidemiological infection that affects only a small fraction of the population. However, studies show that the usual logistic regression model, widely used for modeling binary data, does not produce good results when it is fitted to extremely unbalanced data sets. The literature offers some proposals for fitting models that take this characteristic into account, such as the KZ estimators suggested by King and Zeng (2001) for the logistic regression model applied to data sets with rare events. We present this methodology and a simulation study to assess the quality of these estimators. Other proposals in the literature are the limited logit model suggested by Cramer (2004), which places an upper bound on the probability of success, and the generalized logit model suggested by Stukel (1988), which has two shape parameters and works better than the usual logit model when the probability curve is not symmetric around the point 1/2. In this work we present some simulations to assess the advantages of using these models. Keywords: logit model, limited logit model, generalized logit model, logit model with response of origin, KZ estimators, forecast measures. |
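
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis. It assumes the limited logit model has the bounded form ω·Λ(η), where Λ is the logistic function and ω is the upper limit on the success probability, and it shows the intercept "prior correction" that King and Zeng (2001) discuss for samples selected on the outcome. The function names, `omega`, `sample_rate`, and `population_rate` are hypothetical labels chosen for this illustration.

```python
import math


def logistic(eta: float) -> float:
    """Standard logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def limited_logit(eta: float, omega: float) -> float:
    """Success probability capped at omega (assumed form: omega * logistic(eta))."""
    if not 0.0 < omega <= 1.0:
        raise ValueError("omega must lie in (0, 1]")
    return omega * logistic(eta)


def prior_corrected_intercept(b0: float, sample_rate: float,
                              population_rate: float) -> float:
    """Shift a fitted intercept from the sample event rate to the population rate.

    Assumes the sample was selected on the outcome, so that only the
    intercept needs adjusting (the prior correction in King and Zeng, 2001).
    """
    for rate in (sample_rate, population_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    odds_ratio = ((1.0 - population_rate) / population_rate) * (
        sample_rate / (1.0 - sample_rate)
    )
    return b0 - math.log(odds_ratio)


if __name__ == "__main__":
    # With omega = 0.2 the curve is centred at 0.1 and never exceeds 0.2.
    print(limited_logit(0.0, 0.2), limited_logit(50.0, 0.2))
    # A balanced sample drawn from a population with a 1% event rate.
    print(prior_corrected_intercept(0.0, 0.5, 0.01))
```

With `omega = 1` the limited form reduces to the usual logit model, and when the sample and population event rates coincide the intercept is left unchanged.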
id |
SCAR_bafffd153e3910d4d962d70e69792464 |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/4552 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Scacabarozi, Fernanda Nanci; Diniz, Carlos Alberto Ribeiro; http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781846J4&dataRevisao=null; http://lattes.cnpq.br/3609264817897147; 33a81a4a-e225-4b47-978e-1f0f829870be; 2016-06-02T20:06:05Z; 2012-03-22; 2016-06-02T20:06:05Z; 2012-01-16; SCACABAROZI, Fernanda Nanci. Modelagem de eventos raros: um estudo comparativo. 2012. 133 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2012.; https://repositorio.ufscar.br/handle/ufscar/4552; In many fields, the binary response variable of interest has an extremely unbalanced distribution. In the financial market, it is common to want to estimate the probability that each customer will commit fraud, while the proportion of fraudulent customers is extremely small. In health, there is interest in estimating the probability that a given person will contract an epidemiological infection that affects only a small fraction of the population. However, studies show that the usual logistic regression model, widely used for modeling binary data, does not produce good results when it is fitted to extremely unbalanced data sets. The literature offers some proposals for fitting models that take this characteristic into account, such as the KZ estimators suggested by King and Zeng (2001) for the logistic regression model applied to data sets with rare events. We present this methodology and a simulation study to assess the quality of these estimators. Other proposals in the literature are the limited logit model suggested by Cramer (2004), which places an upper bound on the probability of success, and the generalized logit model suggested by Stukel (1988), which has two shape parameters and works better than the usual logit model when the probability curve is not symmetric around the point 1/2. In this work we present some simulations to assess the advantages of using these models. Keywords: logit model, limited logit model, generalized logit model, logit model with response of origin, KZ estimators, forecast measures.; Financiadora de Estudos e Projetos; application/pdf; por; Universidade Federal de São Carlos; Programa de Pós-Graduação em Estatística - PPGEs; UFSCar; BR; Probabilidades; Modelo logito; Modelo logito limitado; Modelo logito generalizado; Modelo logito com resposta de origem; Estimadores KZ; CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA; Modelagem de eventos raros: um estudo comparativo; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; -1; -1; 84611362-11c0-4efd-b118-a7df9999df87; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFSCAR; instname:Universidade Federal de São Carlos (UFSCAR); instacron:UFSCAR; ORIGINAL; 4139.pdf; application/pdf; 2492387; https://repositorio.ufscar.br/bitstream/ufscar/4552/1/4139.pdf; d478498a0d367106a7ad8dfe2a681cf3; MD5; 1; THUMBNAIL; 4139.pdf.jpg; 4139.pdf.jpg; IM Thumbnail; image/jpeg; 4837; https://repositorio.ufscar.br/bitstream/ufscar/4552/2/4139.pdf.jpg; 850d027c79bea1b01890be383ffd4509; MD5; 2; ufscar/4552; 2023-09-18 18:31:02.44; oai:repositorio.ufscar.br:ufscar/4552; Repositório Institucional; PUB; https://repositorio.ufscar.br/oai/request; opendoar:4322; 2023-09-18T18:31:02; Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR); false |
dc.title.por.fl_str_mv |
Modelagem de eventos raros: um estudo comparativo |
title |
Modelagem de eventos raros: um estudo comparativo |
spellingShingle |
Modelagem de eventos raros: um estudo comparativo; Scacabarozi, Fernanda Nanci; Probabilidades; Modelo logito; Modelo logito limitado; Modelo logito generalizado; Modelo logito com resposta de origem; Estimadores KZ; CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
title_short |
Modelagem de eventos raros: um estudo comparativo |
title_full |
Modelagem de eventos raros: um estudo comparativo |
title_fullStr |
Modelagem de eventos raros: um estudo comparativo |
title_full_unstemmed |
Modelagem de eventos raros: um estudo comparativo |
title_sort |
Modelagem de eventos raros: um estudo comparativo |
author |
Scacabarozi, Fernanda Nanci |
author_facet |
Scacabarozi, Fernanda Nanci |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/3609264817897147 |
dc.contributor.author.fl_str_mv |
Scacabarozi, Fernanda Nanci |
dc.contributor.advisor1.fl_str_mv |
Diniz, Carlos Alberto Ribeiro |
dc.contributor.advisor1Lattes.fl_str_mv |
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781846J4&dataRevisao=null |
dc.contributor.authorID.fl_str_mv |
33a81a4a-e225-4b47-978e-1f0f829870be |
contributor_str_mv |
Diniz, Carlos Alberto Ribeiro |
dc.subject.por.fl_str_mv |
Probabilidades; Modelo logito; Modelo logito limitado; Modelo logito generalizado; Modelo logito com resposta de origem; Estimadores KZ |
topic |
Probabilidades; Modelo logito; Modelo logito limitado; Modelo logito generalizado; Modelo logito com resposta de origem; Estimadores KZ; CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
description |
In many fields, the binary response variable of interest has an extremely unbalanced distribution. In the financial market, it is common to want to estimate the probability that each customer will commit fraud, while the proportion of fraudulent customers is extremely small. In health, there is interest in estimating the probability that a given person will contract an epidemiological infection that affects only a small fraction of the population. However, studies show that the usual logistic regression model, widely used for modeling binary data, does not produce good results when it is fitted to extremely unbalanced data sets. The literature offers some proposals for fitting models that take this characteristic into account, such as the KZ estimators suggested by King and Zeng (2001) for the logistic regression model applied to data sets with rare events. We present this methodology and a simulation study to assess the quality of these estimators. Other proposals in the literature are the limited logit model suggested by Cramer (2004), which places an upper bound on the probability of success, and the generalized logit model suggested by Stukel (1988), which has two shape parameters and works better than the usual logit model when the probability curve is not symmetric around the point 1/2. In this work we present some simulations to assess the advantages of using these models. Keywords: logit model, limited logit model, generalized logit model, logit model with response of origin, KZ estimators, forecast measures. |
publishDate |
2012 |
dc.date.available.fl_str_mv |
2012-03-22 2016-06-02T20:06:05Z |
dc.date.issued.fl_str_mv |
2012-01-16 |
dc.date.accessioned.fl_str_mv |
2016-06-02T20:06:05Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
SCACABAROZI, Fernanda Nanci. Modelagem de eventos raros: um estudo comparativo. 2012. 133 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2012. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/4552 |
identifier_str_mv |
SCACABAROZI, Fernanda Nanci. Modelagem de eventos raros: um estudo comparativo. 2012. 133 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2012. |
url |
https://repositorio.ufscar.br/handle/ufscar/4552 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
-1 -1 |
dc.relation.authority.fl_str_mv |
84611362-11c0-4efd-b118-a7df9999df87 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Estatística - PPGEs |
dc.publisher.initials.fl_str_mv |
UFSCar |
dc.publisher.country.fl_str_mv |
BR |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/4552/1/4139.pdf https://repositorio.ufscar.br/bitstream/ufscar/4552/2/4139.pdf.jpg |
bitstream.checksum.fl_str_mv |
d478498a0d367106a7ad8dfe2a681cf3 850d027c79bea1b01890be383ffd4509 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1802136276757905408 |