Improving dynamic selection prediction in imbalanced credit scoring problems (Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado)

Bibliographic details
Main author: Melo Junior, Leopoldo Soares de
Publication date: 2020
Document type: Thesis
Language: English
Source title: Repositório Institucional da Universidade Federal do Ceará (UFC)
Full text: http://www.repositorio.ufc.br/handle/riufc/58918
Abstract: Lenders, such as banks and credit card companies, use credit scoring models to evaluate the risk posed by lending money to consumers and thereby mitigate losses due to bad credit. The profitability of banks therefore depends heavily on the models used to decide on customers' loans. State-of-the-art credit scoring models use machine learning and statistical methods. One of the major problems in this field is that lenders often deal with imbalanced datasets, which usually contain many paid loans but very few unpaid ones (called defaults). Recently, dynamic selection methods combined with preprocessing techniques have been evaluated for improving classification models on imbalanced datasets, showing advantages over static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of the base classifiers. The technique then selects only the classifiers that are locally competent for that query sample. Most dynamic selection techniques use the k-NN algorithm to define the local region. In this thesis, we modify dynamic selection techniques to improve prediction performance on imbalanced credit scoring datasets. First, we evaluate the performance of static techniques under several imbalance levels. Next, we apply dynamic selection techniques to the best ensembles from the previous experiment with a new definition of the local region, the Reduced Minority k-Nearest Neighbors (RMkNN). The intuition behind RMkNN is to overcome the bias of k-NN, which tends to select mostly majority-class samples when defining local regions in imbalanced datasets. Afterwards, we explore improvements obtained by modifying the performance measure used to compute the local competence of the base classifiers. The intuition is to replace accuracy with a measure better suited to imbalanced datasets. This metric is FA2, the combination of the F-measure with the square of accuracy. We find that these modifications improve prediction performance on imbalanced credit scoring datasets. Finally, we combine the RMkNN and FA2 techniques to evaluate the total prediction improvement on the credit scoring problem. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private dataset. Experiments show that RMkNN and FA2 improve classification performance on the evaluated datasets by up to 18% across seven performance measures.
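Illustrative note: the dynamic selection step described in the abstract (find the query's neighborhood, score each base classifier there, predict with the locally competent one) can be sketched as below. This is a generic sketch, not the thesis's method: it uses a plain k-NN region and plain local accuracy, because this record does not give the exact definitions of RMkNN or FA2. All names (`k_nearest`, `local_accuracy`, `dynamic_select_predict`, `clf_low`, `clf_high`) and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, ...]
Classifier = Callable[[Sample], int]


def k_nearest(query: Sample, X_val: Sequence[Sample], k: int) -> List[int]:
    """Indices of the k validation samples closest to the query (Euclidean)."""
    order = sorted(range(len(X_val)), key=lambda i: math.dist(query, X_val[i]))
    return order[:k]


def local_accuracy(clf: Classifier, X_val: Sequence[Sample],
                   y_val: Sequence[int], region: List[int]) -> float:
    """Fraction of samples in the local region that clf labels correctly."""
    hits = sum(1 for i in region if clf(X_val[i]) == y_val[i])
    return hits / len(region)


def dynamic_select_predict(query: Sample, pool: Sequence[Classifier],
                           X_val: Sequence[Sample], y_val: Sequence[int],
                           k: int = 3) -> int:
    """Predict with the base classifier that is most accurate near the query."""
    region = k_nearest(query, X_val, k)
    best = max(pool, key=lambda clf: local_accuracy(clf, X_val, y_val, region))
    return best(query)


# Hypothetical toy pool: each threshold rule is reliable in only one region.
def clf_low(x: Sample) -> int:
    return 1 if x[0] > 1.5 else 0


def clf_high(x: Sample) -> int:
    return 1 if x[0] > 8.5 else 0


# Hypothetical labeled validation set (1 = default, 0 = paid).
X_val = [(1.0,), (2.0,), (3.0,), (7.0,), (8.0,), (9.0,)]
y_val = [0, 1, 1, 0, 0, 1]
pool = [clf_low, clf_high]

print(dynamic_select_predict((2.5,), pool, X_val, y_val))  # clf_low is chosen here
print(dynamic_select_predict((7.5,), pool, X_val, y_val))  # clf_high is chosen here
```

In this sketch, swapping `k_nearest` for a different neighborhood definition or `local_accuracy` for a different competence measure would change the selection behavior without touching the rest of the procedure; that is the kind of substitution the abstract describes.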
id UFC-7_38cb6217f82ad8fe35f4259eb4244c75
oai_identifier_str oai:repositorio.ufc.br:riufc/58918
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.none.fl_str_mv Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
Improving dynamic selection prediction in imbalanced credit scoring problems
author Melo Junior, Leopoldo Soares de
author_facet Melo Junior, Leopoldo Soares de
author_role author
dc.contributor.none.fl_str_mv Macêdo, José Antonio Fernandes de
dc.contributor.author.fl_str_mv Melo Junior, Leopoldo Soares de
dc.subject.por.fl_str_mv Credit scoring
Imbalanced learning
Dynamic selection classification
publishDate 2020
dc.date.none.fl_str_mv 2020
2021-06-11T12:51:51Z
2021-06-11T12:51:51Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv MELO JUNIOR, Leopoldo Soares de. Improving dynamic selection prediction in imbalanced credit scoring problems. 2020. 105 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2020.
http://www.repositorio.ufc.br/handle/riufc/58918
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1813028727156637696