Improving dynamic selection prediction in imbalanced credit scoring problems

Melo Junior, Leopoldo Soares de


Bibliographic details
Main author:	Melo Junior, Leopoldo Soares de
Publication date:	2020
Document type:	Thesis
Language:	eng
Source title:	Repositório Institucional da Universidade Federal do Ceará (UFC)
Full text:	http://www.repositorio.ufc.br/handle/riufc/58918
Abstract:	Lenders, such as banks and credit card companies, use credit scoring models to evaluate the potential risk posed by lending money to consumers and, therefore, to mitigate losses due to bad credit. Thus, the profitability of banks depends heavily on the models used to decide on customers' loans. State-of-the-art credit scoring models use machine learning and statistical methods. One of the major problems in this field is that lenders often deal with imbalanced datasets, which usually contain many paid loans but very few unpaid ones (called defaults). Recently, dynamic selection methods combined with preprocessing techniques have been evaluated as a way to improve classification models on imbalanced datasets, showing advantages over static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of the base classifiers. These techniques then select only the classifiers that are locally competent for each query sample. Most dynamic selection techniques use the k-NN algorithm to define the local region. In this thesis, we modify dynamic selection techniques to improve prediction performance on imbalanced credit scoring datasets. First, we evaluate the performance of static techniques under several imbalance levels. Next, we apply dynamic selection techniques to the best ensembles of the previous experiment using a new definition of the local region, the Reduced Minority k-Nearest Neighbors (RMkNN). The intuition behind RMkNN is to overcome the bias of kNN, which mainly selects majority-class samples when defining local regions in imbalanced datasets. Afterwards, we explore improvements obtained by modifying the performance measure used to compute the local competence of the base classifiers. The intuition is to replace accuracy with a measure better suited to imbalanced datasets. This measure is FA2, the combination of the F-measure with the square of accuracy. We find that these modifications improve prediction performance on imbalanced credit scoring datasets. Finally, we combine the RMkNN and FA2 techniques to evaluate the total prediction improvement on the credit scoring problem. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private dataset. Experiments show that RMkNN and FA2 improve classification performance on the evaluated datasets by up to 18% across seven performance measures.
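
The dynamic selection step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: all function names are hypothetical, the local region is defined with plain k-NN (the abstract does not specify how RMkNN builds its region, so it is not reproduced here), and the abstract does not give the FA2 formula, so the product of F-measure and squared accuracy used in `fa2_assumed` is an assumption made purely for illustration.

```python
import math
from typing import Callable, List, Sequence

# A base classifier is modelled as any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def k_nearest(query: Sequence[float], X: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k labelled samples closest to the query (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(query, X[i]))
    return order[:k]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive (minority) class; 0.0 when it is undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fa2_assumed(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ASSUMPTION: F-measure times accuracy squared. The abstract only says
    'combination'; the exact formula may differ in the thesis."""
    return f_measure(y_true, y_pred) * accuracy(y_true, y_pred) ** 2


def select_competent(query, X, y, pool: List[Classifier], k: int = 3,
                     competence=accuracy) -> List[Classifier]:
    """Keep the classifiers with the highest competence on the query's local region."""
    region = k_nearest(query, X, k)
    y_region = [y[i] for i in region]
    scores = [competence(y_region, [clf(X[i]) for i in region]) for clf in pool]
    best = max(scores)
    return [clf for clf, score in zip(pool, scores) if score == best]


def predict(query, X, y, pool: List[Classifier], k: int = 3, competence=accuracy) -> int:
    """Majority vote among the locally competent classifiers."""
    votes = [clf(query) for clf in select_competent(query, X, y, pool, k, competence)]
    return max(sorted(set(votes)), key=votes.count)
```

Passing `competence=fa2_assumed` to `predict` swaps the local competence measure without touching the selection logic, which mirrors the kind of substitution the abstract describes. Replacing `k_nearest` with a different neighborhood function would be the corresponding hook for an alternative region definition.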

Item metadata

id	UFC-7_38cb6217f82ad8fe35f4259eb4244c75
oai_identifier_str	oai:repositorio.ufc.br:riufc/58918
network_acronym_str	UFC-7
network_name_str	Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.none.fl_str_mv	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado / Improving dynamic selection prediction in imbalanced credit scoring problems
title	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
spellingShingle	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado Melo Junior, Leopoldo Soares de Credit scoring Imbalanced learning Dynamic selection classification
title_short	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
title_full	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
title_fullStr	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
title_full_unstemmed	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
title_sort	Melhorando a predição de seleção dinâmica em problemas de Credit Scoring desbalanceado
author	Melo Junior, Leopoldo Soares de
author_facet	Melo Junior, Leopoldo Soares de
author_role	author
dc.contributor.none.fl_str_mv	Macêdo, José Antonio Fernandes de
dc.contributor.author.fl_str_mv	Melo Junior, Leopoldo Soares de
dc.subject.por.fl_str_mv	Credit scoring; Imbalanced learning; Dynamic selection classification
topic	Credit scoring; Imbalanced learning; Dynamic selection classification
publishDate	2020
dc.date.none.fl_str_mv	2020 2021-06-11T12:51:51Z 2021-06-11T12:51:51Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	MELO JUNIOR, Leopoldo Soares de. Improving dynamic selection prediction in imbalanced credit scoring problems. 2020. 105 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2020. http://www.repositorio.ufc.br/handle/riufc/58918
identifier_str_mv	MELO JUNIOR, Leopoldo Soares de. Improving dynamic selection prediction in imbalanced credit scoring problems. 2020. 105 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2020.
url	http://www.repositorio.ufc.br/handle/riufc/58918
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC
instname_str	Universidade Federal do Ceará (UFC)
instacron_str	UFC
institution	UFC
reponame_str	Repositório Institucional da Universidade Federal do Ceará (UFC)
collection	Repositório Institucional da Universidade Federal do Ceará (UFC)
repository.name.fl_str_mv	Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv	bu@ufc.br || repositorio@ufc.br
_version_	1813028727156637696
