Machine learning: Challenges and opportunities on credit risk

Bibliographic details
Main author: Costa, Patrícia Alexandra Guerreiro
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/27558
Abstract: The constant challenge of anticipating borrower default has led financial institutions to develop techniques and models that improve their credit risk monitoring and predict how likely certain customers are to default on a loan, and how likely others are to meet their financial obligations. It is therefore worth investigating how financial institutions can anticipate default using Machine Learning algorithms. This dissertation aims to demonstrate the power of Machine Learning algorithms in credit risk analysis, focusing on building the models, training them, and testing them, and to present the opportunities and challenges of Machine Learning that remain open for future studies. For this purpose, we present two Machine Learning classification algorithms: Decision Trees and Logistic Regression. We also present numerical results from several comparisons of these algorithms, which were programmed and run in Python using the Jupyter Notebook application. The initial sample, consisting of 850 observations, contains credit details about borrowers in the United States of America and is freely available. To compare the performance of Logistic Regression and Decision Trees, we used measures such as AUC, precision, and F1-score.
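The evaluation measures named in the abstract (AUC, precision, F1-score) can be illustrated with a minimal pure-Python sketch. This is not the dissertation's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, with label 1 standing for "default".

```python
from typing import Sequence, Tuple


def precision_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and F1-score for the positive class (1 = default, by assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small sample.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores (predicted default probability).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, f1 = precision_and_f1(y_true, y_pred)
area = auc(y_true, scores)
print(f"precision={precision:.3f} f1={f1:.3f} auc={area:.3f}")
```

Precision and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores, which is one reason the two kinds of measure are often reported together when comparing classifiers.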
id RCAP_1e2ec9e3a2da32946ab1796ff9d8427a
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/27558
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
author Costa, Patrícia Alexandra Guerreiro
author_role author
dc.subject.por.fl_str_mv Risco de crédito -- Credit risk
Data mining --
Machine learning
Linear and logistic regression
Árvore de decisão -- Decision tree
Random forest
Regressão linear e logística
Floresta aleatória
publishDate 2022
dc.date.none.fl_str_mv 2022-12-06T00:00:00Z
2022-12-06
2022-10
2023-01-27T15:50:16Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/27558
TID:203127323
url http://hdl.handle.net/10071/27558
identifier_str_mv TID:203127323
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134743959175168