Machine Learning applied to credit risk assessment: Prediction of loan defaults

Bibliographic details
Main author: Simão, Sofia Beatriz Santos
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/149818
Summary: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
id RCAP_e0ffc47e83258a0caca6e055e6edc721
oai_identifier_str oai:run.unl.pt:10362/149818
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
abstract Due to the recent financial crisis and the regulatory concerns of Basel II, credit risk assessment is becoming a very important topic in financial risk management. Financial institutions need to take great care when dealing with consumer loans in order to avoid losses and opportunity costs. For this reason, credit scoring systems have been used to make informed decisions on whether or not to grant credit to applicants. Several credit scoring models have been proposed so far, from statistical models to more complex artificial intelligence techniques. However, most previous work focuses on single classifiers. Ensemble learning is a powerful machine learning paradigm that has proven to be of great value in solving a variety of problems. This study compares the performance of the industry standard, logistic regression, with four ensemble methods (AdaBoost, Gradient Boosting, Random Forest and Stacking) in identifying potential loan defaults. All the models were built with a real-world dataset of over one million customers from Lending Club, a financial institution based in the United States. The models were compared using the hold-out method as the evaluation design and accuracy, AUC, type I error and type II error as evaluation metrics. Experimental results reveal that the ensemble classifiers outperformed logistic regression on three key indicators: accuracy, type I error and type II error. AdaBoost performed better than the remaining classifiers when considering the trade-off between all the metrics evaluated. The main contribution of this thesis is an experimental addition to the literature on the preferred models for predicting potential loan defaulters.
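The four evaluation metrics named in the abstract can be illustrated with a minimal sketch on a held-out set. This is not the thesis code: the function names, the 0.5 threshold and the toy labels and scores are hypothetical. It also assumes that 1 marks a default, that type I error is the share of non-defaulters flagged as defaults, and that type II error is the share of defaulters that are missed; the thesis may define the two error types the other way round.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter gets a higher score
    than a random non-defaulter (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def holdout_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, type I error, type II error and AUC on a hold-out set.

    Assumes both classes are present; 1 = default, 0 = non-default.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "type_i_error": fp / (fp + tn),   # assumed: false positive rate
        "type_ii_error": fn / (fn + tp),  # assumed: false negative rate
        "auc": auc(y_true, scores),
    }


# Toy hold-out labels and predicted default probabilities (made up for illustration).
y_test = [1, 1, 1, 0, 0, 0, 0, 0]
p_default = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
metrics = holdout_metrics(y_test, p_default)
print(metrics)
```

Applying the same function to each classifier's scores on one shared hold-out set gives the kind of side-by-side comparison the abstract describes.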
author Simão, Sofia Beatriz Santos
author_facet Simão, Sofia Beatriz Santos
author_role author
dc.contributor.none.fl_str_mv Castelli, Mauro
RUN
topic Credit Risk
Machine Learning
Logistic Regression
Ensemble Methods
Loan Defaults
publishDate 2023
dc.date.none.fl_str_mv 2023-02-28T18:49:41Z
2023-01-26
2023-01-26T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/149818
TID:203239067
url http://hdl.handle.net/10362/149818
identifier_str_mv TID:203239067
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138128958586880