Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection

Bibliographic details
Main author: Moreno, María Fernanda Osorio
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/33863
Abstract: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
id RCAP_a1aa338ceba38709210833f21c7aee39
oai_identifier_str oai:run.unl.pt:10362/33863
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
Imbalanced datasets
Fraud
oversampling
Insurance
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Although data is now produced in enormous volumes every second, there are situations in which the target category is represented extremely unequally, giving rise to imbalanced datasets. Analyzing them correctly can lead to relevant decisions that produce appropriate business strategies. Fraud modeling is one example: fewer fraudulent transactions than legitimate ones are expected, and predicting them could be crucial for improving decisions and processes in a company. However, class imbalance has a negative effect on traditional techniques for this problem. Many techniques have been proposed to address it, and oversampling is one of them. This work analyzes the behavior of different oversampling techniques, such as random oversampling, SOMO and SMOTE, across different classifiers and evaluation metrics. The exercise uses real data from an insurance company in Colombia to predict fraudulent claims for its compulsory auto product. The conclusions of this research demonstrate the advantages of using oversampling under class imbalance, but also the importance of comparing different evaluation metrics and classifiers in order to obtain accurate conclusions and comparable results.
Bação, Fernando José Ferreira Lucas
RUN
Moreno, María Fernanda Osorio
2018-04-05T13:24:16Z
2018-03-26
2018-03-26T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/33863
TID:201894289
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T04:18:35Z
oai:run.unl.pt:10362/33863
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:30:05.469104
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
title Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
spellingShingle Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
Moreno, María Fernanda Osorio
Imbalanced datasets
Fraud
oversampling
Insurance
title_short Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
title_full Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
title_fullStr Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
title_full_unstemmed Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
title_sort Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection
author Moreno, María Fernanda Osorio
author_facet Moreno, María Fernanda Osorio
author_role author
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
RUN
dc.contributor.author.fl_str_mv Moreno, María Fernanda Osorio
dc.subject.por.fl_str_mv Imbalanced datasets
Fraud
oversampling
Insurance
topic Imbalanced datasets
Fraud
oversampling
Insurance
description Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
publishDate 2018
dc.date.none.fl_str_mv 2018-04-05T13:24:16Z
2018-03-26
2018-03-26T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/33863
TID:201894289
url http://hdl.handle.net/10362/33863
identifier_str_mv TID:201894289
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137925223415808