A hybrid model for fraud detection on purchase orders based on unsupervised learning
Main author: | Oliverio, William Ferreira Moreno |
---|---|
Publication date: | 2019 |
Document type: | Master's thesis |
Language: | eng |
Source title: | Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) |
Full text: | http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071 |
Abstract: | Fraud in purchasing affects companies around the globe. It is usually addressed through audits. However, because of the massive volume of data available, it is impossible to verify every transaction of a company, so only a small sample of the data is verified. Because frauds are few compared with standard transactions, fraudulent transactions are frequently left out of the sample and hence are not verified during the audit. This work presents a new approach that combines signature detection with clustering to increase the probability that fraud-related documents are included in the sample. Because no public database exists for fraud detection in corporate purchasing, this work uses real procurement data to compare the probability of selecting a fraudulent document into a data sample under random sampling and under the sampling obtained from the proposed model. We also explore which clustering algorithm is best suited to this specific problem. The proposed methodology classified the purchase orders (POs) into different clusters using the HDBSCAN clustering algorithm, and one of those clusters grouped the POs with the most symptoms of a fraudulent transaction in a completely automated way, something not found in any paper on fraud detection in corporate procurement. |
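
As a rough illustration of the comparison the abstract describes, the Python sketch below computes the probability that a sample drawn without replacement contains at least one fraudulent document, using the hypergeometric expression 1 - C(N-k, n) / C(N, n). It is not taken from the thesis: the function name and every number in the usage section (population size, fraud count, cluster size) are hypothetical, chosen only to show how restricting the sample to a flagged cluster could raise that probability.

```python
from math import comb


def prob_at_least_one_fraud(population: int, frauds: int, sample: int) -> float:
    """Probability that a sample drawn without replacement from `population`
    documents, `frauds` of which are fraudulent, contains at least one fraud."""
    if not 0 <= frauds <= population:
        raise ValueError("frauds must be between 0 and population")
    if not 0 <= sample <= population:
        raise ValueError("sample must be between 0 and population")
    # P(no fraud in sample) = C(N - k, n) / C(N, n); comb() returns 0 when n > N - k.
    p_none = comb(population - frauds, sample) / comb(population, sample)
    return 1.0 - p_none


if __name__ == "__main__":
    # Hypothetical figures, not results from the thesis.
    # Random audit sample: 100 of 10,000 purchase orders, 20 of them fraudulent.
    p_random = prob_at_least_one_fraud(population=10_000, frauds=20, sample=100)
    # Sample of the same size drawn only from an assumed flagged cluster of
    # 500 purchase orders that happens to contain 15 of the 20 frauds.
    p_cluster = prob_at_least_one_fraud(population=500, frauds=15, sample=100)
    print(f"random sampling:  {p_random:.3f}")
    print(f"cluster sampling: {p_cluster:.3f}")
```

The sample size is held fixed in both calls so that any difference comes only from where the sample is drawn, which is the kind of comparison the abstract reports between random sampling and model-guided sampling.
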
id |
USIN_779c5f06a3af62d8ea0464e2a0cc9d49 |
---|---|
oai_identifier_str |
oai:www.repositorio.jesuita.org.br:UNISINOS/9071 |
network_acronym_str |
USIN |
network_name_str |
Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) |
repository_id_str |
|
spelling |
Submitted by JOSIANE SANTOS DE OLIVEIRA (josianeso) on 2020-02-18T17:30:26Z. No. of bitstreams: 1. William Ferreira Moreno Oliverio_.pdf: 1869780 bytes, checksum: 5387d0a655b8e907ca39ed4761b77fab (MD5). Made available in DSpace on 2020-02-18T17:30:26Z (GMT). Previous issue date: 2019-09-17. |
dc.title.pt_BR.fl_str_mv |
A hybrid model for fraud detection on purchase orders based on unsupervised learning |
author |
Oliverio, William Ferreira Moreno |
author_facet |
Oliverio, William Ferreira Moreno |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/7380008989239864 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/3914159735707328 |
dc.contributor.author.fl_str_mv |
Oliverio, William Ferreira Moreno |
dc.contributor.advisor1.fl_str_mv |
Rigo, Sandro José |
contributor_str_mv |
Rigo, Sandro José |
dc.subject.cnpq.fl_str_mv |
ACCNPQ::Ciências Exatas e da Terra::Ciência da Computação |
topic |
ACCNPQ::Ciências Exatas e da Terra::Ciência da Computação; Fraud detection; Clustering; Signature detection; Procurement; Non-supervised machine learning |
dc.subject.por.fl_str_mv |
Fraud detection; Clustering (Agrupamento); Signature detection (Detecção de assinaturas) |
dc.subject.eng.fl_str_mv |
Procurement; Non-supervised machine learning; Clustering; Signature detection; Fraud detection (Detecção de fraudes) |
publishDate |
2019 |
dc.date.issued.fl_str_mv |
2019-09-17 |
dc.date.accessioned.fl_str_mv |
2020-02-18T17:30:26Z |
dc.date.available.fl_str_mv |
2020-02-18T17:30:26Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071 |
url |
http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade do Vale do Rio dos Sinos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Computação Aplicada |
dc.publisher.initials.fl_str_mv |
Unisinos |
dc.publisher.country.fl_str_mv |
Brazil |
dc.publisher.department.fl_str_mv |
Escola Politécnica |
publisher.none.fl_str_mv |
Universidade do Vale do Rio dos Sinos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos); instname:Universidade do Vale do Rio dos Sinos (UNISINOS); instacron:UNISINOS |
instname_str |
Universidade do Vale do Rio dos Sinos (UNISINOS) |
instacron_str |
UNISINOS |
institution |
UNISINOS |
reponame_str |
Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) |
collection |
Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) |
bitstream.url.fl_str_mv |
http://repositorio.jesuita.org.br/bitstream/UNISINOS/9071/1/William+Ferreira+Moreno+Oliverio_.pdf http://repositorio.jesuita.org.br/bitstream/UNISINOS/9071/2/license.txt |
bitstream.checksum.fl_str_mv |
5387d0a655b8e907ca39ed4761b77fab 320e21f23402402ac4988605e1edd177 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) - Universidade do Vale do Rio dos Sinos (UNISINOS) |
repository.mail.fl_str_mv |
|
_version_ |
1801844937395798016 |