A hybrid model for fraud detection on purchase orders based on unsupervised learning

Bibliographic details
Main author: Oliverio, William Ferreira Moreno
Publication date: 2019
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos)
Full text: http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071
Abstract: Fraud in purchasing affects companies around the globe. It is typically addressed through audits. However, because of the massive volume of data available, it is impossible to verify every transaction of a company, so only a small sample of the data is checked. Because frauds are rare compared with standard transactions, fraudulent transactions are frequently left out of the sample and hence are not verified during the audit. This work presents a new approach that combines signature detection with clustering to increase the probability that fraud-related documents are included in the sample. Since no public database exists for fraud detection in corporate purchasing, this work uses real procurement data to compare the probability of selecting a fraudulent document into a data sample under random sampling and under sampling guided by the proposed model. We also explore which clustering algorithm is best suited to this specific problem. Using the HDBSCAN clustering algorithm, the proposed methodology classified the purchase orders (POs) into different clusters, one of which grouped the POs with the most symptoms of a fraudulent transaction in a completely automated way, something not found in any paper on fraud detection in the corporate procurement area.
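The abstract's comparison between random sampling and model-guided sampling can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes sampling without replacement (a hypergeometric model), and the function name and all numbers in the usage lines are hypothetical. It computes the probability that a sample contains at least one fraudulent document.

```python
from math import comb


def prob_at_least_one_fraud(population: int, frauds: int, sample_size: int) -> float:
    """Probability that a uniform random sample drawn without replacement
    contains at least one fraudulent document (hypergeometric model)."""
    if not (0 <= frauds <= population and 0 <= sample_size <= population):
        raise ValueError("frauds and sample_size must lie between 0 and population")
    # Probability that every sampled document is non-fraudulent.
    # math.comb returns 0 when sample_size exceeds the number of clean documents.
    p_none = comb(population - frauds, sample_size) / comb(population, sample_size)
    return 1.0 - p_none


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    whole_population = prob_at_least_one_fraud(population=10_000, frauds=5, sample_size=100)
    # Assumption: a clustering step has isolated a 200-document cluster that
    # happens to contain all 5 fraudulent documents; the auditor samples from it.
    flagged_cluster = prob_at_least_one_fraud(population=200, frauds=5, sample_size=100)
    print(f"random sample of the whole population: {whole_population:.3f}")
    print(f"random sample of the flagged cluster:  {flagged_cluster:.3f}")
```

With the same audit budget, drawing from a smaller group that concentrates the suspicious documents raises the chance of inspecting at least one of them, which is the effect the abstract describes.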
id USIN_779c5f06a3af62d8ea0464e2a0cc9d49
oai_identifier_str oai:www.repositorio.jesuita.org.br:UNISINOS/9071
network_acronym_str USIN
network_name_str Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos)
repository_id_str
spelling Submitted by JOSIANE SANTOS DE OLIVEIRA (josianeso) on 2020-02-18T17:30:26Z. No. of bitstreams: 1. William Ferreira Moreno Oliverio_.pdf: 1869780 bytes, checksum: 5387d0a655b8e907ca39ed4761b77fab (MD5). Made available in DSpace on 2020-02-18T17:30:26Z (GMT). Previous issue date: 2019-09-17.
Bitstreams: William Ferreira Moreno Oliverio_.pdf (application/pdf, 1869780 bytes); license.txt (text/plain; charset=utf-8, 2175 bytes).
UNISINOS/9071 2020-02-18 14:32:24.837 oai:www.repositorio.jesuita.org.br:UNISINOS/9071
Biblioteca Digital de Teses e Dissertações http://www.repositorio.jesuita.org.br/oai/request opendoar:2020-02-18T17:32:24
dc.title.pt_BR.fl_str_mv A hybrid model for fraud detection on purchase orders based on unsupervised learning
title A hybrid model for fraud detection on purchase orders based on unsupervised learning
spellingShingle A hybrid model for fraud detection on purchase orders based on unsupervised learning
Oliverio, William Ferreira Moreno
ACCNPQ::Ciências Exatas e da Terra::Ciência da Computação
Fraud detection
Agrupamento
Detecção de assinaturas
Procurement
Non-supervised machine learning
Clustering
Signature detection
Detecção de fraudes
title_short A hybrid model for fraud detection on purchase orders based on unsupervised learning
title_full A hybrid model for fraud detection on purchase orders based on unsupervised learning
title_fullStr A hybrid model for fraud detection on purchase orders based on unsupervised learning
title_full_unstemmed A hybrid model for fraud detection on purchase orders based on unsupervised learning
title_sort A hybrid model for fraud detection on purchase orders based on unsupervised learning
author Oliverio, William Ferreira Moreno
author_facet Oliverio, William Ferreira Moreno
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/7380008989239864
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/3914159735707328
dc.contributor.author.fl_str_mv Oliverio, William Ferreira Moreno
dc.contributor.advisor1.fl_str_mv Rigo, Sandro José
contributor_str_mv Rigo, Sandro José
dc.subject.cnpq.fl_str_mv ACCNPQ::Ciências Exatas e da Terra::Ciência da Computação
topic ACCNPQ::Ciências Exatas e da Terra::Ciência da Computação
Fraud detection
Agrupamento
Detecção de assinaturas
Procurement
Non-supervised machine learning
Clustering
Signature detection
Detecção de fraudes
dc.subject.por.fl_str_mv Fraud detection
Agrupamento
Detecção de assinaturas
dc.subject.eng.fl_str_mv Procurement
Non-supervised machine learning
Clustering
Signature detection
Detecção de fraudes
description Fraud in purchasing affects companies around the globe. It is typically addressed through audits. However, because of the massive volume of data available, it is impossible to verify every transaction of a company, so only a small sample of the data is checked. Because frauds are rare compared with standard transactions, fraudulent transactions are frequently left out of the sample and hence are not verified during the audit. This work presents a new approach that combines signature detection with clustering to increase the probability that fraud-related documents are included in the sample. Since no public database exists for fraud detection in corporate purchasing, this work uses real procurement data to compare the probability of selecting a fraudulent document into a data sample under random sampling and under sampling guided by the proposed model. We also explore which clustering algorithm is best suited to this specific problem. Using the HDBSCAN clustering algorithm, the proposed methodology classified the purchase orders (POs) into different clusters, one of which grouped the POs with the most symptoms of a fraudulent transaction in a completely automated way, something not found in any paper on fraud detection in the corporate procurement area.
publishDate 2019
dc.date.issued.fl_str_mv 2019-09-17
dc.date.accessioned.fl_str_mv 2020-02-18T17:30:26Z
dc.date.available.fl_str_mv 2020-02-18T17:30:26Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071
url http://www.repositorio.jesuita.org.br/handle/UNISINOS/9071
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade do Vale do Rio dos Sinos
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Computação Aplicada
dc.publisher.initials.fl_str_mv Unisinos
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Escola Politécnica
publisher.none.fl_str_mv Universidade do Vale do Rio dos Sinos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos)
instname:Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron:UNISINOS
instname_str Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron_str UNISINOS
institution UNISINOS
reponame_str Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos)
collection Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos)
bitstream.url.fl_str_mv http://repositorio.jesuita.org.br/bitstream/UNISINOS/9071/1/William+Ferreira+Moreno+Oliverio_.pdf
http://repositorio.jesuita.org.br/bitstream/UNISINOS/9071/2/license.txt
bitstream.checksum.fl_str_mv 5387d0a655b8e907ca39ed4761b77fab
320e21f23402402ac4988605e1edd177
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UNISINOS (RBDU Repositório Digital da Biblioteca da Unisinos) - Universidade do Vale do Rio dos Sinos (UNISINOS)
repository.mail.fl_str_mv
_version_ 1801844937395798016