Classificação das despesas com pessoal no contexto dos Tribunais de Contas

Bibliographic details
Main author: Teixeira, Pedro Henrique
Publication date: 2023
Document type: Master's dissertation
Language: por
Source title: Repositório Institucional da UFG
Full text: http://repositorio.bc.ufg.br/tede/handle/tede/13074
Abstract: The Court of Accounts of the Municipalities of the State of Goiás (TCMGO) uses the expenditure data it receives monthly from the municipalities of Goiás to audit personnel spending, as required by the Fiscal Responsibility Law (LRF). However, there are indications that the expense classification submitted by municipal managers may contain inconsistencies arising from fiscal maneuvers, creative accounting, or material errors. These can lead TCMGO to base decisions on incorrect reports, with serious consequences for the audit process. To address this problem, this work applied text classification techniques to identify the class of a personnel expense from the textual description of the expense rather than from the classification code reported by the municipality. A corpus of 17,116 expense records labeled by domain experts was built in binary and multi-class versions. Preprocessing was applied to extract features from the textual descriptions, and each instance was given a numerical representation using TF-IDF weighting. In the modeling stage, Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM) were used for supervised classification. SVM performed best, with an F-Score of 0.92 on the multi-class corpus and 0.97 on the binary corpus. However, labeling by human experts proved complex, time-consuming, and expensive. This work therefore developed a method that classifies personnel expenses using only 235 labeled samples supplemented by unlabeled instances, based on an adaptation of the Self-Training algorithm. The results are promising, with an average F-Score between 0.86 and 0.89.
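The dissertation's own code is not part of this record. As a rough illustration of the pipeline the abstract describes (TF-IDF features, a supervised classifier, then Self-Training on unlabeled descriptions), the sketch below uses only the Python standard library. It swaps the SVM for a nearest-centroid classifier with cosine similarity to stay dependency-free, and it is a generic self-training loop, not the adapted algorithm from the thesis. The expense descriptions, the class names, the `threshold` margin, and every function name are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Unit-length TF-IDF vector stored as a dict; unknown terms are dropped."""
    counts = Counter(term for term in tokenize(doc) if term in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def centroids(vectors, labels):
    """One unit-length centroid per class (sum of member vectors, renormalised)."""
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for term, v in vec.items():
            acc[term] += v
    result = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc.values())) or 1.0
        result[label] = {term: v / norm for term, v in acc.items()}
    return result


def predict(vec, cents):
    """Return (label, margin): top cosine similarity minus the runner-up."""
    scored = sorted(
        ((sum(v * c.get(term, 0.0) for term, v in vec.items()), label)
         for label, c in cents.items()),
        reverse=True,
    )
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return scored[0][1], scored[0][0] - runner_up


def self_train(labeled, unlabeled, threshold=0.2, max_rounds=10):
    """Grow the training set with confidently pseudo-labeled descriptions."""
    texts = [text for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = list(unlabeled)
    idf = fit_idf(texts + pool)  # IDF needs no labels, so use every description
    for _ in range(max_rounds):
        cents = centroids([tfidf(t, idf) for t in texts], labels)
        remaining = []
        for text in pool:
            label, margin = predict(tfidf(text, idf), cents)
            if margin >= threshold:
                texts.append(text)      # accept the pseudo-label
                labels.append(label)
            else:
                remaining.append(text)  # try again in a later round
        if len(remaining) == len(pool):  # nothing was confident enough
            break
        pool = remaining
    return idf, centroids([tfidf(t, idf) for t in texts], labels)


# Hypothetical toy data; the real corpus has 17,116 expert-labeled records.
labeled = [
    ("monthly salary of municipal teachers", "personnel"),
    ("vacation pay for health staff", "personnel"),
    ("purchase of office supplies", "other"),
    ("fuel for the vehicle fleet", "other"),
]
unlabeled = [
    "salary of health staff december",
    "purchase of fuel for ambulances",
    "december salary advance",
]
idf, cents = self_train(labeled, unlabeled)


def classify(text):
    return predict(tfidf(text, idf), cents)[0]


print(classify("salary of teachers"))
print(classify("fuel for ambulances"))
```

The margin threshold is the main design choice in a loop like this: a higher value admits fewer pseudo-labels per round and limits the spread of early mistakes, at the cost of leaving more of the unlabeled pool unused.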
id UFG-2_5aad943ae7f74e45d5daf5b5e45bec87
oai_identifier_str oai:repositorio.bc.ufg.br:tede/13074
network_acronym_str UFG-2
network_name_str Repositório Institucional da UFG
repository_id_str
spelling Submitted by Marlene Santos (marlene.bc.ufg@gmail.com) on 2023-10-17T19:49:32Z. No. of bitstreams: 2
 Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2023-10-18T16:59:24Z (GMT)
 Made available in DSpace on 2023-10-18T16:59:24Z (GMT). Previous issue date: 2023-08-22
 LICENSE: license.txt, text/plain; charset=utf-8, 1748 bytes, checksum 8a4605be74aa9ea9d79846c1fba20a33 (MD5)
 ORIGINAL: Dissertação - Pedro Henrique Teixeira - 2023.pdf, application/pdf, 2541644 bytes, checksum 9280b7818c27ff4567b4d0756e875e68 (MD5)
 CC-LICENSE: license_rdf, application/rdf+xml; charset=utf-8, 805 bytes, checksum 4460e5956bc1d1639be9ae6146a50347 (MD5)
dc.title.none.fl_str_mv Classificação das despesas com pessoal no contexto dos Tribunais de Contas
dc.title.alternative.eng.fl_str_mv Classification of personnel expenses in the context of the Courts of Accounts
title Classificação das despesas com pessoal no contexto dos Tribunais de Contas
spellingShingle Classificação das despesas com pessoal no contexto dos Tribunais de Contas
Teixeira, Pedro Henrique
Auditoria
Despesa pública
Aprendizado de máquina
Semi-supervisionado
Classificação de texto
Audit
Public expense
Machine learning
Semi-supervised
Text classification
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Classificação das despesas com pessoal no contexto dos Tribunais de Contas
title_full Classificação das despesas com pessoal no contexto dos Tribunais de Contas
title_fullStr Classificação das despesas com pessoal no contexto dos Tribunais de Contas
title_full_unstemmed Classificação das despesas com pessoal no contexto dos Tribunais de Contas
title_sort Classificação das despesas com pessoal no contexto dos Tribunais de Contas
author Teixeira, Pedro Henrique
author_facet Teixeira, Pedro Henrique
author_role author
dc.contributor.advisor1.fl_str_mv Salvini, Rogerio Lopes
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/5009392667450875
dc.contributor.advisor-co1.fl_str_mv Silva, Nadia Félix Felipe da
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/7864834001694765
dc.contributor.referee1.fl_str_mv Salvini, Rogerio Lopes
dc.contributor.referee2.fl_str_mv Silva, Nadia Félix Felipe da
dc.contributor.referee3.fl_str_mv Fernandes, Deborah Silva Alves
dc.contributor.referee4.fl_str_mv Costa, Nattane Luíza da
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/0560435807923097
dc.contributor.author.fl_str_mv Teixeira, Pedro Henrique
contributor_str_mv Salvini, Rogerio Lopes
Silva, Nadia Félix Felipe da
Salvini, Rogerio Lopes
Silva, Nadia Félix Felipe da
Fernandes, Deborah Silva Alves
Costa, Nattane Luíza da
dc.subject.por.fl_str_mv Auditoria
Despesa pública
Aprendizado de máquina
Semi-supervisionado
Classificação de texto
topic Auditoria
Despesa pública
Aprendizado de máquina
Semi-supervisionado
Classificação de texto
Audit
Public expense
Machine learning
Semi-supervised
Text classification
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Audit
Public expense
Machine learning
Semi-supervised
Text classification
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2023
dc.date.accessioned.fl_str_mv 2023-10-18T16:59:24Z
dc.date.available.fl_str_mv 2023-10-18T16:59:24Z
dc.date.issued.fl_str_mv 2023-08-22
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv TEIXEIRA, P. H. Classificação das despesas com pessoal no contexto dos Tribunais de Contas. 2023. 111 f. Dissertação (Mestrado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2023.
dc.identifier.uri.fl_str_mv http://repositorio.bc.ufg.br/tede/handle/tede/13074
identifier_str_mv TEIXEIRA, P. H. Classificação das despesas com pessoal no contexto dos Tribunais de Contas. 2023. 111 f. Dissertação (Mestrado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2023.
url http://repositorio.bc.ufg.br/tede/handle/tede/13074
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Goiás
dc.publisher.program.fl_str_mv Programa de Pós-graduação em Ciência da Computação (INF)
dc.publisher.initials.fl_str_mv UFG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Instituto de Informática - INF (RG)
publisher.none.fl_str_mv Universidade Federal de Goiás
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFG
instname:Universidade Federal de Goiás (UFG)
instacron:UFG
instname_str Universidade Federal de Goiás (UFG)
instacron_str UFG
institution UFG
reponame_str Repositório Institucional da UFG
collection Repositório Institucional da UFG
bitstream.url.fl_str_mv http://repositorio.bc.ufg.br/tede/bitstreams/6049f45b-3ac2-4f64-8448-b547f31fec9e/download
http://repositorio.bc.ufg.br/tede/bitstreams/24ff145b-3ebc-40b3-80c3-6c05a7ea9762/download
http://repositorio.bc.ufg.br/tede/bitstreams/9d17b147-4d24-4ba3-937b-03a8a148e793/download
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
9280b7818c27ff4567b4d0756e875e68
4460e5956bc1d1639be9ae6146a50347
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
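The checksums above can be used to confirm that a downloaded bitstream is intact. The following is a minimal sketch, assuming the file has already been saved locally; the helper names `file_md5` and `matches_record` and the local file name in the usage note are hypothetical. Elsewhere in this record, the second checksum (9280b7818c27ff4567b4d0756e875e68) is the one associated with the dissertation PDF.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5):
    """True when the file's MD5 digest equals the checksum from the record."""
    return file_md5(path) == expected_md5.lower()
```

For example, `matches_record("dissertation.pdf", "9280b7818c27ff4567b4d0756e875e68")` should return `True` for an unmodified copy of the PDF. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.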
repository.name.fl_str_mv Repositório Institucional da UFG - Universidade Federal de Goiás (UFG)
repository.mail.fl_str_mv tasesdissertacoes.bc@ufg.br
_version_ 1798044377353814016