Classificação das despesas com pessoal no contexto dos Tribunais de Contas
Main author: | Teixeira, Pedro Henrique |
---|---|
Publication date: | 2023 |
Document type: | Master's thesis (Dissertação) |
Language: | por |
Source title: | Repositório Institucional da UFG |
Full text: | http://repositorio.bc.ufg.br/tede/handle/tede/13074 |
Abstract: | The Court of Accounts of the Municipalities of the State of Goiás (TCMGO) uses the expenditure data that the municipalities of Goiás submit each month to audit personnel spending, as required by the LRF (Lei de Responsabilidade Fiscal, Brazil's Fiscal Responsibility Law). However, there are indications that the expense classifications reported by municipal managers may contain inconsistencies arising from fiscal maneuvers, creative accounting, or material errors, which can lead TCMGO to base decisions on incorrect reports and seriously compromise the audit process. To address this problem, this work applied text classification techniques to identify the class of a personnel expense from its textual description rather than from the classification code provided by the municipality. A corpus of 17,116 expense records was built and labeled by domain experts under both binary and multi-class schemes. Preprocessing steps extracted features from the textual descriptions, and the TF-IDF algorithm assigned numerical values to each instance of the data set. In the modeling stage, Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM) were used as supervised classifiers. SVM performed best, with an F-Score of 0.92 on the multi-class corpus and 0.97 on the binary corpus. Because labeling by human experts proved complex, time-consuming, and expensive, this work also developed a method that classifies personnel expenses using only 235 labeled samples, supplemented by unlabeled instances, based on an adaptation of the Self-Training algorithm. The method produced promising results, with an average F-Score between 0.86 and 0.89. |
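
The abstract combines two ideas: TF-IDF weighting of expense descriptions, and Self-Training from a small labeled set. The sketch below illustrates both in plain Python under loud assumptions: the expense descriptions, the labels `personnel` and `other`, the confidence `threshold`, and every function name are hypothetical, and a nearest-centroid classifier stands in for the SVM so that the example needs no third-party library. It is not the dissertation's adapted algorithm, only a minimal instance of the general technique.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; the thesis preprocessing is not reproduced here.
    return text.lower().split()

def fit_idf(token_lists):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen at fit time are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def fit_centroids(vectors, labels):
    # One L2-normalized centroid per class (stand-in for the SVM).
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in vec.items():
            acc[t] += v
    cents = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc.values())) or 1.0
        cents[label] = {t: v / norm for t, v in acc.items()}
    return cents

def predict(vec, cents):
    # Return the best label and its cosine-similarity margin over the runner-up.
    scored = sorted(
        ((sum(v * c.get(t, 0.0) for t, v in vec.items()), label)
         for label, c in cents.items()),
        reverse=True,
    )
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return scored[0][1], scored[0][0] - runner_up

def self_train(labeled, unlabeled, threshold=0.2, max_rounds=10):
    # Generic self-training: pseudo-label confident unlabeled items, then refit.
    docs = [tokenize(text) for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = [tokenize(text) for text in unlabeled]
    idf = fit_idf(docs + pool)  # IDF statistics use all available text
    for _ in range(max_rounds):
        cents = fit_centroids([tfidf(d, idf) for d in docs], labels)
        confident, remaining = [], []
        for d in pool:
            label, margin = predict(tfidf(d, idf), cents)
            (confident if margin >= threshold else remaining).append((d, label))
        if not confident:
            break  # nothing new to learn from
        for d, label in confident:
            docs.append(d)
            labels.append(label)
        pool = [d for d, _ in remaining]
    return idf, fit_centroids([tfidf(d, idf) for d in docs], labels)

def classify(text, idf, cents):
    return predict(tfidf(tokenize(text), idf), cents)[0]

# Hypothetical expense descriptions, invented for illustration only.
labeled = [
    ("pagamento folha salarios servidores", "personnel"),
    ("aquisicao material escritorio", "other"),
]
unlabeled = [
    "folha pagamento servidores educacao",
    "aquisicao material limpeza",
]
idf, cents = self_train(labeled, unlabeled)
print(classify("salarios servidores saude", idf, cents))
```

The margin-based threshold is one simple way to decide which pseudo-labels to trust; a probabilistic classifier would more commonly use a predicted-probability cutoff instead.
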
id |
UFG-2_5aad943ae7f74e45d5daf5b5e45bec87 |
---|---|
oai_identifier_str |
oai:repositorio.bc.ufg.br:tede/13074 |
network_acronym_str |
UFG-2 |
network_name_str |
Repositório Institucional da UFG |
repository_id_str |
|
dc.title.none.fl_str_mv |
Classificação das despesas com pessoal no contexto dos Tribunais de Contas |
dc.title.alternative.eng.fl_str_mv |
Classification of personnel expenses in the context of the Courts of Accounts |
author |
Teixeira, Pedro Henrique |
author_facet |
Teixeira, Pedro Henrique |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Salvini, Rogerio Lopes |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/5009392667450875 |
dc.contributor.advisor-co1.fl_str_mv |
Silva, Nadia Félix Felipe da |
dc.contributor.advisor-co1Lattes.fl_str_mv |
http://lattes.cnpq.br/7864834001694765 |
dc.contributor.referee1.fl_str_mv |
Salvini, Rogerio Lopes |
dc.contributor.referee2.fl_str_mv |
Silva, Nadia Félix Felipe da |
dc.contributor.referee3.fl_str_mv |
Fernandes, Deborah Silva Alves |
dc.contributor.referee4.fl_str_mv |
Costa, Nattane Luíza da |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/0560435807923097 |
dc.contributor.author.fl_str_mv |
Teixeira, Pedro Henrique |
dc.subject.por.fl_str_mv |
Auditoria Despesa pública Aprendizado de máquina Semi-supervisionado Classificação de texto |
dc.subject.eng.fl_str_mv |
Audit Public expense Machine learning Semi-supervised Text classification |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
publishDate |
2023 |
dc.date.accessioned.fl_str_mv |
2023-10-18T16:59:24Z |
dc.date.available.fl_str_mv |
2023-10-18T16:59:24Z |
dc.date.issued.fl_str_mv |
2023-08-22 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
TEIXEIRA, P. H. Classificação das despesas com pessoal no contexto dos Tribunais de Contas. 2023. 111 f. Dissertação (Mestrado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2023. |
dc.identifier.uri.fl_str_mv |
http://repositorio.bc.ufg.br/tede/handle/tede/13074 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Goiás |
dc.publisher.program.fl_str_mv |
Programa de Pós-graduação em Ciência da Computação (INF) |
dc.publisher.initials.fl_str_mv |
UFG |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Instituto de Informática - INF (RG) |
publisher.none.fl_str_mv |
Universidade Federal de Goiás |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFG instname:Universidade Federal de Goiás (UFG) instacron:UFG |
instname_str |
Universidade Federal de Goiás (UFG) |
instacron_str |
UFG |
institution |
UFG |
reponame_str |
Repositório Institucional da UFG |
collection |
Repositório Institucional da UFG |
bitstream.url.fl_str_mv |
http://repositorio.bc.ufg.br/tede/bitstreams/6049f45b-3ac2-4f64-8448-b547f31fec9e/download http://repositorio.bc.ufg.br/tede/bitstreams/24ff145b-3ebc-40b3-80c3-6c05a7ea9762/download http://repositorio.bc.ufg.br/tede/bitstreams/9d17b147-4d24-4ba3-937b-03a8a148e793/download |
bitstream.checksum.fl_str_mv |
8a4605be74aa9ea9d79846c1fba20a33 9280b7818c27ff4567b4d0756e875e68 4460e5956bc1d1639be9ae6146a50347 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFG - Universidade Federal de Goiás (UFG) |
repository.mail.fl_str_mv |
tasesdissertacoes.bc@ufg.br |
_version_ |
1798044377353814016 |