Applying optimized hierarchical NCM classification to public purchases of products in Brazil

Bibliographic details
Main author: Alves Sobrinho, Pitágoras de Azevedo
Publication date: 2022
Document type: Final course project (Trabalho de conclusão de curso)
Language: eng
Source title: Repositório Institucional da UFRN
Full text: https://repositorio.ufrn.br/handle/123456789/48321
Abstract: The use of free text to categorize any type of entity causes, in most cases, difficulties in identifying such entities. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized within the Mercosul Common Nomenclature (NCM). This identifier is necessary to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and to monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then uses a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by selecting classification algorithms and their parameters, testing the performance of various random configurations. The hierarchical classification achieved a higher F1 score than the flat classification experiments, and the error propagation problem was mitigated.
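The Local Classifier per Parent Node pattern named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the token-overlap classifier is a toy stand-in for the document embeddings and supervised algorithms used in the work, the prefix lengths in `LEVELS` assume an 8-digit code split into four levels, and the descriptions and codes are made up for the example. The per-node random search over algorithms and parameters is not shown.

```python
from collections import Counter, defaultdict

# Assumption: an 8-digit code is split into four hierarchy levels by prefix length.
LEVELS = (2, 4, 6, 8)


def tokens(text):
    """Lowercase whitespace tokenization (a stand-in for a real text encoder)."""
    return text.lower().split()


class TokenOverlapClassifier:
    """Toy classifier: picks the label whose training vocabulary best matches the text."""

    def fit(self, texts, labels):
        self.profiles = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.profiles[label].update(tokens(text))
        return self

    def predict(self, text):
        words = tokens(text)

        def score(label):
            profile = self.profiles[label]
            total = sum(profile.values()) or 1
            return sum(profile[w] for w in words) / total

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.profiles), key=score)


def train_lcpn(texts, codes):
    """Train one classifier per parent node; a node is a code prefix ('' is the root)."""
    grouped = defaultdict(lambda: ([], []))
    for text, code in zip(texts, codes):
        parent = ""
        for length in LEVELS:
            child = code[:length]
            grouped[parent][0].append(text)
            grouped[parent][1].append(child)
            parent = child
    return {
        parent: TokenOverlapClassifier().fit(node_texts, node_labels)
        for parent, (node_texts, node_labels) in grouped.items()
    }


def predict_lcpn(models, text):
    """Walk down from the root, letting each parent's classifier choose the next child."""
    node = ""
    while node in models:
        node = models[node].predict(text)
    return node


# Hypothetical training data: descriptions and codes are invented for illustration.
texts = [
    "blue ballpoint pen",
    "black ballpoint pen",
    "graphite pencil",
    "a4 printer paper",
    "letter printer paper",
]
codes = ["10102010", "10102010", "10103010", "20101010", "20101020"]

models = train_lcpn(texts, codes)
print(predict_lcpn(models, "red ballpoint pen"))
print(predict_lcpn(models, "letter paper"))
```

Because each classifier only chooses among the children of one node, a mistake near the root cannot be corrected further down the tree; this is the error propagation problem the abstract refers to.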
id UFRN_3a1aade01852f42df1a0fbe2311c4103
oai_identifier_str oai:https://repositorio.ufrn.br:123456789/48321
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
dc.title.pt_BR.fl_str_mv Applying optimized hierarchical NCM classification to public purchases of products in Brazil
dc.title.alternative.pt_BR.fl_str_mv Applying optimized hierarchical NCM classification to public purchases of products in Brazil
title Applying optimized hierarchical NCM classification to public purchases of products in Brazil
spellingShingle Applying optimized hierarchical NCM classification to public purchases of products in Brazil
Alves Sobrinho, Pitágoras de Azevedo
Supervised classification
Machine learning
Hierarchical classification
Nota fiscal eletrônica
Product classification
author Alves Sobrinho, Pitágoras de Azevedo
author_facet Alves Sobrinho, Pitágoras de Azevedo
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/0435510237375618
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4758203U5
dc.contributor.referees1.none.fl_str_mv Oliveira, Marcel Vinicius Medeiros
dc.contributor.referees1Lattes.pt_BR.fl_str_mv http://lattes.cnpq.br/1756952696097255
dc.contributor.referees2.none.fl_str_mv Santos, Ilueny Constâncio Chaves dos
dc.contributor.referees2Lattes.pt_BR.fl_str_mv http://lattes.cnpq.br/8930351118408164
dc.contributor.author.fl_str_mv Alves Sobrinho, Pitágoras de Azevedo
dc.contributor.advisor1.fl_str_mv Xavier Júnior, João Carlos
contributor_str_mv Xavier Júnior, João Carlos
dc.subject.por.fl_str_mv Supervised classification
Machine learning
Hierarchical classification
Nota fiscal eletrônica
Product classification
topic Supervised classification
Machine learning
Hierarchical classification
Nota fiscal eletrônica
Product classification
description The use of free text to categorize any type of entity causes, in most cases, difficulties in identifying such entities. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized within the Mercosul Common Nomenclature (NCM). This identifier is necessary to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and to monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then uses a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by selecting classification algorithms and their parameters, testing the performance of various random configurations. The hierarchical classification achieved a higher F1 score than the flat classification experiments, and the error propagation problem was mitigated.
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-07-04T14:51:29Z
dc.date.available.fl_str_mv 2022-07-04T14:51:29Z
dc.date.issued.fl_str_mv 2022-06-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
format bachelorThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ALVES SOBRINHO, Pitágoras de Azevedo, Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
dc.identifier.uri.fl_str_mv https://repositorio.ufrn.br/handle/123456789/48321
identifier_str_mv ALVES SOBRINHO, Pitágoras de Azevedo, Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
url https://repositorio.ufrn.br/handle/123456789/48321
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal do Rio Grande do Norte
dc.publisher.program.fl_str_mv Residência em Tecnologia da Informação
dc.publisher.initials.fl_str_mv UFRN
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Instituto Metrópole Digital
publisher.none.fl_str_mv Universidade Federal do Rio Grande do Norte
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
instname_str Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str UFRN
institution UFRN
reponame_str Repositório Institucional da UFRN
collection Repositório Institucional da UFRN
bitstream.url.fl_str_mv https://repositorio.ufrn.br/bitstream/123456789/48321/5/license_rdf
https://repositorio.ufrn.br/bitstream/123456789/48321/4/ApplyingOptimizedHierarchical_AlvesSobrinho_2022.pdf
https://repositorio.ufrn.br/bitstream/123456789/48321/6/license.txt
bitstream.checksum.fl_str_mv 42fd4ad1e89814f5e4a476b409eb708c
f32a2c3a86664be9e66838520167e5a4
e9597aa2854d128fd968be5edc8a28d9
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv
_version_ 1814832855169105920