Applying optimized hierarchical NCM classification to public purchases of products in Brazil
Main author: | Alves Sobrinho, Pitágoras de Azevedo |
---|---|
Publication date: | 2022 |
Document type: | Final course paper (Trabalho de Conclusão de Curso) |
Language: | eng |
Source title: | Repositório Institucional da UFRN |
Full text: | https://repositorio.ufrn.br/handle/123456789/48321 |
Abstract: | Using free text to categorize entities usually makes those entities hard to identify. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized within the Mercosul Common Nomenclature (NCM). This identifier is necessary to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then uses a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by selecting classification algorithms and their parameters, testing the performance of various random configurations. In the results, the hierarchical classification achieved a higher F1 score than the flat classification experiments, and the error propagation problem was mitigated. |
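The Local Classifier per Parent Node pattern named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the prefix lengths in `LEVELS`, the class names, and the sample descriptions and codes are all assumptions made for illustration, and a toy token-count classifier stands in for the document embeddings and supervised algorithms used in the thesis. The per-node random search over configurations is omitted.

```python
from collections import Counter, defaultdict

# Assumed prefix lengths for three hierarchy levels of an 8-digit NCM code
# (chapter, heading, full code). The thesis may use a different split.
LEVELS = (2, 4, 8)


def tokens(text):
    return text.lower().split()


class TokenCountClassifier:
    """Toy stand-in for 'document embedding + supervised classifier'."""

    def fit(self, texts, labels):
        # Count how often each token appears under each label.
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        return self

    def predict(self, text):
        words = tokens(text)
        # Pick the label whose training tokens overlap most with the text;
        # ties go to the smallest label because candidates are sorted.
        return max(sorted(self.counts),
                   key=lambda label: sum(self.counts[label][w] for w in words))


class LCPNTree:
    """One local classifier per parent node; each chooses among its children."""

    def fit(self, texts, codes):
        groups = defaultdict(lambda: ([], []))
        for text, code in zip(texts, codes):
            parent = ""  # the empty prefix is the root node
            for n in LEVELS:
                child = code[:n]
                groups[parent][0].append(text)
                groups[parent][1].append(child)
                parent = child
        self.nodes = {parent: TokenCountClassifier().fit(t, labels)
                      for parent, (t, labels) in groups.items()}
        return self

    def predict(self, text):
        node = ""
        # Walk down the tree until reaching a prefix with no classifier (a leaf).
        while node in self.nodes:
            node = self.nodes[node].predict(text)
        return node


# Invented sample data; descriptions and codes are illustrative only.
TRAIN = [
    ("blue ballpoint pen", "96081000"),
    ("black ballpoint pen", "96081000"),
    ("graphite pencil hb", "96091000"),
    ("colored pencil set", "96091000"),
    ("spiral notebook 96 sheets", "48202000"),
    ("a4 paper ream", "48025610"),
]

tree = LCPNTree().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(tree.predict("red ballpoint pen"))
```

Because each node only distinguishes among its own children, a mistake at an upper level cannot be corrected further down, which is the error propagation problem the abstract refers to.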
id |
UFRN_3a1aade01852f42df1a0fbe2311c4103 |
---|---|
oai_identifier_str |
oai:https://repositorio.ufrn.br:123456789/48321 |
network_acronym_str |
UFRN |
network_name_str |
Repositório Institucional da UFRN |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Applying optimized hierarchical NCM classification to public purchases of products in Brazil |
author |
Alves Sobrinho, Pitágoras de Azevedo |
author_facet |
Alves Sobrinho, Pitágoras de Azevedo |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/0435510237375618 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4758203U5 |
dc.contributor.referees1.none.fl_str_mv |
Oliveira, Marcel Vinicius Medeiros |
dc.contributor.referees1Lattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/1756952696097255 |
dc.contributor.referees2.none.fl_str_mv |
Santos, Ilueny Constâncio Chaves dos |
dc.contributor.referees2Lattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/8930351118408164 |
dc.contributor.author.fl_str_mv |
Alves Sobrinho, Pitágoras de Azevedo |
dc.contributor.advisor1.fl_str_mv |
Xavier Júnior, João Carlos |
contributor_str_mv |
Xavier Júnior, João Carlos |
dc.subject.por.fl_str_mv |
Supervised classification; Machine learning; Hierarchical classification; Nota fiscal eletrônica; Product classification |
topic |
Supervised classification; Machine learning; Hierarchical classification; Nota fiscal eletrônica; Product classification |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-07-04T14:51:29Z |
dc.date.available.fl_str_mv |
2022-07-04T14:51:29Z |
dc.date.issued.fl_str_mv |
2022-06-15 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
ALVES SOBRINHO, Pitágoras de Azevedo. Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufrn.br/handle/123456789/48321 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte |
dc.publisher.program.fl_str_mv |
Residência em Tecnologia da Informação |
dc.publisher.initials.fl_str_mv |
UFRN |
dc.publisher.country.fl_str_mv |
Brazil |
dc.publisher.department.fl_str_mv |
Instituto Metrópole Digital |
publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
instacron_str |
UFRN |
institution |
UFRN |
reponame_str |
Repositório Institucional da UFRN |
collection |
Repositório Institucional da UFRN |
bitstream.url.fl_str_mv |
https://repositorio.ufrn.br/bitstream/123456789/48321/5/license_rdf https://repositorio.ufrn.br/bitstream/123456789/48321/4/ApplyingOptimizedHierarchical_AlvesSobrinho_2022.pdf https://repositorio.ufrn.br/bitstream/123456789/48321/6/license.txt |
bitstream.checksum.fl_str_mv |
42fd4ad1e89814f5e4a476b409eb708c f32a2c3a86664be9e66838520167e5a4 e9597aa2854d128fd968be5edc8a28d9 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
repository.mail.fl_str_mv |
|
_version_ |
1814832855169105920 |