A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies

Bibliographic details
Main author: Nunes, Rafael Oleques
Publication date: 2023
Document type: Bachelor's thesis
Language: eng
Source title: Repositório Institucional da UFRGS
Full text: http://hdl.handle.net/10183/267612
Abstract: A political-legal environment usually involves many documents and stages related to laws and their processing route. Because of this large volume of data, a considerable amount of essential metadata, such as subject classification, keywords, and summary, is often missing from proposed bills. This issue widens the gap between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, this scenario suits machine learning and natural language processing approaches. This study proposes a new method for estimating the subjects of the Brazilian Chamber of Deputies’ bills. Our solution presents and compares two BERT models adapted for the Portuguese language, using as input the summary, a brief description or overview of the main points of a political document. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of the Brazilian Chamber of Deputies’ bills. Our approach encourages researchers to explore similar techniques for other legal documents. Our findings help political scientists perform more robust data analysis, which was not possible with the previous data, directly impacting society.
id UFRGS-2_d066eb50b7881ced0e1c7e5c1fb8332d
oai_identifier_str oai:www.lume.ufrgs.br:10183/267612
network_acronym_str UFRGS-2
network_name_str Repositório Institucional da UFRGS
repository_id_str
spelling Nunes, Rafael Oleques
Freitas, Carla Maria Dal Sasso
Balreira, Dennis Giovani
2023-11-25T03:26:08Z
2023
http://hdl.handle.net/10183/267612
001188065
application/pdf
eng
Mineração de texto
Análise de dados
Inteligência artificial
Multi-label classification
Legislative documents classification
Language models
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/bachelorThesis
Universidade Federal do Rio Grande do Sul
Instituto de Informática
Porto Alegre, BR-RS
2023
Computer Science: Emphasis in Computer Science: Bachelor's degree
undergraduate
info:eu-repo/semantics/openAccess
reponame:Repositório Institucional da UFRGS
instname:Universidade Federal do Rio Grande do Sul (UFRGS)
instacron:UFRGS
TEXT 001188065.pdf.txt 001188065.pdf.txt Extracted Text text/plain 134508 http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt 67b779ea0be6efb691d08a5730e6e401 MD5 2
ORIGINAL 001188065.pdf Full text (English) application/pdf 621151 http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf ce759b422d34c1b269c574f2ab326dbf MD5 1
10183/267612
2023-11-26 04:25:49.84986
oai:www.lume.ufrgs.br:10183/267612
Repositório de Publicações
PUB
https://lume.ufrgs.br/oai/request
opendoar:
2023-11-26T06:25:49
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
false
dc.title.pt_BR.fl_str_mv A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
dc.title.alternative.pt.fl_str_mv Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil
title A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
spellingShingle A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
Nunes, Rafael Oleques
Mineração de texto
Análise de dados
Inteligência artificial
Multi-label classification
Legislative documents classification
Language models
title_short A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_full A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_fullStr A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_full_unstemmed A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_sort A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
author Nunes, Rafael Oleques
author_facet Nunes, Rafael Oleques
author_role author
dc.contributor.author.fl_str_mv Nunes, Rafael Oleques
dc.contributor.advisor1.fl_str_mv Freitas, Carla Maria Dal Sasso
dc.contributor.advisor-co1.fl_str_mv Balreira, Dennis Giovani
contributor_str_mv Freitas, Carla Maria Dal Sasso
Balreira, Dennis Giovani
dc.subject.por.fl_str_mv Mineração de texto
Análise de dados
Inteligência artificial
topic Mineração de texto
Análise de dados
Inteligência artificial
Multi-label classification
Legislative documents classification
Language models
dc.subject.eng.fl_str_mv Multi-label classification
Legislative documents classification
Language models
description A political-legal environment usually involves many documents and stages related to laws and their processing route. Because of this large volume of data, a considerable amount of essential metadata, such as subject classification, keywords, and summary, is often missing from proposed bills. This issue widens the gap between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, this scenario suits machine learning and natural language processing approaches. This study proposes a new method for estimating the subjects of the Brazilian Chamber of Deputies’ bills. Our solution presents and compares two BERT models adapted for the Portuguese language, using as input the summary, a brief description or overview of the main points of a political document. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of the Brazilian Chamber of Deputies’ bills. Our approach encourages researchers to explore similar techniques for other legal documents. Our findings help political scientists perform more robust data analysis, which was not possible with the previous data, directly impacting society.
publishDate 2023
dc.date.accessioned.fl_str_mv 2023-11-25T03:26:08Z
dc.date.issued.fl_str_mv 2023
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
format bachelorThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10183/267612
dc.identifier.nrb.pt_BR.fl_str_mv 001188065
url http://hdl.handle.net/10183/267612
identifier_str_mv 001188065
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRGS
instname:Universidade Federal do Rio Grande do Sul (UFRGS)
instacron:UFRGS
instname_str Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str UFRGS
institution UFRGS
reponame_str Repositório Institucional da UFRGS
collection Repositório Institucional da UFRGS
bitstream.url.fl_str_mv http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt
http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf
bitstream.checksum.fl_str_mv 67b779ea0be6efb691d08a5730e6e401
ce759b422d34c1b269c574f2ab326dbf
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv
_version_ 1801224670874173440