A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
Main author: | Nunes, Rafael Oleques |
---|---|
Publication date: | 2023 |
Document type: | Bachelor's thesis |
Language: | eng |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/267612 |
Abstract: | A political-legal environment usually involves many documents and stages concerning laws and their processing. Because of this large volume of data, essential metadata such as subject classification, keywords, and summary is often missing from proposed bills. This issue widens the gap between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, this scenario suits machine learning and natural language processing approaches. This study proposes a new method for estimating the subjects of bills in the Brazilian Chamber of Deputies. Our solution presents and compares two BERT models adapted for the Portuguese language, using as input the summary (ementa), a brief description or overview of the main points of a political document. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of the Brazilian Chamber of Deputies’ bills. Our approach encourages researchers to explore similar techniques for other legal documents. Our findings help political scientists perform more robust data analysis, which was not possible with the previous data, directly impacting society. |
id |
UFRGS-2_d066eb50b7881ced0e1c7e5c1fb8332d |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/267612 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
repository_id_str |
|
spelling |
Nunes, Rafael Oleques; Freitas, Carla Maria Dal Sasso; Balreira, Dennis Giovani; 2023-11-25T03:26:08Z; 2023; http://hdl.handle.net/10183/267612; 001188065; The political-legal environment usually involves many documents and stages related to laws and their processing. Because of this large volume of data, a considerable amount of essential information, such as subject classification, keywords, and summary (ementa), is often missing. This problem widens the gap between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, this scenario suits machine learning and natural language processing approaches. This work proposes a new method for estimating the subjects of bills in the Brazilian Chamber of Deputies. Our solution presents and compares two BERT models adapted for the Portuguese language, using the summary (ementa), a brief description or overview of the main points of a political document such as a bill or a proposal. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of bills in the Brazilian Chamber of Deputies. Our approach increases the subject classification coverage of bills and encourages researchers to explore similar techniques for other legal documents. Our findings help political science researchers carry out more robust data analyses, which was not possible with the previous data, directly impacting society.; application/pdf; eng; Mineração de texto; Análise de dados; Inteligência artificial; Multi-label classification; Legislative documents classification; Language models; A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies; Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bachelorThesis; Universidade Federal do Rio Grande do Sul; Instituto de Informática; Porto Alegre, BR-RS; 2023; Computer Science: Emphasis in Computer Science: Bachelor's degree; undergraduate; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS; TEXT; 001188065.pdf.txt; 001188065.pdf.txt; Extracted Text; text/plain; 134508; http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt; 67b779ea0be6efb691d08a5730e6e401; MD5; 2; ORIGINAL; 001188065.pdf; Full text (English); application/pdf; 621151; http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf; ce759b422d34c1b269c574f2ab326dbf; MD5; 1; 10183/267612; 2023-11-26 04:25:49.84986; oai:www.lume.ufrgs.br:10183/267612; Repositório de Publicações; PUB; https://lume.ufrgs.br/oai/request; opendoar:; 2023-11-26T06:25:49; Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS); false |
dc.title.pt_BR.fl_str_mv |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
dc.title.alternative.pt.fl_str_mv |
Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil |
title |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
spellingShingle |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies Nunes, Rafael Oleques Mineração de texto Análise de dados Inteligência artificial Multi-label classification Legislative documents classification Language models |
title_short |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
title_full |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
title_fullStr |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
title_full_unstemmed |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
title_sort |
A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies |
author |
Nunes, Rafael Oleques |
author_facet |
Nunes, Rafael Oleques |
author_role |
author |
dc.contributor.author.fl_str_mv |
Nunes, Rafael Oleques |
dc.contributor.advisor1.fl_str_mv |
Freitas, Carla Maria Dal Sasso |
dc.contributor.advisor-co1.fl_str_mv |
Balreira, Dennis Giovani |
contributor_str_mv |
Freitas, Carla Maria Dal Sasso Balreira, Dennis Giovani |
dc.subject.por.fl_str_mv |
Mineração de texto Análise de dados Inteligência artificial |
topic |
Mineração de texto Análise de dados Inteligência artificial Multi-label classification Legislative documents classification Language models |
dc.subject.eng.fl_str_mv |
Multi-label classification Legislative documents classification Language models |
description |
A political-legal environment usually involves many documents and stages concerning laws and their processing. Because of this large volume of data, essential metadata such as subject classification, keywords, and summary is often missing from proposed bills. This issue widens the gap between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, this scenario suits machine learning and natural language processing approaches. This study proposes a new method for estimating the subjects of bills in the Brazilian Chamber of Deputies. Our solution presents and compares two BERT models adapted for the Portuguese language, using as input the summary (ementa), a brief description or overview of the main points of a political document. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of the Brazilian Chamber of Deputies’ bills. Our approach encourages researchers to explore similar techniques for other legal documents. Our findings help political scientists perform more robust data analysis, which was not possible with the previous data, directly impacting society. |
publishDate |
2023 |
dc.date.accessioned.fl_str_mv |
2023-11-25T03:26:08Z |
dc.date.issued.fl_str_mv |
2023 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/267612 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001188065 |
url |
http://hdl.handle.net/10183/267612 |
identifier_str_mv |
001188065 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Repositório Institucional da UFRGS |
collection |
Repositório Institucional da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf |
bitstream.checksum.fl_str_mv |
67b779ea0be6efb691d08a5730e6e401 ce759b422d34c1b269c574f2ab326dbf |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
|
_version_ |
1801224670874173440 |
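
The abstract reports both a weighted and a macro F1 score. As a minimal sketch of how the two averages differ for multi-label predictions, the Python snippet below computes them from scratch on a made-up example. The three label columns and four toy rows are hypothetical and are not taken from the thesis; the functions follow the usual textbook definitions (per-label F1, then an unweighted or support-weighted mean), and this record does not say which implementation the authors used.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = bills, columns = subject labels (0/1)


def per_label_f1(y_true: Matrix, y_pred: Matrix) -> Tuple[List[float], List[int]]:
    """Return the F1 score and the support (number of true positives
    plus false negatives) for each label column."""
    n_labels = len(y_true[0])
    scores: List[float] = []
    supports: List[int] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label with no true or predicted
        # positives gets an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)
    return scores, supports


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores, _ = per_label_f1(y_true, y_pred)
    return sum(scores) / len(scores)


def weighted_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Mean of the per-label F1 scores, weighted by each label's support."""
    scores, supports = per_label_f1(y_true, y_pred)
    total = sum(supports)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, supports)) / total


# Hypothetical toy data: 4 bills, 3 subject labels.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0]]

print(f"macro F1:    {macro_f1(y_true, y_pred):.4f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.4f}")
```

On this toy data the frequent label is predicted perfectly while the two rare labels are missed, so the weighted average (0.6) is higher than the macro average (about 0.33). A gap in the same direction, like the reported 78.94% versus 72.78%, is what one would typically expect when rarer subjects are harder to predict; that reading is an interpretation, not a claim made in the record.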