A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies

Nunes, Rafael Oleques

Bibliographic details
Main author:	Nunes, Rafael Oleques
Publication date:	2023
Document type:	Bachelor's thesis
Language:	eng
Source title:	Repositório Institucional da UFRGS
Full text:	http://hdl.handle.net/10183/267612
Abstract:	A political-legal environment usually involves many documents and stages related to laws and their processing. Because of this large volume of data, essential metadata such as subject classification, keywords, and summary is often missing from proposed bills. This gap widens the distance between citizens and politics, negatively affecting society. In the Brazilian Chamber of Deputies, around 75% of the bills from 1991 to 2022 have no subject classification in their associated metadata. However, because the corpus contains many bills, the problem is well suited to machine learning and natural language processing approaches. This study proposes a new method for estimating the subjects of bills in the Brazilian Chamber of Deputies. Our solution presents and compares two BERT models adapted for the Portuguese language, using each bill's summary, a brief description or overview of the main points of a political document such as a bill or a proposal. We obtained our best results with the BERTimbau model variation, achieving a weighted F1 score of 78.94% and a macro F1 score of 72.78%. To the best of our knowledge, this is the first work to propose a model for predicting the subjects of bills in the Brazilian Chamber of Deputies. Our approach increases the number of bills with a subject classification and encourages researchers to explore similar techniques for other legal documents. Our findings help political scientists perform more robust data analysis than was possible with the previous data, directly impacting society.

Item metadata

id	UFRGS-2_d066eb50b7881ced0e1c7e5c1fb8332d
oai_identifier_str	oai:www.lume.ufrgs.br:10183/267612
network_acronym_str	UFRGS-2
network_name_str	Repositório Institucional da UFRGS
repository_id_str
spelling	Nunes, Rafael Oleques; Freitas, Carla Maria Dal Sasso; Balreira, Dennis Giovani; 2023-11-25T03:26:08Z; 2023; http://hdl.handle.net/10183/267612; 001188065; application/pdf; eng; Mineração de texto; Análise de dados; Inteligência artificial; Multi-label classification; Legislative documents classification; Language models; A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies; Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bachelorThesis; Universidade Federal do Rio Grande do Sul; Instituto de Informática; Porto Alegre, BR-RS; 2023; Ciência da Computação: Ênfase em Ciência da Computação: Bacharelado; graduação; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS; TEXT; 001188065.pdf.txt; 001188065.pdf.txt; Extracted Text; text/plain; 134508; http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt; 67b779ea0be6efb691d08a5730e6e401; MD5; 2; ORIGINAL; 001188065.pdf; Texto completo (inglês); application/pdf; 621151; http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf; ce759b422d34c1b269c574f2ab326dbf; MD5; 1; 10183/267612; 2023-11-26 04:25:49.84986; oai:www.lume.ufrgs.br:10183/267612; Repositório de Publicações; PUB; https://lume.ufrgs.br/oai/request; opendoar:2023-11-26T06:25:49; Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS); false
dc.title.pt_BR.fl_str_mv	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
dc.title.alternative.pt.fl_str_mv	Uma abordagem de classificação para estimar temas de proposições na Câmara dos Deputados do Brasil
title	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
spellingShingle	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies; Nunes, Rafael Oleques; Mineração de texto; Análise de dados; Inteligência artificial; Multi-label classification; Legislative documents classification; Language models
title_short	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_full	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_fullStr	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_full_unstemmed	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
title_sort	A classification approach for estimating subjects of bills in the Brazilian Chamber of Deputies
author	Nunes, Rafael Oleques
author_facet	Nunes, Rafael Oleques
author_role	author
dc.contributor.author.fl_str_mv	Nunes, Rafael Oleques
dc.contributor.advisor1.fl_str_mv	Freitas, Carla Maria Dal Sasso
dc.contributor.advisor-co1.fl_str_mv	Balreira, Dennis Giovani
contributor_str_mv	Freitas, Carla Maria Dal Sasso; Balreira, Dennis Giovani
dc.subject.por.fl_str_mv	Mineração de texto; Análise de dados; Inteligência artificial
topic	Mineração de texto; Análise de dados; Inteligência artificial; Multi-label classification; Legislative documents classification; Language models
dc.subject.eng.fl_str_mv	Multi-label classification; Legislative documents classification; Language models
publishDate	2023
dc.date.accessioned.fl_str_mv	2023-11-25T03:26:08Z
dc.date.issued.fl_str_mv	2023
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
format	bachelorThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/267612
dc.identifier.nrb.pt_BR.fl_str_mv	001188065
url	http://hdl.handle.net/10183/267612
identifier_str_mv	001188065
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
instname_str	Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str	UFRGS
institution	UFRGS
reponame_str	Repositório Institucional da UFRGS
collection	Repositório Institucional da UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/267612/2/001188065.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/267612/1/001188065.pdf
bitstream.checksum.fl_str_mv	67b779ea0be6efb691d08a5730e6e401 ce759b422d34c1b269c574f2ab326dbf
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv
_version_	1801224670874173440
