Using feature engineering to improve the identification of bias in legal judgment database

Bibliographic details
Main author: Carvalho, Daniel Marx Pinto
Publication date: 2022
Document type: Undergraduate thesis (Trabalho de conclusão de curso)
Language: eng
Source title: Repositório Institucional da UFRN
Full text: https://repositorio.ufrn.br/handle/123456789/48478
Abstract: Machine learning techniques have been widely applied to judicial data. These techniques extract patterns from the data, which can be used, for example, to predict the outcome of court decisions. In Brazil, the use of machine learning in the legal area is already a reality, enabled by the digitization of legal proceedings that has been under way since the beginning of the century. Judicial databases are generally high-dimensional. If the training dataset contains irrelevant or redundant attributes, or if class sizes differ widely (that is, the dataset is imbalanced), classification may be less accurate. In addition, a large number of attributes increases the computational power required for the task and the complexity of the classification models used. Under these circumstances, data pre-processing is essential to prepare the database for analysis and to increase the accuracy of predictions. Judicial data has another important characteristic: collecting it and identifying its features often requires legal specialists and a manual database-building process, which is slow, and data on a particular feature may be incomplete. The objective of this work is therefore to apply feature engineering techniques, such as feature selection and undersampling, to a dataset of judicial sentences (built by the research group 'Além da Pena') in order to investigate the detection of gender bias in the judicial decisions it contains.
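The two pre-processing ideas named in the abstract, undersampling an imbalanced dataset and selecting features, can be illustrated with a minimal sketch. This is not the thesis's method: the toy data, the function names `undersample` and `drop_constant_features`, and the choice of a constant-column filter as the selection step are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement down to the minority-class size.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def drop_constant_features(rows):
    """Keep only columns that take more than one value (a crude selection filter)."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


# Hypothetical toy data: 6 rows, 3 features, labels 0 (majority) and 1 (minority).
rows = [[1, 0, 5], [2, 0, 5], [3, 0, 6], [4, 0, 6], [5, 0, 7], [6, 0, 7]]
labels = [0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = undersample(rows, labels)
selected_rows, kept_columns = drop_constant_features(balanced_rows)
```

In this sketch the majority class is cut down to the size of the minority class, and the second column, which is constant, is discarded. A real study would more likely use a relevance-based selection criterion and an established library rather than these hand-written helpers.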
id UFRN_b2e5343b27fce8133a0940900a9c7b15
oai_identifier_str oai:https://repositorio.ufrn.br:123456789/48478
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
dc.title.pt_BR.fl_str_mv Using feature engineering to improve the identification of bias in legal judgment database
author Carvalho, Daniel Marx Pinto
author_facet Carvalho, Daniel Marx Pinto
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/3625651394504632
dc.contributor.advisorID.pt_BR.fl_str_mv 0000-0001-7461-7570
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/2234040548103596
dc.contributor.referees1.none.fl_str_mv Costa, José Alfredo Ferreira
dc.contributor.referees1Lattes.pt_BR.fl_str_mv http://lattes.cnpq.br/9745845064013172
dc.contributor.author.fl_str_mv Carvalho, Daniel Marx Pinto
dc.contributor.advisor-co1.fl_str_mv Pereira, Mônica Magalhães
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/5777010848661813
dc.contributor.advisor1.fl_str_mv Abreu, Marjory Cristiany Da Costa
contributor_str_mv Pereira, Mônica Magalhães
Abreu, Marjory Cristiany Da Costa
dc.subject.por.fl_str_mv Machine Learning
Feature Selection
Undersampling
Prediction of bias in court decisions
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-07-13T09:56:30Z
dc.date.available.fl_str_mv 2022-07-13T09:56:30Z
dc.date.issued.fl_str_mv 2022-07-11
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
format bachelorThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv CARVALHO, Daniel Marx Pinto. Using feature engineering to improve the identification of bias in legal judgment database. 2022. 38f. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) – Departamento de Informática e Matemática Aplicada, Universidade Federal do Rio Grande do Norte, Natal, 2022.
dc.identifier.uri.fl_str_mv https://repositorio.ufrn.br/handle/123456789/48478
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal do Rio Grande do Norte
dc.publisher.program.fl_str_mv Ciência da Computação
dc.publisher.initials.fl_str_mv UFRN
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Informática e Matemática Aplicada
publisher.none.fl_str_mv Universidade Federal do Rio Grande do Norte
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
instname_str Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str UFRN
institution UFRN
reponame_str Repositório Institucional da UFRN
collection Repositório Institucional da UFRN
bitstream.url.fl_str_mv https://repositorio.ufrn.br/bitstream/123456789/48478/1/UsingFeatureEngineering_Carvalho_2022.pdf
https://repositorio.ufrn.br/bitstream/123456789/48478/2/license_rdf
https://repositorio.ufrn.br/bitstream/123456789/48478/3/license.txt
bitstream.checksum.fl_str_mv bda01f57dae6032ac75faef56ce7412e
e39d27027a6cc9cb039ad269a5db8e34
e9597aa2854d128fd968be5edc8a28d9
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv
_version_ 1802117596871393280