Using feature engineering to improve the identification of bias in legal judgment database
Main author: | Carvalho, Daniel Marx Pinto |
---|---|
Publication date: | 2022 |
Document type: | Undergraduate thesis (Trabalho de conclusão de curso) |
Language: | eng |
Source title: | Repositório Institucional da UFRN |
Full text: | https://repositorio.ufrn.br/handle/123456789/48478 |
Abstract: | Machine learning techniques have been widely applied to judicial data. These techniques extract patterns from the data, which can be used, for example, to predict the outcome of court decisions. In Brazil, the use of machine learning in the legal area is already a reality, enabled by the digitization of legal proceedings that has been under way since the beginning of the century. In general, judicial databases are high-dimensional. If the training dataset has irrelevant or redundant attributes, or a large variation in the sample sizes of the classes to be predicted (that is, an unbalanced database), the classification step may produce less accurate results. In addition, a large number of attributes increases the computational power required for the task and the complexity of the classification models used. In these circumstances, the data pre-processing step is essential to prepare the database for analysis and increase the accuracy of predictions. Judicial data has yet another important characteristic: collecting the data and identifying features often require legal specialists and a manual database-building process, which is slow, and data on a particular feature may be incomplete. The objective of this work is, therefore, to apply feature engineering techniques, such as feature selection and undersampling, to a dataset of judicial sentences (built by the research group 'Além da Pena') in order to investigate the detection of gender bias in the judicial decisions it contains. |
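
The abstract names two pre-processing ideas, feature selection and undersampling, without saying which specific algorithms the thesis uses. As a rough illustration only, the sketch below shows one simple form of each: random undersampling down to the size of the smallest class, and a filter that drops columns holding a single constant value. The function names and the toy data are hypothetical and are not taken from the thesis or from the 'Além da Pena' dataset.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Balance classes by randomly keeping only as many samples per class
    as the smallest class has (illustrative; not the thesis's method)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        out_rows.extend(rng.sample(by_class[label], target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def drop_constant_features(rows):
    """Filter-style selection: keep only columns with more than one
    distinct value, since a constant column cannot separate classes."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


# Hypothetical toy data: class "a" has 4 samples, class "b" has 2,
# and the middle column is constant.
rows = [[1, 0, 5], [0, 0, 3], [1, 0, 4], [0, 0, 2], [1, 0, 1], [0, 0, 6]]
labels = ["a", "a", "a", "a", "b", "b"]

balanced_rows, balanced_labels = random_undersample(rows, labels)
selected_rows, kept_columns = drop_constant_features(balanced_rows)
```

In practice a library implementation would usually be preferable to hand-written helpers like these; the sketch is only meant to make the two terms concrete.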
id |
UFRN_b2e5343b27fce8133a0940900a9c7b15 |
---|---|
oai_identifier_str |
oai:https://repositorio.ufrn.br:123456789/48478 |
network_acronym_str |
UFRN |
network_name_str |
Repositório Institucional da UFRN |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Using feature engineering to improve the identification of bias in legal judgment database |
title |
Using feature engineering to improve the identification of bias in legal judgment database |
spellingShingle |
Using feature engineering to improve the identification of bias in legal judgment database Carvalho, Daniel Marx Pinto Machine Learning Feature Selection Undersampling Prediction of bias in court decisions |
title_short |
Using feature engineering to improve the identification of bias in legal judgment database |
title_full |
Using feature engineering to improve the identification of bias in legal judgment database |
title_fullStr |
Using feature engineering to improve the identification of bias in legal judgment database |
title_full_unstemmed |
Using feature engineering to improve the identification of bias in legal judgment database |
title_sort |
Using feature engineering to improve the identification of bias in legal judgment database |
author |
Carvalho, Daniel Marx Pinto |
author_facet |
Carvalho, Daniel Marx Pinto |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/3625651394504632 |
dc.contributor.advisorID.pt_BR.fl_str_mv |
0000-0001-7461-7570 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/2234040548103596 |
dc.contributor.referees1.none.fl_str_mv |
Costa, José Alfredo Ferreira |
dc.contributor.referees1Lattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/9745845064013172 |
dc.contributor.author.fl_str_mv |
Carvalho, Daniel Marx Pinto |
dc.contributor.advisor-co1.fl_str_mv |
Pereira, Mônica Magalhães |
dc.contributor.advisor-co1Lattes.fl_str_mv |
http://lattes.cnpq.br/5777010848661813 |
dc.contributor.advisor1.fl_str_mv |
Abreu, Marjory Cristiany Da Costa |
contributor_str_mv |
Pereira, Mônica Magalhães Abreu, Marjory Cristiany Da Costa |
dc.subject.por.fl_str_mv |
Machine Learning Feature Selection Undersampling Prediction of bias in court decisions |
topic |
Machine Learning Feature Selection Undersampling Prediction of bias in court decisions |
description |
Machine learning techniques have been widely applied to judicial data. These techniques extract patterns from the data, which can be used, for example, to predict the outcome of court decisions. In Brazil, the use of machine learning in the legal area is already a reality, enabled by the digitization of legal proceedings that has been under way since the beginning of the century. In general, judicial databases are high-dimensional. If the training dataset has irrelevant or redundant attributes, or a large variation in the sample sizes of the classes to be predicted (that is, an unbalanced database), the classification step may produce less accurate results. In addition, a large number of attributes increases the computational power required for the task and the complexity of the classification models used. In these circumstances, the data pre-processing step is essential to prepare the database for analysis and increase the accuracy of predictions. Judicial data has yet another important characteristic: collecting the data and identifying features often require legal specialists and a manual database-building process, which is slow, and data on a particular feature may be incomplete. The objective of this work is, therefore, to apply feature engineering techniques, such as feature selection and undersampling, to a dataset of judicial sentences (built by the research group 'Além da Pena') in order to investigate the detection of gender bias in the judicial decisions it contains. |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-07-13T09:56:30Z |
dc.date.available.fl_str_mv |
2022-07-13T09:56:30Z |
dc.date.issued.fl_str_mv |
2022-07-11 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
CARVALHO, Daniel Marx Pinto. Using feature engineering to improve the identification of bias in legal judgment database. 2022. 38f. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) – Departamento de Informática e Matemática Aplicada, Universidade Federal do Rio Grande do Norte, Natal, 2022. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufrn.br/handle/123456789/48478 |
identifier_str_mv |
CARVALHO, Daniel Marx Pinto. Using feature engineering to improve the identification of bias in legal judgment database. 2022. 38f. Trabalho de Conclusão de Curso (Bacharelado em Ciência da Computação) – Departamento de Informática e Matemática Aplicada, Universidade Federal do Rio Grande do Norte, Natal, 2022. |
url |
https://repositorio.ufrn.br/handle/123456789/48478 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte |
dc.publisher.program.fl_str_mv |
Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFRN |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Informática e Matemática Aplicada |
publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
instacron_str |
UFRN |
institution |
UFRN |
reponame_str |
Repositório Institucional da UFRN |
collection |
Repositório Institucional da UFRN |
bitstream.url.fl_str_mv |
https://repositorio.ufrn.br/bitstream/123456789/48478/1/UsingFeatureEngineering_Carvalho_2022.pdf https://repositorio.ufrn.br/bitstream/123456789/48478/2/license_rdf https://repositorio.ufrn.br/bitstream/123456789/48478/3/license.txt |
bitstream.checksum.fl_str_mv |
bda01f57dae6032ac75faef56ce7412e e39d27027a6cc9cb039ad269a5db8e34 e9597aa2854d128fd968be5edc8a28d9 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
repository.mail.fl_str_mv |
|
_version_ |
1802117596871393280 |