Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico

Bibliographic details
Main author: Castro, Pedro Vitor Quinta de
Publication date: 2019
Document type: Master's thesis
Language: por
Source title: Repositório Institucional da UFG
Full text: http://repositorio.bc.ufg.br/tede/handle/tede/10276
Abstract: Named Entity Recognition (NER) is a challenging Natural Language Processing task for a language as rich as Portuguese. When applied to a specific domain, the task acquires a new layer of complexity, because it must handle a lexicon particular to that domain. This work studies the Legal domain, specifically Brazilian Labor Law. Deep Learning architectures with word representations based on static word embeddings and language models have shown state-of-the-art performance on the NER task. This work uses a model based on Deep Neural Networks and evaluates different forms of word representation. The evaluated models are applied to the Portuguese language, for both the Legal and general domains. To this end, language models based on the ELMo architecture were trained for both domains, along with static word embeddings specific to the Legal domain. A comparative study of the types of word embeddings applied to the NER task identifies the best type of pre-trained word embeddings for each domain. To train the Legal-domain NER models, ELMo models, and static word embeddings, two different corpora were produced and annotated from a collection of public documents from the Brazilian Labor Court. For the Portuguese general-domain NER model, a new state-of-the-art result was achieved on the HAREM benchmark, with an F-Score of 83.22% for the selective scenario and 78.04% for the total scenario. For the Brazilian Labor Law domain, a model with an F-Score of 93.81% was obtained.
id UFG-2_b4a01062752a4f83f45fb13f3d67e22c
oai_identifier_str oai:repositorio.bc.ufg.br:tede/10276
network_acronym_str UFG-2
network_name_str Repositório Institucional da UFG
repository_id_str
spelling Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2020-01-06T14:08:58Z. No. of bitstreams: 2. Dissertação - Pedro Vitor Quinta de Castro - 2019.pdf: 1941412 bytes, checksum: c5467726f2cd684553e007670b8443ec (MD5); license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5).
Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2020-01-07T11:57:54Z (GMT).
Made available in DSpace on 2020-01-07T11:57:54Z (GMT). Previous issue date: 2019-12-05.
dc.title.eng.fl_str_mv Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
dc.title.alternative.eng.fl_str_mv Deep learning for named entity recognition in legal domain
title Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
spellingShingle Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
Castro, Pedro Vitor Quinta de
Reconhecimento de entidades nomeadas
Processamento de linguagem natural
Deep learning
Redes neurais
Língua portuguesa
Direito do trabalho
Named entity recognition
Natural language processing
Deep learning
Neural networks
Portuguese language
Labor law
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
title_full Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
title_fullStr Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
title_full_unstemmed Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
title_sort Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico
author Castro, Pedro Vitor Quinta de
author_facet Castro, Pedro Vitor Quinta de
author_role author
dc.contributor.advisor1.fl_str_mv Silva, Nádia Félix Felipe da
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/7864834001694765
dc.contributor.advisor-co1.fl_str_mv Soares, Anderson da Silva
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/1096941114079527
dc.contributor.referee1.fl_str_mv Silva, Nádia Félix Felipe da
dc.contributor.referee2.fl_str_mv Rosa, Thierson Couto
dc.contributor.referee3.fl_str_mv Soares, Anderson da Silva
dc.contributor.referee4.fl_str_mv Caseli, Helena de Medeiros
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/1573165588536766
dc.contributor.author.fl_str_mv Castro, Pedro Vitor Quinta de
contributor_str_mv Silva, Nádia Félix Felipe da
Soares, Anderson da Silva
Silva, Nádia Félix Felipe da
Rosa, Thierson Couto
Soares, Anderson da Silva
Caseli, Helena de Medeiros
dc.subject.por.fl_str_mv Reconhecimento de entidades nomeadas
Processamento de linguagem natural
Deep learning
Redes neurais
Língua portuguesa
Direito do trabalho
topic Reconhecimento de entidades nomeadas
Processamento de linguagem natural
Deep learning
Redes neurais
Língua portuguesa
Direito do trabalho
Named entity recognition
Natural language processing
Deep learning
Neural networks
Portuguese language
Labor law
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Named entity recognition
Natural language processing
Deep learning
Neural networks
Portuguese language
Labor law
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description Named Entity Recognition (NER) is a challenging Natural Language Processing task for a language as rich as Portuguese. When applied to a specific domain, the task acquires a new layer of complexity, because it must handle a lexicon particular to that domain. This work studies the Legal domain, specifically Brazilian Labor Law. Deep Learning architectures with word representations based on static word embeddings and language models have shown state-of-the-art performance on the NER task. This work uses a model based on Deep Neural Networks and evaluates different forms of word representation. The evaluated models are applied to the Portuguese language, for both the Legal and general domains. To this end, language models based on the ELMo architecture were trained for both domains, along with static word embeddings specific to the Legal domain. A comparative study of the types of word embeddings applied to the NER task identifies the best type of pre-trained word embeddings for each domain. To train the Legal-domain NER models, ELMo models, and static word embeddings, two different corpora were produced and annotated from a collection of public documents from the Brazilian Labor Court. For the Portuguese general-domain NER model, a new state-of-the-art result was achieved on the HAREM benchmark, with an F-Score of 83.22% for the selective scenario and 78.04% for the total scenario. For the Brazilian Labor Law domain, a model with an F-Score of 93.81% was obtained.
publishDate 2019
dc.date.issued.fl_str_mv 2019-12-05
dc.date.accessioned.fl_str_mv 2020-01-07T11:57:54Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv CASTRO, P. V. Q. Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico. 2019. 125 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2019.
dc.identifier.uri.fl_str_mv http://repositorio.bc.ufg.br/tede/handle/tede/10276
identifier_str_mv CASTRO, P. V. Q. Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico. 2019. 125 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2019.
url http://repositorio.bc.ufg.br/tede/handle/tede/10276
dc.language.iso.fl_str_mv por
language por
dc.relation.program.fl_str_mv -3303550325223384799
dc.relation.confidence.fl_str_mv 600
600
600
dc.relation.department.fl_str_mv -7712266734633644768
dc.relation.cnpq.fl_str_mv 3671711205811204509
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Goiás
dc.publisher.program.fl_str_mv Programa de Pós-graduação em Ciência da Computação (INF)
dc.publisher.initials.fl_str_mv UFG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Instituto de Informática - INF (RG)
publisher.none.fl_str_mv Universidade Federal de Goiás
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFG
instname:Universidade Federal de Goiás (UFG)
instacron:UFG
instname_str Universidade Federal de Goiás (UFG)
instacron_str UFG
institution UFG
reponame_str Repositório Institucional da UFG
collection Repositório Institucional da UFG
bitstream.url.fl_str_mv http://repositorio.bc.ufg.br/tede/bitstreams/e806d6c4-db2c-4902-8b96-9ab39cba3a78/download
http://repositorio.bc.ufg.br/tede/bitstreams/46d21799-f4bd-42cb-9fce-b3c76bf39e36/download
http://repositorio.bc.ufg.br/tede/bitstreams/89768d0d-f90d-44f1-9e47-91a857ff8619/download
http://repositorio.bc.ufg.br/tede/bitstreams/656c4be1-61a5-47dc-8b9a-1abcb3c6d31c/download
http://repositorio.bc.ufg.br/tede/bitstreams/e6c3772f-3f25-4ee0-9e9d-2bf5cf79126e/download
bitstream.checksum.fl_str_mv bd3efa91386c1718a7f26a329fdcb468
4afdbb8c545fd630ea7db775da747b2f
d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e
c5467726f2cd684553e007670b8443ec
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFG - Universidade Federal de Goiás (UFG)
repository.mail.fl_str_mv tasesdissertacoes.bc@ufg.br
_version_ 1798044378685505536