Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents

Bibliographic details
Main author: Gonçalves, Teresa
Publication date: 2010
Other authors: Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/2556
Abstract: Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to a corpus of legal documents from the EUR-Lex site and evaluated. The obtained results were quite good and indicate this may be a promising approach to the legal information extraction problem.
id RCAP_003f406f2a1753d04b826c5f06eca6e1
oai_identifier_str oai:dspace.uevora.pt:10174/2556
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
spellingShingle Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
Gonçalves, Teresa
machine learning
named entity recognition
author Gonçalves, Teresa
author_facet Gonçalves, Teresa
Quaresma, Paulo
author_role author
author2 Quaresma, Paulo
author2_role author
dc.contributor.author.fl_str_mv Gonçalves, Teresa
Quaresma, Paulo
dc.subject.por.fl_str_mv machine learning
named entity recognition
topic machine learning
named entity recognition
description Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to a corpus of legal documents from the EUR-Lex site and evaluated. The obtained results were quite good and indicate this may be a promising approach to the legal information extraction problem.
publishDate 2010
dc.date.none.fl_str_mv 2010-01-01T00:00:00Z
2011-02-15T10:47:31Z
2011-02-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/2556
http://hdl.handle.net/10174/2556
url http://hdl.handle.net/10174/2556
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 44-59
978-3-642-12836-3
Lecture Notes in Computer Science
6036
livre
tcg@uevora.pt
pq@uevora.pt
Semantic Processing of Legal Texts
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 260319 bytes
application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
publisher.none.fl_str_mv Springer-Verlag
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136465766055936