Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents

Gonçalves, Teresa; Quaresma, Paulo

Bibliographic details
Main author:	Gonçalves, Teresa
Publication date:	2010
Other authors:	Quaresma, Paulo
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10174/2556
Abstract:	Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to a corpus of legal documents from the EUR-Lex site and evaluated. The obtained results were quite good and indicate this may be a promising approach to the legal information extraction problem.

Item metadata

id	RCAP_003f406f2a1753d04b826c5f06eca6e1
oai_identifier_str	oai:dspace.uevora.pt:10174/2556
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
spellingShingle	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents Gonçalves, Teresa machine learning named entity recognition
title_short	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title_full	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title_fullStr	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title_full_unstemmed	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
title_sort	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
author	Gonçalves, Teresa
author_facet	Gonçalves, Teresa Quaresma, Paulo
author_role	author
author2	Quaresma, Paulo
author2_role	author
dc.contributor.author.fl_str_mv	Gonçalves, Teresa Quaresma, Paulo
dc.subject.por.fl_str_mv	machine learning named entity recognition
topic	machine learning named entity recognition
description	Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to a corpus of legal documents from the EUR-Lex site and evaluated. The obtained results were quite good and indicate this may be a promising approach to the legal information extraction problem.
publishDate	2010
dc.date.none.fl_str_mv	2010-01-01T00:00:00Z 2011-02-15T10:47:31Z 2011-02-15
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10174/2556 http://hdl.handle.net/10174/2556
url	http://hdl.handle.net/10174/2556
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	44-59 978-3-642-12836-3 Lecture Notes in Computer Science 6036 livre tcg@uevora.pt pq@uevora.pt Semantic Processing of Legal Texts 498
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	260319 bytes application/pdf
dc.publisher.none.fl_str_mv	Springer-Verlag
publisher.none.fl_str_mv	Springer-Verlag
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799136465766055936
