Is linguistic information relevant for the classification of legal texts?

Bibliographic details
Main author: Gonçalves, Teresa
Publication date: 2005
Other authors: Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/2561
Abstract: Text classification is an important task in the legal domain. Most legal information is stored as largely unstructured text, so it is important to be able to classify these texts automatically into a predefined set of concepts. Support Vector Machines (SVMs), a machine learning algorithm, have been shown to be good classifiers for text collections [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts (the Portuguese Attorney General's Office Decisions), and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated. The results show that some linguistic information (lemmatisation and part-of-speech tags) can be used both to improve the classification results and to reduce the number of features needed by the learning algorithm.
id RCAP_5e2c7b54e058de113f27419aa83b4dfa
oai_identifier_str oai:dspace.uevora.pt:10174/2561
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Is linguistic information relevant for the classification of legal texts?
author Gonçalves, Teresa
author_facet Gonçalves, Teresa
Quaresma, Paulo
author_role author
author2 Quaresma, Paulo
author2_role author
dc.contributor.author.fl_str_mv Gonçalves, Teresa
Quaresma, Paulo
dc.subject.por.fl_str_mv Text classification
topic Text classification
publishDate 2005
dc.date.none.fl_str_mv 2005-01-01T00:00:00Z
2011-02-15T11:37:47Z
2011-02-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/2561
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 168-176
ISBN 1-59593-081-7
ICAIL-05, 10th International Conference on Artificial Intelligence and Law
livre
tcg@uevora.pt
pq@uevora.pt
Sartor, G.
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 200201 bytes
application/pdf
dc.publisher.none.fl_str_mv ACM
publisher.none.fl_str_mv ACM
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799136465768153088