Is linguistic information relevant for the classification of legal texts?
Main author: | Gonçalves, Teresa |
---|---|
Publication date: | 2005 |
Other authors: | Quaresma, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/2561 |
Abstract: | Text classification is an important task in the legal domain. In fact, most legal information is stored as text in a quite unstructured format, and it is important to be able to automatically classify these texts into a predefined set of concepts. Support Vector Machines (SVM), a machine learning algorithm, have been shown to be a good classifier for text bases [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts – the Portuguese Attorney General’s Office Decisions – and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated. The obtained results show that some linguistic information (namely, lemmatisation and the part-of-speech tags) can be successfully used to improve the classification results and, simultaneously, to decrease the number of features needed by the learning algorithm. |
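
The abstract's claim that lemmatisation and part-of-speech tags can reduce the number of features can be illustrated with a minimal sketch. It covers only a bag-of-words feature-extraction step, not the SVM itself, and it is not the authors' pipeline: the toy lexicon, the English example sentences, the tag names, and the `features` helper are all hypothetical, chosen only to show the effect on vocabulary size.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> (lemma, coarse POS tag).
# A real system would use a lemmatiser and tagger for Portuguese.
TOY_LEXICON = {
    "court": ("court", "NOUN"),
    "courts": ("court", "NOUN"),
    "decision": ("decision", "NOUN"),
    "decisions": ("decision", "NOUN"),
    "rule": ("rule", "VERB"),
    "ruled": ("rule", "VERB"),
    "the": ("the", "DET"),
    "on": ("on", "ADP"),
}


def features(text, lemmatise=False, keep_pos=None):
    """Return bag-of-words counts, optionally lemmatised and POS-filtered."""
    counts = Counter()
    for token in text.lower().split():
        # Unknown tokens keep their surface form and get the tag "UNK".
        lemma, pos = TOY_LEXICON.get(token, (token, "UNK"))
        if keep_pos is not None and pos not in keep_pos:
            continue  # drop words whose POS tag is not selected
        counts[lemma if lemmatise else token] += 1
    return counts


docs = [
    "the court ruled on the decision",
    "the courts rule on decisions",
]

# Vocabulary with raw word forms versus lemmas restricted to nouns and verbs.
raw_vocab = set().union(*(features(d) for d in docs))
reduced_vocab = set().union(
    *(features(d, lemmatise=True, keep_pos={"NOUN", "VERB"}) for d in docs)
)

print(len(raw_vocab), sorted(reduced_vocab))
```

In this toy setting the two documents share no content word in their raw form, yet they map to the same three lemmas once inflection is removed and function words are filtered out. Count vectors of this kind would then be passed to a classifier such as an SVM.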
id |
RCAP_5e2c7b54e058de113f27419aa83b4dfa |
---|---|
oai_identifier_str |
oai:dspace.uevora.pt:10174/2561 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Is linguistic information relevant for the classification of legal texts? |
title |
Is linguistic information relevant for the classification of legal texts? |
spellingShingle |
Is linguistic information relevant for the classification of legal texts? Gonçalves, Teresa Text classification |
title_short |
Is linguistic information relevant for the classification of legal texts? |
title_full |
Is linguistic information relevant for the classification of legal texts? |
title_fullStr |
Is linguistic information relevant for the classification of legal texts? |
title_full_unstemmed |
Is linguistic information relevant for the classification of legal texts? |
title_sort |
Is linguistic information relevant for the classification of legal texts? |
author |
Gonçalves, Teresa |
author_facet |
Gonçalves, Teresa Quaresma, Paulo |
author_role |
author |
author2 |
Quaresma, Paulo |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Gonçalves, Teresa Quaresma, Paulo |
dc.subject.por.fl_str_mv |
Text classification |
topic |
Text classification |
description |
Text classification is an important task in the legal domain. In fact, most legal information is stored as text in a quite unstructured format, and it is important to be able to automatically classify these texts into a predefined set of concepts. Support Vector Machines (SVM), a machine learning algorithm, have been shown to be a good classifier for text bases [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts – the Portuguese Attorney General’s Office Decisions – and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated. The obtained results show that some linguistic information (namely, lemmatisation and the part-of-speech tags) can be successfully used to improve the classification results and, simultaneously, to decrease the number of features needed by the learning algorithm. |
publishDate |
2005 |
dc.date.none.fl_str_mv |
2005-01-01T00:00:00Z 2011-02-15T11:37:47Z 2011-02-15 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10174/2561 |
url |
http://hdl.handle.net/10174/2561 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
168-176 ISBN 1-59593-081-7 ICAIL-05, 10th International Conference on Artificial Intelligence and Law livre tcg@uevora.pt pq@uevora.pt Sartor, G. 498 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
200201 bytes application/pdf |
dc.publisher.none.fl_str_mv |
ACM |
publisher.none.fl_str_mv |
ACM |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136465768153088 |