Using IR techniques to improve Automated Text Classification

Gonçalves, Teresa; Quaresma, Paulo

Bibliographic details
Main author:	Gonçalves, Teresa
Publication date:	2004
Other authors:	Quaresma, Paulo
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10174/2557
Abstract:	This paper studies the pre-processing phase of the automated text classification problem. We apply the linear Support Vector Machine paradigm to datasets written in English and European Portuguese: the Reuters and the Portuguese Attorney General’s Office datasets, respectively. The study can be seen as a search for the best document representation along three axes: feature reduction (using linguistic information), feature selection (using word frequencies), and term weighting (using information retrieval measures).
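
As an illustration of two of the three axes named in the abstract, the following is a minimal sketch of frequency-based feature selection followed by an IR-style term weighting. It is not the authors' implementation: the record does not state which thresholds or weighting measures the paper uses, so the `min_count` cutoff, the tf-idf formula with unit-length normalization, the toy documents, and all function names are assumptions chosen for the example. Feature reduction with linguistic information (for instance lemmatization) and the linear SVM itself are left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; no linguistic processing.
    return text.lower().split()


def select_features(docs_tokens, min_count=2):
    # Feature selection by word frequency: keep terms whose total count
    # over the whole collection reaches min_count (hypothetical threshold).
    totals = Counter(tok for toks in docs_tokens for tok in toks)
    return sorted(term for term, count in totals.items() if count >= min_count)


def tfidf_vectors(docs_tokens, vocab):
    # Term weighting: tf * log(N / df), then scale each vector to unit length.
    n_docs = len(docs_tokens)
    vocab_set = set(vocab)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks) & vocab_set)
    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        vec = [tf[term] * math.log(n_docs / df[term]) for term in vocab]
        norm = math.sqrt(sum(w * w for w in vec))
        vectors.append([w / norm if norm else 0.0 for w in vec])
    return vectors


# Toy collection (invented for the example, not taken from either dataset).
docs = [
    "oil prices rise",
    "oil supply falls",
    "court ruling on appeal",
    "court appeal",
]
tokens = [tokenize(d) for d in docs]
vocab = select_features(tokens, min_count=2)
vectors = tfidf_vectors(tokens, vocab)
```

The resulting `vectors` are the kind of document representation that would then be passed to a linear classifier; varying `min_count` or swapping the weighting formula corresponds to moving along the selection and weighting axes.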

Item metadata

id	RCAP_83b1f3f549f979cddd8fda5362525d7f
oai_identifier_str	oai:dspace.uevora.pt:10174/2557
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Using IR techniques to improve Automated Text Classification
author	Gonçalves, Teresa
author_facet	Gonçalves, Teresa; Quaresma, Paulo
author_role	author
author2	Quaresma, Paulo
author2_role	author
dc.contributor.author.fl_str_mv	Gonçalves, Teresa; Quaresma, Paulo
dc.subject.por.fl_str_mv	machine learning; Text classification
topic	machine learning; Text classification
description	This paper studies the pre-processing phase of the automated text classification problem. We apply the linear Support Vector Machine paradigm to datasets written in English and European Portuguese: the Reuters and the Portuguese Attorney General’s Office datasets, respectively. The study can be seen as a search for the best document representation along three axes: feature reduction (using linguistic information), feature selection (using word frequencies), and term weighting (using information retrieval measures).
publishDate	2004
dc.date.none.fl_str_mv	2004-01-01T00:00:00Z 2011-02-15T10:54:06Z 2011-02-15
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10174/2557 http://hdl.handle.net/10174/2557
url	http://hdl.handle.net/10174/2557
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	374-379; Lecture Notes in Computer Science; 3136; livre; tcg@uevora.pt; pq@uevora.pt; NLDB-04, Natural Language Processing and Information Systems; Meziane, F.; Metais, E.; 498
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	129335 bytes application/pdf
dc.publisher.none.fl_str_mv	Springer-Verlag
publisher.none.fl_str_mv	Springer-Verlag
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799136465770250240
