Efficient clustering of web-derived data sets

Luís Sarmento; Alexander Kehlenbeck; Eugénio Oliveira; Lyle Ungar

Bibliographic details
Main author:	Luís Sarmento
Publication date:	2009
Other authors:	Alexander Kehlenbeck, Eugénio Oliveira, Lyle Ungar
Document type:	Book
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://repositorio-aberto.up.pt/handle/10216/15175
Abstract:	Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.

Item metadata

id	RCAP_66ff36c6a85d1195518405472810766a
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/15175
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Efficient clustering of web-derived data sets
author	Luís Sarmento
author_facet	Luís Sarmento Alexander Kehlenbeck Eugénio Oliveira Lyle Ungar
author_role	author
author2	Alexander Kehlenbeck Eugénio Oliveira Lyle Ungar
author2_role	author author author
dc.contributor.author.fl_str_mv	Luís Sarmento Alexander Kehlenbeck Eugénio Oliveira Lyle Ungar
dc.subject.por.fl_str_mv	Informática, Ciências da computação e da informação Informatics, Computer and information sciences
topic	Informática, Ciências da computação e da informação Informatics, Computer and information sciences
description	Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.
publishDate	2009
dc.date.none.fl_str_mv	2009 2009-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/book
format	book
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://repositorio-aberto.up.pt/handle/10216/15175
url	https://repositorio-aberto.up.pt/handle/10216/15175
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	10.1007/978-3-642-03070-3_30
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799135610071416832

