Efficient clustering of web-derived data sets

Bibliographic details
Main author: Luís Sarmento
Publication date: 2009
Other authors: Alexander Kehlenbeck, Eugénio Oliveira, Lyle Ungar
Document type: Book
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://repositorio-aberto.up.pt/handle/10216/15175
Abstract: Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.
id RCAP_66ff36c6a85d1195518405472810766a
oai_identifier_str oai:repositorio-aberto.up.pt:10216/15175
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Efficient clustering of web-derived data sets
author Luís Sarmento
author_facet Luís Sarmento
Alexander Kehlenbeck
Eugénio Oliveira
Lyle Ungar
author_role author
author2 Alexander Kehlenbeck
Eugénio Oliveira
Lyle Ungar
author2_role author
author
author
dc.contributor.author.fl_str_mv Luís Sarmento
Alexander Kehlenbeck
Eugénio Oliveira
Lyle Ungar
dc.subject.por.fl_str_mv Informática, Ciências da computação e da informação
Informatics, Computer and information sciences
topic Informática, Ciências da computação e da informação
Informatics, Computer and information sciences
description Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data.
publishDate 2009
dc.date.none.fl_str_mv 2009
2009-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/book
format book
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio-aberto.up.pt/handle/10216/15175
url https://repositorio-aberto.up.pt/handle/10216/15175
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1007/978-3-642-03070-3_30
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799135610071416832