Efficient clustering of web-derived data sets
Main author: | Luís Sarmento |
---|---|
Publication date: | 2009 |
Other authors: | Alexander Kehlenbeck, Eugénio Oliveira, Lyle Ungar |
Document type: | Book |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://repositorio-aberto.up.pt/handle/10216/15175 |
Abstract: | Many data sets derived from the web are large, high-dimensional, sparse and have a Zipfian distribution of both classes and features. On such data sets, current scalable clustering methods such as streaming clustering suffer from fragmentation, where large classes are incorrectly divided into many smaller clusters, and computational efficiency drops significantly. We present a new clustering algorithm based on connected components that addresses these issues and so works well on web-type data. |
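
The abstract describes clustering based on connected components. As a rough illustration of that general idea only, and not the authors' algorithm, the sketch below links any two sparse feature vectors whose cosine similarity reaches a threshold and reports each connected component of the resulting graph as one cluster. The function names, the dict-based vector format, and the threshold value are all assumptions made for this example, and the all-pairs comparison is quadratic, so it says nothing about the efficiency the record refers to.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cluster_by_components(items, threshold=0.4):
    """Group items into the connected components of a similarity graph.

    Hypothetical helper: two items are linked when their cosine
    similarity is at least `threshold` (an assumed parameter).
    """
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison, for illustration only.
    for i, j in combinations(range(len(items)), 2):
        if cosine(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [
    {"apple": 1.0, "fruit": 1.0},
    {"apple": 1.0, "pie": 1.0},
    {"fruit": 1.0, "pie": 1.0},
    {"linux": 1.0, "kernel": 1.0},
    {"kernel": 1.0, "driver": 1.0},
]
print(cluster_by_components(docs))
```

Because membership is transitive through links, two items end up in the same cluster even when they are only connected through intermediate items, which is what lets a component-based approach keep a large class together instead of splitting it.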
id |
RCAP_66ff36c6a85d1195518405472810766a |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/15175 |
network_acronym_str |
RCAP |
repository_id_str |
7160 |
author |
Luís Sarmento |
author2 |
Alexander Kehlenbeck Eugénio Oliveira Lyle Ungar |
dc.subject.por.fl_str_mv |
Informática, Ciências da computação e da informação Informatics, Computer and information sciences |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.relation.none.fl_str_mv |
10.1007/978-3-642-03070-3_30 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |