TweeProfiles4: a weighted multidimensional stream clustering algorithm
Main author: | Luís Miguel Azevedo Pereira |
---|---|
Publication date: | 2015 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://repositorio-aberto.up.pt/handle/10216/83533 |
Abstract: | The emergence of social media made it possible for users to easily share their thoughts on different topics, which constitutes a rich source of information for many fields. Microblogging platforms have experienced large and steady growth over the last few years. Twitter is the most popular microblogging site, making it an interesting source of data for pattern extraction. One of the main challenges of analyzing social media data is its continuous nature, which makes it hard to use traditional data mining. Therefore, mining stream data has also received a lot of attention recently. TweeProfiles is a data mining tool for analyzing and visualizing Twitter data over four dimensions: spatial (the location of the tweet), temporal (the timestamp of the tweet), content (the text of the tweet) and social (relationship graph). This is an ongoing project which still has many aspects that can be improved. For instance, it was recently improved by replacing the original clustering algorithm, which could not handle the continuous flow of data, with a streaming method. The goal of this dissertation is to continue the development of TweeProfiles. First, the stream clustering process will be improved by proposing a new algorithm. This will be achieved by developing an incremental algorithm with support for multi-dimensional streaming data. Moreover, it should make it possible for the user to dynamically change the relative importance of each dimension in the clustering. Additionally, the empirical evaluation of the results will also be improved. Suitable measures to evaluate the extracted patterns will be identified and implemented. An empirical study will be done using data consisting of georeferenced tweets from SocialBus. |
id | RCAP_a6fc7d6a7616fe9771915372a39ae5c7 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/83533 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | TweeProfiles4: a weighted multidimensional stream clustering algorithm |
title | TweeProfiles4: a weighted multidimensional stream clustering algorithm |
author | Luís Miguel Azevedo Pereira |
author_facet | Luís Miguel Azevedo Pereira |
author_role | author |
dc.contributor.author.fl_str_mv | Luís Miguel Azevedo Pereira |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
topic | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
publishDate | 2015 |
dc.date.none.fl_str_mv | 2015-07-21 2015-07-21T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://repositorio-aberto.up.pt/handle/10216/83533 TID:201311500 |
url | https://repositorio-aberto.up.pt/handle/10216/83533 |
identifier_str_mv | TID:201311500 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136037864210432 |