Social monitoring for risk prediction in public forums
Main author: | Barros, Lucas Filipe Roberto de |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/34066 |
Abstract: | In recent years, social media has changed the ways people communicate and express themselves. One major difference between this new communication channel and traditional ones is that the data it produces is recorded. It is therefore important to understand how this data can be used to improve our lives. The main aim of this thesis is to use this data to detect whether a user shows signs of mental health problems. We also participated, as the Bioinformatics group of the Institute of Electronics and Computer Engineering of the University of Aveiro (BioInfo@UAVR), in the 2nd shared task of CLEF eRisk 2021. eRisk (“Early Risk Prediction on the Internet”) is an online challenge whose tasks consist of analysing social media data and fostering research on the early detection of mental disorders. This year eRisk had 3 tasks, each focusing on a different disorder. This work addresses the 2nd task, whose main objective is the early detection of users at risk of self-harm, based on their history. We addressed this task by developing supervised machine learning models that classify such users. Our approach used tokenization algorithms based on regular expressions and Yake (a keyword extraction tool), linguistic features such as emojis, outputs of other models such as BERT embeddings and VADER sentiment scores, and machine learning classifiers such as Support Vector Machines and boosting classifiers. After testing all combinations of methods with different hyperparameter settings, we conclude that it is possible to make decisions about a user’s mental health state using these methods, obtaining a latency-weighted score of 0.46 on the eRisk 2021 test set. |
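
The abstract mentions tokenization based on regular expressions and emoji-based linguistic features. The following is only a minimal sketch of what such a step could look like, not the thesis implementation: the token pattern, the emoji code-point ranges, and the names `tokenize` and `featurize` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed emoji ranges (illustrative, not exhaustive): common pictographs and symbols.
EMOJI_CLASS = "[\\U0001F300-\\U0001FAFF\\u2600-\\u27BF]"
EMOJI_RE = re.compile(EMOJI_CLASS)

# Assumed token pattern: an alphabetic word (optionally with an apostrophe part)
# or a single emoji character.
TOKEN_RE = re.compile("[A-Za-z]+(?:'[A-Za-z]+)?|" + EMOJI_CLASS)


def tokenize(text):
    """Split a post into lowercase word tokens and single-emoji tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def featurize(posts):
    """Aggregate simple per-user features over a list of posts (hypothetical feature set)."""
    tokens = [tok for post in posts for tok in tokenize(post)]
    emojis = [tok for tok in tokens if EMOJI_RE.fullmatch(tok)]
    words = Counter(tok for tok in tokens if not EMOJI_RE.fullmatch(tok))
    return {
        "n_tokens": len(tokens),
        "n_emojis": len(emojis),
        "top_words": words.most_common(3),
    }


if __name__ == "__main__":
    print(featurize(["I can't sleep 😢", "sleep again"]))
```

In a pipeline like the one described, features of this kind would be combined with embeddings and sentiment scores before being passed to a classifier; that combination is not shown here.
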
id |
RCAP_ea3281de94450ae1496bad87ee47e7e5 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/34066 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Social monitoring for risk prediction in public forums; Social media; Hyperparameters; Self-harm; Machine learning; 2022-06-27T14:48:46Z; 2021-12-13T00:00:00Z; 2021-12-13; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10773/34066; eng; Barros, Lucas Filipe Roberto de; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-02-22T12:05:39Z; oai:ria.ua.pt:10773/34066; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:05:25.743168; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Social monitoring for risk prediction in public forums |
title |
Social monitoring for risk prediction in public forums |
spellingShingle |
Social monitoring for risk prediction in public forums; Barros, Lucas Filipe Roberto de; Social media; Hyperparameters; Self-harm; Machine learning |
title_short |
Social monitoring for risk prediction in public forums |
title_full |
Social monitoring for risk prediction in public forums |
title_fullStr |
Social monitoring for risk prediction in public forums |
title_full_unstemmed |
Social monitoring for risk prediction in public forums |
title_sort |
Social monitoring for risk prediction in public forums |
author |
Barros, Lucas Filipe Roberto de |
author_facet |
Barros, Lucas Filipe Roberto de |
author_role |
author |
dc.contributor.author.fl_str_mv |
Barros, Lucas Filipe Roberto de |
dc.subject.por.fl_str_mv |
Social media; Hyperparameters; Self-harm; Machine learning |
topic |
Social media; Hyperparameters; Self-harm; Machine learning |
description |
In recent years, social media has changed the ways people communicate and express themselves. One major difference between this new communication channel and traditional ones is that the data it produces is recorded. It is therefore important to understand how this data can be used to improve our lives. The main aim of this thesis is to use this data to detect whether a user shows signs of mental health problems. We also participated, as the Bioinformatics group of the Institute of Electronics and Computer Engineering of the University of Aveiro (BioInfo@UAVR), in the 2nd shared task of CLEF eRisk 2021. eRisk (“Early Risk Prediction on the Internet”) is an online challenge whose tasks consist of analysing social media data and fostering research on the early detection of mental disorders. This year eRisk had 3 tasks, each focusing on a different disorder. This work addresses the 2nd task, whose main objective is the early detection of users at risk of self-harm, based on their history. We addressed this task by developing supervised machine learning models that classify such users. Our approach used tokenization algorithms based on regular expressions and Yake (a keyword extraction tool), linguistic features such as emojis, outputs of other models such as BERT embeddings and VADER sentiment scores, and machine learning classifiers such as Support Vector Machines and boosting classifiers. After testing all combinations of methods with different hyperparameter settings, we conclude that it is possible to make decisions about a user’s mental health state using these methods, obtaining a latency-weighted score of 0.46 on the eRisk 2021 test set. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-12-13T00:00:00Z 2021-12-13 2022-06-27T14:48:46Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/34066 |
url |
http://hdl.handle.net/10773/34066 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137709451640832 |