Social monitoring for risk prediction in public forums

Bibliographic details
Main author: Barros, Lucas Filipe Roberto de
Publication date: 2021
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/34066
Abstract: In recent years, social media has been changing the ways people communicate and express themselves. One major difference between this new communication mechanism and traditional methods is that the resulting data is documented. Hence, it is important to understand how this data can be used to improve our lives. The main subject of this thesis is the use of this data to detect whether a user shows signs of mental health problems. We also participated, as the Bioinformatics group of the Institute of Electronics and Computer Engineering of the University of Aveiro (BioInfo@UAVR), in the second shared task of CLEF eRisk 2021. eRisk ("Early Risk Prediction on the Internet") is an online challenge whose tasks consist of analysing social media data and fostering research on the early detection of mental disorders. In 2021, eRisk had three tasks, each focusing on a different disorder. This work addresses the second task, whose main objective is the early detection of users at risk of self-harm, based on their posting history. We addressed this problem by developing supervised machine learning models that classify such users. Our approach combined tokenization algorithms based on regular expressions and YAKE (a keyword extraction tool), linguistic features such as emojis, additional features such as BERT embeddings and VADER sentiment scores, and machine learning classifiers such as Support Vector Machines and boosting classifiers. After testing all combinations of methods with different combinations of hyperparameters, we conclude that it is possible to make decisions regarding a user's mental health state based on these methods, with a latency-weighted score of 0.46 on the eRisk 2021 test set.
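As a rough illustration of the regular-expression tokenization and emoji features mentioned in the abstract, the following Python sketch turns a user's posts into a simple feature dictionary. It is not the thesis' actual pipeline: the token pattern, the emoji code-point ranges, and the names `tokenize` and `extract_features` are assumptions made for this example, and the YAKE, BERT, VADER, and classifier stages are left out.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed emoji code-point ranges (illustrative, not exhaustive).
EMOJI_RANGE = "\U0001F300-\U0001FAFF\u2600-\u27BF"
# Hypothetical token pattern: one emoji, or a word with an optional
# apostrophe suffix (e.g. "can't").
TOKEN_RE = re.compile(rf"[{EMOJI_RANGE}]|[a-z0-9]+(?:'[a-z]+)?")
EMOJI_RE = re.compile(rf"[{EMOJI_RANGE}]")


def tokenize(post: str) -> list[str]:
    """Lowercase a post and split it into word and emoji tokens."""
    return TOKEN_RE.findall(post.lower())


def extract_features(posts: list[str]) -> dict[str, float]:
    """Aggregate word counts and emoji counts over a user's posts."""
    tokens = [tok for post in posts for tok in tokenize(post)]
    emoji_count = sum(1 for tok in tokens if EMOJI_RE.fullmatch(tok))
    word_counts = Counter(tok for tok in tokens if not EMOJI_RE.fullmatch(tok))
    features = {f"word={word}": float(n) for word, n in word_counts.items()}
    features["emoji_count"] = float(emoji_count)
    features["token_count"] = float(len(tokens))
    return features


if __name__ == "__main__":
    history = ["I can't sleep 😢😢", "sleep"]
    print(tokenize(history[0]))
    print(extract_features(history))
```

A dictionary like this could then be vectorized and passed to a classifier such as a Support Vector Machine; that step is omitted here to keep the sketch free of third-party dependencies.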
id RCAP_ea3281de94450ae1496bad87ee47e7e5
oai_identifier_str oai:ria.ua.pt:10773/34066
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Social monitoring for risk prediction in public forums
title Social monitoring for risk prediction in public forums
author Barros, Lucas Filipe Roberto de
author_facet Barros, Lucas Filipe Roberto de
author_role author
dc.contributor.author.fl_str_mv Barros, Lucas Filipe Roberto de
dc.subject.por.fl_str_mv Social media
Hyperparameters
Self-harm
Machine learning
publishDate 2021
dc.date.none.fl_str_mv 2021-12-13T00:00:00Z
2021-12-13
2022-06-27T14:48:46Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/34066
url http://hdl.handle.net/10773/34066
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137709451640832