Social monitoring for risk prediction in public forums

Barros, Lucas Filipe Roberto de

Bibliographic details
Main author:	Barros, Lucas Filipe Roberto de
Publication date:	2021
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10773/34066
Abstract:	In recent years, social media has changed how people communicate and express themselves. One major difference between this new communication channel and traditional ones is that the exchanged data is documented. It is therefore important to understand how this data can be used to improve our lives. The main subject of this thesis is to use this data to determine whether a user shows signs of mental health problems. We also participated, as the Bioinformatics group of the Institute of Electronics and Computer Engineering of the University of Aveiro (BioInfo@UAVR), in the 2nd shared task of CLEF eRisk 2021. eRisk (“Early Risk Prediction on the Internet”) is an online challenge whose tasks consist of analysing social media data and fostering research on the early detection of mental disorders. In 2021 eRisk had 3 tasks, each focusing on a different disorder. This work addresses the 2nd task, whose main objective is the early detection of users at risk of self-harm, based on their history. We addressed this problem by developing supervised machine learning models that classify such users. Our approach used tokenization algorithms based on regular expressions and Yake (a keyword extraction tool), linguistic features such as emojis, other models such as BERT embeddings and VADER sentiment scores, and machine learning classifiers such as Support Vector Machines and boosting classifiers. After testing all combinations of methods with different combinations of hyperparameters, we conclude that it is possible to make decisions about a user’s mental health state based on these methods, with a latency-weighted score of 0.46 on the eRisk 2021 test set.
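
The abstract mentions regular-expression tokenization and emoji-based linguistic features. The following is only a minimal sketch of what such a feature-extraction step could look like, not the thesis's actual implementation: the token pattern, the emoji code-point ranges, and the names `tokenize` and `extract_features` are all assumptions made for illustration, and the pattern only handles ASCII words.

```python
import re
from collections import Counter

# Hypothetical token pattern (assumption): ASCII words with an optional
# inner apostrophe, runs of digits, or any single non-word symbol.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]")

# Rough emoji code-point ranges (assumption; not an exhaustive list).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def tokenize(text):
    """Split text into lowercase tokens using the regular expression above."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def extract_features(posts):
    """Aggregate simple per-user features over a list of post strings."""
    token_counts = Counter()
    emoji_count = 0
    for post in posts:
        # Keep only word and number tokens in the bag of words.
        token_counts.update(t for t in tokenize(post) if t[0].isalnum())
        # Count emojis separately as a linguistic feature.
        emoji_count += len(EMOJI_RE.findall(post))
    return {
        "n_posts": len(posts),
        "emoji_count": emoji_count,
        "token_counts": token_counts,
    }


if __name__ == "__main__":
    history = ["I can't sleep \U0001F622", "Again... \U0001F622\U0001F622"]
    print(extract_features(history))
```

In a pipeline like the one the abstract describes, features of this kind would be combined with other representations and passed to a classifier; that step is omitted here because the record does not specify it.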

Item metadata

id	RCAP_ea3281de94450ae1496bad87ee47e7e5
oai_identifier_str	oai:ria.ua.pt:10773/34066
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Social monitoring for risk prediction in public forums
title	Social monitoring for risk prediction in public forums
spellingShingle	Social monitoring for risk prediction in public forums Barros, Lucas Filipe Roberto de Social media Hyperparameters Self-harm Machine learning
title_short	Social monitoring for risk prediction in public forums
title_full	Social monitoring for risk prediction in public forums
title_fullStr	Social monitoring for risk prediction in public forums
title_full_unstemmed	Social monitoring for risk prediction in public forums
title_sort	Social monitoring for risk prediction in public forums
author	Barros, Lucas Filipe Roberto de
author_facet	Barros, Lucas Filipe Roberto de
author_role	author
dc.contributor.author.fl_str_mv	Barros, Lucas Filipe Roberto de
dc.subject.por.fl_str_mv	Social media; Hyperparameters; Self-harm; Machine learning
topic	Social media; Hyperparameters; Self-harm; Machine learning
publishDate	2021
dc.date.none.fl_str_mv	2021-12-13T00:00:00Z 2021-12-13 2022-06-27T14:48:46Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/34066
url	http://hdl.handle.net/10773/34066
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137709451640832
