Computing the accuracy of an automatic system for relevance detection in social networks

Bibliographic details
Main author: Filipe Fernandes Miranda
Publication date: 2017
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/106176
Abstract: To correctly assess the precision of a classification model, previously labeled data is needed to validate the output provided by the model. Data can be labeled either manually by humans or automatically by computers. In this dissertation, an automatic system was designed and built to assess the precision of a classification model, with no human involvement at any point in the labeling of the data. The goal of the classification model used as the basis of this project is to identify newsworthy social network messages. The model takes advantage of the vast amount of information spread across social networks and aims to filter relevant data, which may contain important information from a journalistic point of view. To assess the precision of the classification model, social network messages need to be labeled as newsworthy or not, which can be achieved by manual labeling. While this assessment is fundamental to train the model at a first stage, the costs involved in money, time and precision do not allow this procedure to be repeated regularly. Yet labeled data is essential both to train our models and to determine their accuracy. For this reason, and to avoid the downsides of manual labeling, a four-stage automatic system was created. The first stage is the collection of data, both messages and news articles; the collected messages are later classified based on the news articles gathered alongside them. The second stage is information extraction. Here, the system analyzes the information present in the different texts using several information extraction techniques, such as named entity recognition and keyword detection. The results are represented as a standardized feature vector for both messages and news. The third stage is the matching of news and social media messages, based on the similarity of their contents. When a message is associated with the content of a news article, it is labeled as news related. The fourth and final stage, message classification, distinguishes news-relevant messages from messages that are not relevant. This stage is assisted by a filtering model, which helps exclude weak matches: cases where messages and news share similar information, but that information is not relevant or newsworthy. The matching method was validated while it was being developed. The final system has a precision of over 80% in labeling newsworthy social network messages. Beyond this result, the techniques and mechanisms developed in this dissertation can be applied to other uses within media and journalism. As an example, the research can be targeted at finding possibly contradictory information in social network messages, potentially helping news organizations to update their stories as live information comes through. Another application might be the detection of breaking news and crisis events.
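The matching and precision steps described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the keyword-set features (standing in for named entity recognition and keyword detection), the Jaccard similarity measure, the fixed threshold (standing in for the filtering model), and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
             "and", "is", "are", "for", "with", "my"}


def extract_features(text):
    """Return a set of lowercase keywords.

    Assumption: plain tokenization stands in for the named entity
    recognition and keyword detection mentioned in the abstract.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_messages(messages, articles, threshold=0.2):
    """Label each message True (news related) or False.

    A message is news related when its best similarity against any
    article reaches `threshold`. The threshold is an assumed stand-in
    for the filtering model that excludes weak matches.
    """
    article_features = [extract_features(a) for a in articles]
    labels = []
    for message in messages:
        features = extract_features(message)
        best = max((jaccard(features, af) for af in article_features),
                   default=0.0)
        labels.append(best >= threshold)
    return labels


def precision(predicted, reference):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

With `articles = ["Earthquake strikes Lisbon city centre"]`, calling `label_messages` on a message about an earthquake in Lisbon and on an unrelated message about lunch would mark only the first as news related. `precision` can then compare such automatic labels against a small manually labeled reference set.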
id RCAP_4baae726b64d31b340a007a7a459779d
oai_identifier_str oai:repositorio-aberto.up.pt:10216/106176
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
author Filipe Fernandes Miranda
author_facet Filipe Fernandes Miranda
author_role author
dc.contributor.author.fl_str_mv Filipe Fernandes Miranda
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
publishDate 2017
dc.date.none.fl_str_mv 2017-07-11
2017-07-11T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/106176
TID:201804425
url https://hdl.handle.net/10216/106176
identifier_str_mv TID:201804425
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799135508513685504