Computing the accuracy of an automatic system for relevance detection in social networks

Filipe Fernandes Miranda


Bibliographic details
Main author:	Filipe Fernandes Miranda
Publication date:	2017
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	https://hdl.handle.net/10216/106176
Abstract:	To correctly assess the precision of a classification model, previously labeled data is needed to validate the model's output. Data can be labeled either manually, through human effort, or automatically, by computers. In this dissertation, an automatic system was designed and built to assess the precision of a classification model without any human involvement in the data-labeling process. The goal of the classification model on which this project is based is to identify newsworthy social network messages. The model takes advantage of the vast amount of information spread across social networks and aims to filter relevant data that may carry important information from a journalistic point of view. To assess the precision of the classification model, social network messages need to be labeled as newsworthy or not, which can be done manually. While this assessment is fundamental to train the model at an initial stage, its monetary, time and precision costs do not allow it to be carried out regularly. Yet labeled data is essential to train our models and to determine their accuracy. For this reason, and to avoid the downsides of manual labeling, a four-stage automatic system was created. The first stage is data collection, in which both social network messages and news articles are gathered; the collected messages are then classified based on the collected news articles. The second stage is information extraction, in which the system analyzes the information in the different texts using several information extraction techniques, such as named entity recognition and keyword detection. The results are represented as a standardized feature vector for each message and news article. The third stage is the matching of news articles and social media messages based on the similarity of their contents. When a message is associated with the content of a news article, it is labeled as news related.
The fourth and final stage, message classification, distinguishes news-relevant messages from non-relevant ones. This stage is assisted by a filtering model that helps exclude weak matches, that is, cases where messages and news articles share similar information that is nonetheless not relevant or newsworthy. The matching method was validated during its development. The final system achieves a precision of over 80% in labeling newsworthy social network messages. Furthermore, the techniques and mechanisms developed in this dissertation can be extended to other uses within media and journalism. For example, the research could be directed at finding possibly contradictory information in social network messages, helping news organizations update their stories as live information comes in. Another application might be the detection of breaking news and crisis events.
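The matching, filtering and evaluation steps summarized in the abstract can be illustrated with a minimal sketch. It assumes that information extraction has already reduced each message and news article to a set of terms (for example, named entities and keywords), and it uses Jaccard similarity, a similarity threshold and a minimum number of shared terms as stand-ins for the dissertation's matching method and filtering model. These choices, the function names `jaccard`, `label_messages` and `precision`, and the toy data are hypothetical and are not taken from the dissertation. Precision is computed as TP / (TP + FP) over the messages labeled as news related, against a small hypothetical set of reference labels.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two term sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_messages(
    messages: Dict[str, Set[str]],
    news: Dict[str, Set[str]],
    threshold: float = 0.3,  # hypothetical similarity cut-off
    min_shared: int = 2,  # hypothetical weak-match filter
) -> Dict[str, bool]:
    """Label a message as news related (True) if any article matches it."""
    labels: Dict[str, bool] = {}
    for msg_id, msg_terms in messages.items():
        related = False
        for art_terms in news.values():
            shared = len(msg_terms & art_terms)
            # Filter out weak matches: too few shared terms to count as related.
            if shared >= min_shared and jaccard(msg_terms, art_terms) >= threshold:
                related = True
                break
        labels[msg_id] = related
    return labels


def precision(predicted: Dict[str, bool], reference: Dict[str, bool]) -> float:
    """Precision = TP / (TP + FP) over messages predicted as news related."""
    tp = sum(1 for k, p in predicted.items() if p and reference.get(k, False))
    fp = sum(1 for k, p in predicted.items() if p and not reference.get(k, False))
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical toy data: term sets produced by an earlier extraction step.
news = {"a1": {"lisbon", "fire", "firefighters", "evacuation"}}
messages = {
    "m1": {"fire", "lisbon", "evacuation", "smoke"},
    "m2": {"lisbon", "coffee"},
    "m3": {"fire", "firefighters", "lisbon", "concert"},
}
reference = {"m1": True, "m2": False, "m3": False}  # hypothetical ground truth

labels = label_messages(messages, news)
print(labels)  # {'m1': True, 'm2': False, 'm3': True}
print(precision(labels, reference))  # 0.5
```

With this toy data, message m2 shares only one term with the article and is filtered out as a weak match, while m1 and m3 are labeled as news related; because the hypothetical reference marks m3 as not newsworthy, the precision is 0.5. In a real system the threshold and filter would need to be tuned on validated data.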

Item metadata

id	RCAP_4baae726b64d31b340a007a7a459779d
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/106176
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Computing the accuracy of an automatic system for relevance detection in social networks
author	Filipe Fernandes Miranda
author_role	author
dc.subject.por.fl_str_mv	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
publishDate	2017
dc.date.none.fl_str_mv	2017-07-11 2017-07-11T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/106176 TID:201804425
dc.language.iso.fl_str_mv	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP