Computing the accuracy of an automatic system for relevance detection in social networks
Main author: | Filipe Fernandes Miranda |
---|---|
Publication date: | 2017 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/106176 |
Abstract: | To correctly assess the precision of a classification model, previously labeled data is needed to validate the model's output. Data can be labeled either manually by humans or automatically by computers. In this dissertation, an automatic system was designed and built to assess the precision of a classification model with no human involvement in the labeling process. The classification model used as the basis of this project aims to identify newsworthy social network messages. It takes advantage of the vast amount of information spread across social networks and filters for relevant data that may be important from a journalistic point of view. To assess the precision of the classification model, social network messages need to be labeled as newsworthy or not, which can be done manually. While manual labeling is fundamental for training the model at a first stage, its costs in money, time, and precision do not allow it to be repeated regularly. Yet labeled data is essential both to train the models and to determine their accuracy. For this reason, and to avoid the downsides of manual labeling, a four-stage automatic system was created. The first stage is data collection, covering both messages and news articles; the collected messages are later classified against the collected news articles. The second stage is information extraction: the system analyzes the texts using several information extraction techniques, such as named entity recognition and keyword detection, and represents the results as a standardized feature vector for each message and news article. The third stage matches news articles and social media messages based on the similarity of their contents. When a message is associated with the content of a news article, it is labeled as news-related. The final stage, message classification, separates news-relevant messages from non-relevant ones. It is assisted by a filtering model that helps exclude weak matches, that is, cases where a message and a news article share similar information that is nevertheless not relevant or newsworthy. The matching method was validated during development. The final system achieves a precision of over 80% in labeling newsworthy social network messages. Beyond this, the techniques and mechanisms developed in this dissertation can be extended to other uses within media and journalism. For example, the research could be directed at finding possibly contradictory information in social network messages, potentially helping news organizations update their stories as live information comes in. Another application might be the detection of breaking news and crisis events. |
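
The matching and evaluation stages described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `Features` structure, the Jaccard-based `similarity` score, and the `threshold` value are all assumptions chosen for illustration, and the fixed threshold only stands in for the filtering model that excludes weak matches. Entity and keyword extraction is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Hypothetical standardized feature set for one text (message or article)."""
    entities: frozenset
    keywords: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(msg: Features, news: Features) -> float:
    """Assumed scoring rule: mean of entity overlap and keyword overlap."""
    return 0.5 * jaccard(msg.entities, news.entities) \
        + 0.5 * jaccard(msg.keywords, news.keywords)


def label_message(msg: Features, articles: list, threshold: float = 0.3) -> bool:
    """Label a message news-related if any article scores at or above the
    threshold (a simple stand-in for the filtering of weak matches)."""
    return any(similarity(msg, article) >= threshold for article in articles)


def precision(predicted: list, actual: list) -> float:
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data, invented for this example only.
articles = [Features(frozenset({"porto", "uefa"}), frozenset({"final", "match"}))]
related = Features(frozenset({"porto"}), frozenset({"match", "tonight"}))
unrelated = Features(frozenset({"lisbon"}), frozenset({"coffee"}))

automatic_labels = [label_message(m, articles) for m in (related, unrelated)]
```

Comparing automatic labels with a small manually labeled sample through `precision` corresponds to the kind of check behind the reported figure of over 80%, although the dissertation's actual validation procedure is not described in this record.
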
id |
RCAP_4baae726b64d31b340a007a7a459779d |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/106176 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Computing the accuracy of an automatic system for relevance detection in social networks |
author |
Filipe Fernandes Miranda |
author_facet |
Filipe Fernandes Miranda |
author_role |
author |
dc.contributor.author.fl_str_mv |
Filipe Fernandes Miranda |
dc.subject.por.fl_str_mv |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
topic |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-07-11 2017-07-11T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/106176 TID:201804425 |
url |
https://hdl.handle.net/10216/106176 |
identifier_str_mv |
TID:201804425 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799135508513685504 |