Statistical Language Models applied to News Generation
Main author: | João Ricardo Pintas Soares |
---|---|
Publication date: | 2017 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://repositorio-aberto.up.pt/handle/10216/106475 |
Abstract: | Natural Language Generation (NLG) is a subfield of Artificial Intelligence whose main goal is to produce understandable natural-language text from non-linguistic input data. Automated News Generation is a promising topic in Computational Journalism that uses NLG to build tools that help journalists produce news by automating some of the steps. These tools need a large amount of structured data as input, which makes sports a natural subject because its data is very well organized. Automating steps of news production benefits journalists: the tools can summarize data and make it readable instantly, so journalists only have to adjust the result, which makes production much faster. The need for this agile process was the main motivation for this dissertation. Its goal is to implement an Automated News Generation algorithm in collaboration with ZOS, Lda., which owns the zerozero.pt project, an online social media publisher with one of the largest football databases in the world. ZOS will provide a dataset for exploration and research in this field. The dissertation continues the work of João Aires, who wrote a dissertation on the same topic in 2016, but takes a different approach to the problem. The primary objective is to use Statistical Language Models to generate news from scratch, applying them in a system where the user can generate sentences about a specific match. Zerozero.pt stores data on more than 6000 matches per week and produces news for an average of 100 games per week. After a manual analysis of part of that data, it was decided that a news piece would be divided into four parts: Introduction, Goals, Sent offs and Conclusion. Creating a Statistical Language Model for each part makes it possible to summarize each match, making this large amount of structured data easier to use and consequently increasing journalists' productivity. The system will be evaluated manually, for example through surveys, so that the results obtained can be analyzed and discussed. |
id |
RCAP_8b0e1ea4f9b93cd49c5d729213abc600 |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/106475 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Statistical Language Models applied to News Generation |
title |
Statistical Language Models applied to News Generation |
author |
João Ricardo Pintas Soares |
author_facet |
João Ricardo Pintas Soares |
author_role |
author |
dc.contributor.author.fl_str_mv |
João Ricardo Pintas Soares |
dc.subject.por.fl_str_mv |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
topic |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-07-07 2017-07-07T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://repositorio-aberto.up.pt/handle/10216/106475 TID:201799499 |
url |
https://repositorio-aberto.up.pt/handle/10216/106475 |
identifier_str_mv |
TID:201799499 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799135773776150528 |