Caraterização de um corpus jornalístico português (Characterization of a Portuguese journalistic corpus)

Bibliographic details
Main author: Henrique Manuel Martins Moreira Teixeira de Sousa
Publication date: 2015
Document type: Master's dissertation
Language: Portuguese (por)
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://hdl.handle.net/10216/83538
Abstract: In this dissertation we organize and characterize a news article archive from the Portuguese online newspaper JornalismoPortoNet (JPN), creating a text corpus with content from several authors and topics. A corpus is a collection of texts on which statistical analysis or hypothesis testing can be performed, mainly in the field of linguistics. Growing computing power makes it easier to process large corpora (searching, processing, selection, etc.). This corpus aims to be a faithful representation of the journalistic writing practised by JPN, collecting titles, subtitles, authors, related news, categories and publication dates, and including a small part referring to readers' opinions (news comments). The corpus will be annotated with part-of-speech (POS) tags and with the named entities mentioned in the text. Following this, an in-depth analysis will be performed of the morphological and categorical composition of the news articles, including research into the relationships between news articles and the differences among the large pool of authors with varying levels of experience. The named entities in the text will also be characterized, categorizing them as people, locations or organizations and revealing the relationship network between these entities. Finally, the public's reception of the journalistic material will be analyzed, both through page views and through readers' comments.
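The abstract lists the fields collected per article and mentions a relationship network between named entities. As a minimal sketch only, and not the dissertation's actual method, the hypothetical `Article` class below mirrors those fields, and the hypothetical `cooccurrence_network` function treats two entities as related when they appear in the same article. Three assumptions are made: the entities have already been extracted by some NER tool (not shown), their types use the labels `PER`, `LOC` and `ORG`, and co-occurrence within one article is the relationship being counted. The dissertation may define relationships differently.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from itertools import combinations


@dataclass
class Article:
    """One news article (hypothetical schema based on the fields named in the abstract)."""
    title: str
    subtitle: str
    authors: list[str]
    category: str
    published: date
    body: str
    related: list[str] = field(default_factory=list)   # titles or ids of related news
    comments: list[str] = field(default_factory=list)  # reader comments
    # Named entities as (surface form, type) pairs, type in {"PER", "LOC", "ORG"}.
    # Assumption: filled in beforehand by an external NER tool (not shown here).
    entities: list[tuple[str, str]] = field(default_factory=list)


def cooccurrence_network(articles: list[Article]) -> Counter:
    """Count how many articles each pair of distinct entities appears in together.

    Keys are sorted pairs of (name, type) tuples, so (A, B) and (B, A) are the
    same edge; values are edge weights (number of shared articles).
    """
    edges: Counter = Counter()
    for article in articles:
        unique = sorted(set(article.entities))  # count each entity once per article
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the JPN corpus.
    a1 = Article("Título A", "Subtítulo A", ["Autor 1"], "Categoria X",
                 date(2014, 5, 1), "...",
                 entities=[("Porto", "LOC"), ("FC Porto", "ORG"), ("Porto", "LOC")])
    a2 = Article("Título B", "Subtítulo B", ["Autor 2"], "Categoria Y",
                 date(2014, 5, 2), "...",
                 entities=[("Porto", "LOC"), ("FC Porto", "ORG")])
    net = cooccurrence_network([a1, a2])
    print(net.most_common())  # [((('FC Porto', 'ORG'), ('Porto', 'LOC')), 2)]
```

Deduplicating entities per article keeps a single article that repeats a name many times from inflating an edge weight; weighting by raw mention counts would be an equally valid alternative design.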
Repository record
Subject: Engenharia electrotécnica, electrónica e informática (Electrical engineering, Electronic engineering, Information engineering)
Publication date: 2015-07-20
Document type: master's thesis (published version)
Format: application/pdf
Access rights: open access
Identifiers: https://hdl.handle.net/10216/83538; TID:201304805; oai:repositorio-aberto.up.pt:10216/83538
Aggregator: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos), RCAAP
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação