Caraterização de um corpus jornalístico português
Main author: | Henrique Manuel Martins Moreira Teixeira de Sousa |
---|---|
Publication date: | 2015 |
Document type: | Dissertation |
Language: | por |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/10216/83538 |
Abstract: | In this dissertation we organize and characterize a news article archive from the Portuguese online newspaper JornalismoPortoNet (JPN), creating a text corpus with content from several authors and topics. A corpus is a collection of texts on which one can perform statistical analysis or hypothesis testing, mainly in the field of linguistics. Growing computing power eases the processing of large corpora (searching, treatment, selection, etc.). This corpus aims to be a faithful representation of the journalistic text practised by JPN, collecting titles, subtitles, authors, related news, categories and publication dates, while also including a small portion reflecting readers' opinions (news comments). The corpus will be annotated with part-of-speech (POS) tags and with the named entities mentioned in the text. Following this, an in-depth analysis will be performed of the morphological and categorical composition of the news articles, including a study of the relationships between news items and of the differences among the large set of authors with varying experience. The named entities in the text will also be characterized, categorizing them as people, locations or organizations and revealing the network of relationships between these entities. Finally, we examine the public's reception of the journalistic material, measured by page views and readers' comments. |
id |
RCAP_58405152b48ac0fc10f819f1b1d06a02 |
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/83538 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Caraterização de um corpus jornalístico português |
title |
Caraterização de um corpus jornalístico português |
title_short |
Caraterização de um corpus jornalístico português |
title_full |
Caraterização de um corpus jornalístico português |
title_fullStr |
Caraterização de um corpus jornalístico português |
title_full_unstemmed |
Caraterização de um corpus jornalístico português |
title_sort |
Caraterização de um corpus jornalístico português |
author |
Henrique Manuel Martins Moreira Teixeira de Sousa |
author_facet |
Henrique Manuel Martins Moreira Teixeira de Sousa |
author_role |
author |
dc.contributor.author.fl_str_mv |
Henrique Manuel Martins Moreira Teixeira de Sousa |
dc.subject.por.fl_str_mv |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
topic |
Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-07-20 2015-07-20T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/83538 TID:201304805 |
url |
https://hdl.handle.net/10216/83538 |
identifier_str_mv |
TID:201304805 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136123963834368 |