Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach

Bibliographic details
Main author: Cabarrão, Vera
Publication date: 2014
Other authors: Moniz, Helena, Batista, Fernando, Ribeiro, Ricardo, Mamede, Nuno, Meinedo, Hugo, Trancoso, Isabel, Mata, Ana Isabel, Matos, David
Document type: Article (published version)
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10451/31093
Abstract: This paper presents a linguistic revision of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the revised data on the performance of several modules. The revision focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The revised data is now used extensively and was crucial for improving the performance of several modules, most notably the punctuation and capitalization modules, but also the speech recognition system and all downstream modules. It has also recently been used in cross-domain studies of disfluencies.
Keywords: Speech annotation; Metadata; Broadcast news
Repository record date: 2018-01-28
Citation: Cabarrão, V., Moniz, H., Batista, F., Ribeiro, R., Mamede, N., Meinedo, H., Trancoso, I., Mata, A. I. & de Matos, D. (2014). Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), pp. 3908-3913.
Access: Open access (PDF)