Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach
Main author: | Cabarrão, Vera |
---|---|
Publication date: | 2014 |
Other authors: | Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/31093 |
Abstract: | This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system, and all the subsequent modules. The resulting data has also been recently used in disfluency studies across domains. |
id |
RCAP_f16fc5e0b4a598f50a27691f77c37027 |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/31093 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach; Speech annotation; Metadata; Broadcast news; This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system, and all the subsequent modules. The resulting data has also been recently used in disfluency studies across domains.; European Language Resources Association (ELRA); Repositório da Universidade de Lisboa; Cabarrão, Vera; Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David; 2018-01-28T15:43:22Z; 2014; 2014-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10451/31093; eng; Cabarrão, V., Moniz, H., Batista, F., Ribeiro, R., Mamede, N., Meinedo, H., Trancoso, I., Mata, A. I. & de Matos, D. (2014) Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach, in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), 3908-3913.; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-11-08T16:24:16Z; oai:repositorio.ul.pt:10451/31093; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T21:46:37.307271; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
title |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
spellingShingle |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach; Cabarrão, Vera; Speech annotation; Metadata; Broadcast news |
title_short |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
title_full |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
title_fullStr |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
title_full_unstemmed |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
title_sort |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
author |
Cabarrão, Vera |
author_facet |
Cabarrão, Vera; Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David |
author_role |
author |
author2 |
Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David |
author2_role |
author author author author author author author author |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Cabarrão, Vera; Moniz, Helena; Batista, Fernando; Ribeiro, Ricardo; Mamede, Nuno; Meinedo, Hugo; Trancoso, Isabel; Mata, Ana Isabel; Matos, David |
dc.subject.por.fl_str_mv |
Speech annotation; Metadata; Broadcast news |
topic |
Speech annotation; Metadata; Broadcast news |
description |
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system, and all the subsequent modules. The resulting data has also been recently used in disfluency studies across domains. |
publishDate |
2014 |
dc.date.none.fl_str_mv |
2014; 2014-01-01T00:00:00Z; 2018-01-28T15:43:22Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/31093 |
url |
http://hdl.handle.net/10451/31093 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Cabarrão, V., Moniz, H., Batista, F., Ribeiro, R., Mamede, N., Meinedo, H., Trancoso, I., Mata, A. I. & de Matos, D. (2014) Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach, in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), 3908-3913. |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
European Language Resources Association (ELRA) |
publisher.none.fl_str_mv |
European Language Resources Association (ELRA) |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134391194091520 |