The Gulf of Guinea Creole Corpora

Bibliographic details
Main author: Hagemeijer, Tjerk
Publication date: 2014
Other authors: Généreux, Michel; Hendrickx, Iris; Mendes, Amália; Tiny, Abigail; Zamora, Armando
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10451/30690
Abstract: We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa d’Ambô. We faced the typical difficulties of languages lacking an official status, such as the lack of a standard spelling, language variation, the lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. To tackle these problems, the compiled written data and the transcribed spoken data collected during fieldwork trips were adapted to a normalized spelling that was applied to all four languages. For the corpus compilation we followed corpus linguistics standards. We recorded metadata for each file and added morphosyntactic information based on a part-of-speech tag set designed to deal with the specificities of these languages. The corpora of three of the four creoles are already available and searchable via an online web interface.
OAI identifier: oai:repositorio.ul.pt:10451/30690
Contributing repository: Repositório da Universidade de Lisboa
Keywords: Gulf of Guinea creoles; Corpus annotation and management; Language documentation
Date added to repository: 2018-01-17
Version: Published version
Citation: Hagemeijer, Tjerk, Michel Généreux, Iris Hendrickx, Amália Mendes, Abigail Tiny and Armando Zamora (2014). “The Gulf of Guinea Creole Corpora”. In Proceedings of the Ninth International Conference on Language Resources and Evaluation – LREC’14, May 26-31, Reykjavik, Iceland, pp. 523-529.
ISBN: 978-2-9517408-8-4
Rights: Open access
Format: PDF (application/pdf)
Publisher: European Language Resources Association
Aggregator: Repositório Científico de Acesso Aberto de Portugal (RCAAP), Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação