The Gulf of Guinea Creole Corpora
Main author: | Hagemeijer, Tjerk |
---|---|
Publication date: | 2014 |
Other authors: | Généreux, Michel; Hendrickx, Iris; Mendes, Amália; Tiny, Abigail; Zamora, Armando |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/30690 |
Abstract: | We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa d’Ambô. We faced the typical difficulties of languages lacking an official status, such as lack of standard spelling, language variation, lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. In order to tackle these problems, the compiled written and transcribed spoken data collected during fieldwork trips were adapted to a normalized spelling that was applied to the four languages. For the corpus compilation we followed corpus linguistics standards. We recorded metadata for each file and added morphosyntactic information based on a part-of-speech tag set that was designed to deal with the specificities of these languages. The corpora of three of the four creoles are already available and searchable via an online web interface. |
id | RCAP_bad888890d04b00f4d1222d1eb3ca47b |
---|---|
oai_identifier_str | oai:repositorio.ul.pt:10451/30690 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | The Gulf of Guinea Creole Corpora |
author | Hagemeijer, Tjerk |
author_facet | Hagemeijer, Tjerk; Généreux, Michel; Hendrickx, Iris; Mendes, Amália; Tiny, Abigail; Zamora, Armando |
author_role | author |
author2 | Généreux, Michel; Hendrickx, Iris; Mendes, Amália; Tiny, Abigail; Zamora, Armando |
author2_role | author; author; author; author; author |
dc.contributor.none.fl_str_mv | Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv | Hagemeijer, Tjerk; Généreux, Michel; Hendrickx, Iris; Mendes, Amália; Tiny, Abigail; Zamora, Armando |
dc.subject.por.fl_str_mv | Gulf of Guinea creoles; Corpus annotation and management; Language documentation |
topic | Gulf of Guinea creoles; Corpus annotation and management; Language documentation |
publishDate | 2014 |
dc.date.none.fl_str_mv | 2014; 2014-01-01T00:00:00Z; 2018-01-17T16:33:59Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10451/30690 |
url | http://hdl.handle.net/10451/30690 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | Hagemeijer, Tjerk, Michel Généreux, Iris Hendrickx, Amália Mendes, Abigail Tiny, Armando Zamora (2014) “The Gulf of Guinea Creole Corpora” in Proceedings of the Ninth International Conference on Language Resources and Evaluation – LREC’14, May 26-31, Reykjavik, Iceland, pp. 523-529; ISBN 978-2-9517408-8-4 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | European Language Resources Association |
publisher.none.fl_str_mv | European Language Resources Association |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799134387567067136 |