The COPLE2 Corpus: a Learner Corpus for Portuguese
Main author: | Mendes, Amália |
---|---|
Publication date: | 2016 |
Other authors: | Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10451/30692 |
Abstract: | We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently comprises 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML and retain all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment provides different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization and normalization of the tokens, and will soon be used for error annotation in stand-off format. The corpus has already served as a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level. |
id |
RCAP_dfd6459c4f288ce1482aa06d5fd55fff |
---|---|
oai_identifier_str |
oai:repositorio.ul.pt:10451/30692 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
title |
The COPLE2 Corpus: a Learner Corpus for Portuguese |
author |
Mendes, Amália |
author_facet |
Mendes, Amália; Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela |
author_role |
author |
author2 |
Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Mendes, Amália; Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela |
dc.subject.por.fl_str_mv |
Learner corpus Corpus compilation Language learning Language teaching |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016 2016-01-01T00:00:00Z 2018-01-17T16:46:22Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/30692 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
Mendes, Amália, Sandra Antunes, Maarten Janssen & Anabela Gonçalves (2016) The COPLE2 Corpus: A Learner Corpus for Portuguese. In: Proceedings of the Tenth Language Resources and Evaluation Conference – LREC’16, 23-28 May 2016, Portorož, Slovenia, pp. 3207-3214. ISBN 978-2-9517408-9-1 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
European Language Resources Association |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |