The COPLE2 Corpus: a Learner Corpus for Portuguese

Bibliographic details
Main author: Mendes, Amália
Publication date: 2016
Other authors: Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10451/30692
Abstract: We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently includes a total of 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML format and keep a record of all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment enables different views of the same document (XML, student version, corrected version), a CQP-based search interface, POS tagging, lemmatization and normalization of the tokens, and will soon be used for error annotation in stand-off format. The corpus has already been a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.
id RCAP_dfd6459c4f288ce1482aa06d5fd55fff
oai_identifier_str oai:repositorio.ul.pt:10451/30692
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv The COPLE2 Corpus: a Learner Corpus for Portuguese
title The COPLE2 Corpus: a Learner Corpus for Portuguese
spellingShingle The COPLE2 Corpus: a Learner Corpus for Portuguese
Mendes, Amália
Learner corpus
Corpus compilation
Language learning
Language teaching
author Mendes, Amália
author_facet Mendes, Amália
Antunes, Sandra
Janssen, Maarten
Gonçalves, Anabela
author_role author
author2 Antunes, Sandra
Janssen, Maarten
Gonçalves, Anabela
author2_role author
author
author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Mendes, Amália
Antunes, Sandra
Janssen, Maarten
Gonçalves, Anabela
dc.subject.por.fl_str_mv Learner corpus
Corpus compilation
Language learning
Language teaching
topic Learner corpus
Corpus compilation
Language learning
Language teaching
publishDate 2016
dc.date.none.fl_str_mv 2016
2016-01-01T00:00:00Z
2018-01-17T16:46:22Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/30692
url http://hdl.handle.net/10451/30692
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Mendes, Amália, Sandra Antunes, Maarten Janssen & Anabela Gonçalves (2016) The COPLE2 Corpus: A Learner Corpus for Portuguese. In: Proceedings of the Tenth Language Resources and Evaluation Conference – LREC’16, 23-28 May 2016, Portoroz, Slovenia, 3207-3214
978-2-9517408-9-1
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv European Language Resources Association
publisher.none.fl_str_mv European Language Resources Association
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134387570212864