The COPLE2 Corpus: a Learner Corpus for Portuguese

Mendes, Amália; Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela

Bibliographic details
Main author:	Mendes, Amália
Publication date:	2016
Other authors:	Antunes, Sandra; Janssen, Maarten; Gonçalves, Anabela
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10451/30692
Abstract:	We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently comprises 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML and keep a record of all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment provides different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization and normalization of the tokens, and will soon be used for error annotation in stand-off format. The corpus has already been a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.

Item metadata

id	RCAP_dfd6459c4f288ce1482aa06d5fd55fff
oai_identifier_str	oai:repositorio.ul.pt:10451/30692
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	The COPLE2 Corpus: a Learner Corpus for Portuguese
title	The COPLE2 Corpus: a Learner Corpus for Portuguese
spellingShingle	The COPLE2 Corpus: a Learner Corpus for Portuguese Mendes, Amália Learner corpus Corpus compilation Language learning Language teaching
title_short	The COPLE2 Corpus: a Learner Corpus for Portuguese
title_full	The COPLE2 Corpus: a Learner Corpus for Portuguese
title_fullStr	The COPLE2 Corpus: a Learner Corpus for Portuguese
title_full_unstemmed	The COPLE2 Corpus: a Learner Corpus for Portuguese
title_sort	The COPLE2 Corpus: a Learner Corpus for Portuguese
author	Mendes, Amália
author_facet	Mendes, Amália Antunes, Sandra Janssen, Maarten Gonçalves, Anabela
author_role	author
author2	Antunes, Sandra Janssen, Maarten Gonçalves, Anabela
author2_role	author author author
dc.contributor.none.fl_str_mv	Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	Mendes, Amália Antunes, Sandra Janssen, Maarten Gonçalves, Anabela
dc.subject.por.fl_str_mv	Learner corpus Corpus compilation Language learning Language teaching
topic	Learner corpus Corpus compilation Language learning Language teaching
publishDate	2016
dc.date.none.fl_str_mv	2016 2016-01-01T00:00:00Z 2018-01-17T16:46:22Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/30692
url	http://hdl.handle.net/10451/30692
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Mendes, Amália, Sandra Antunes, Maarten Janssen & Anabela Gonçalves (2016) The COPLE2 Corpus: A Learner Corpus for Portuguese. In: Proceedings of the Tenth Language Resources and Evaluation Conference – LREC’16, 23-28 May 2016, Portoroz, Slovenia, pp. 3207-3214. ISBN 978-2-9517408-9-1
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	European Language Resources Association
publisher.none.fl_str_mv	European Language Resources Association
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134387570212864
