Error annotation in the COPLE2 corpus

del Río, Iria; Mendes, Amália

Bibliographic details
Main author:	del Río, Iria
Publication date:	2018
Other authors:	Mendes, Amália
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10451/36512
Abstract:	We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured as a two-level system: first, a fully manual, token-based, coarse-grained annotation is applied, producing a rough classification of the errors into three categories, paired with multi-level information for POS and lemma; second, a multi-word, fine-grained standoff annotation is semi-automatically produced from the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other proposals for error annotation, and discuss the fine-grained tag set and the experiments to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS and lemma to automatically generate the fine-grained error tags by applying conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. As COPLE2 is the first learner corpus of Portuguese with error annotation, we expect it to support new research in different fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer Assisted Learning.

Item metadata

id	RCAP_bc6dac5b203614532624cbf42f2a3215
oai_identifier_str	oai:repositorio.ul.pt:10451/36512
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Error annotation in the COPLE2 corpus
title	Error annotation in the COPLE2 corpus
spellingShingle	Error annotation in the COPLE2 corpus del Río, Iria Learner corpus Error annotation Second language acquisition Natural language processing Corpus de aprendentes Anotação do erro Aquisição de língua segunda Processamento de língua natural
title_short	Error annotation in the COPLE2 corpus
title_full	Error annotation in the COPLE2 corpus
title_fullStr	Error annotation in the COPLE2 corpus
title_full_unstemmed	Error annotation in the COPLE2 corpus
title_sort	Error annotation in the COPLE2 corpus
author	del Río, Iria
author_facet	del Río, Iria Mendes, Amália
author_role	author
author2	Mendes, Amália
author2_role	author
dc.contributor.none.fl_str_mv	Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	del Río, Iria Mendes, Amália
dc.subject.por.fl_str_mv	Learner corpus Error annotation Second language acquisition Natural language processing Corpus de aprendentes Anotação do erro Aquisição de língua segunda Processamento de língua natural
topic	Learner corpus Error annotation Second language acquisition Natural language processing Corpus de aprendentes Anotação do erro Aquisição de língua segunda Processamento de língua natural
publishDate	2018
dc.date.none.fl_str_mv	2018-09-22 2018-09-22T00:00:00Z 2019-01-18T10:11:44Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/36512
url	http://hdl.handle.net/10451/36512
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	del Río, I., & Mendes, A. (2018). Error annotation in the COPLE2 corpus. Revista Da Associação Portuguesa De Linguística, (4), 225-239. https://doi.org/10.26334/2183-9077/rapln4ano2018a42 2183-9077 https://doi.org/10.26334/2183-9077/rapln4ano2018a42
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Associação Portuguesa de Linguística
publisher.none.fl_str_mv	Associação Portuguesa de Linguística
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134441783689216
