Error annotation in the COPLE2 corpus

Bibliographic details
Main author: del Rio, Iria
Publication date: 2018
Other authors: Mendes, Amália
Document type: Article
Language: por (Portuguese)
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://doi.org/10.26334/2183-9077/rapln4ano2018a42
Abstract: We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured as a two-level system. First, a fully manual, token-based, coarse-grained annotation produces a rough classification of the errors into three categories, paired with multi-level information for POS and lemma. Second, a multi-word, fine-grained standoff annotation is semi-automatically produced from the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS, and lemma for automatically generating the fine-grained error tags with conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. Because COPLE2 is the first learner corpus of Portuguese with error annotation, we expect it to support new research in fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer Assisted Learning.
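Note on the agreement measure: the abstract reports inter-annotator agreement with Cohen's kappa but gives no formula or figures. The following is a minimal sketch of how kappa is computed for two annotators, not the authors' evaluation code. The function name `cohen_kappa`, the label names (`orth`, `gram`, `lex`), and the toy data are all hypothetical placeholders and are not taken from COPLE2.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label from both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical token-level error labels (illustrative only, not corpus data).
annotator_1 = ["orth", "gram", "lex", "gram", "orth", "lex"]
annotator_2 = ["orth", "gram", "gram", "gram", "orth", "lex"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa discounts the agreement two annotators would reach by chance, which is why it is usually preferred over raw percentage agreement when a tag set has only a few categories.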
id RCAP_bdc8cc0728067adf04b2d8c30a30b7d7
oai_identifier_str oai:ojs3.ojs.apl.pt:article/42
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Error annotation in the COPLE2 corpus
Anotação de erros no corpus COPLE2
author del Rio, Iria
author_facet del Rio, Iria
Mendes, Amália
author_role author
author2 Mendes, Amália
author2_role author
dc.contributor.author.fl_str_mv del Rio, Iria
Mendes, Amália
dc.subject.por.fl_str_mv corpus de aprendentes
anotação do erro
processamento de língua natural
aquisição de L2
learner corpus
error annotation
L2 acquisition
natural language processing
publishDate 2018
dc.date.none.fl_str_mv 2018-10-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.26334/2183-9077/rapln4ano2018a42
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://ojs.apl.pt/index.php/rapl/article/view/42
https://ojs.apl.pt/index.php/rapl/article/view/42/44
dc.rights.driver.fl_str_mv Copyright (c) 2018 Iria del Rio, Amália Mendes
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Associação Portuguesa de Linguística
dc.source.none.fl_str_mv Revista da Associação Portuguesa de Linguística; No. 4 (2018): Journal of the Portuguese Linguistics Association; 225-239
2183-9077
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP