Error annotation in the COPLE2 corpus
Main author: | del Rio, Iria |
---|---|
Publication date: | 2018 |
Other authors: | Mendes, Amália |
Document type: | Article |
Language: | por |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://doi.org/10.26334/2183-9077/rapln4ano2018a42 |
Abstract: | We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured in a two-level system: first, a fully manual token-based and coarse-grained annotation is applied and produces a rough classification of the errors in three categories, paired with multi-level information for POS and lemma; second, a multi-word and fine-grained annotation in standoff is then semi-automatically produced based on the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS, and lemma to automatically generate the fine-grained error tags by applying conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. As the first learner corpus of Portuguese with error annotation, COPLE2 is expected to support new research in different fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer-Assisted Learning. |
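
The abstract reports inter-annotator agreement measured with Cohen's kappa. As a minimal sketch of that statistic (not the authors' evaluation code), the function below computes kappa for two annotators who each assigned one label per token. The function name `cohens_kappa` and the labels `"A"`, `"B"`, `"C"` are hypothetical placeholders; the record does not name the three coarse-grained categories.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels; "A", "B", "C" stand in for the three
# coarse-grained categories, which this record does not name.
annotator_1 = ["A", "A", "B", "B"]
annotator_2 = ["A", "B", "B", "B"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four tokens (observed agreement 0.75) while chance agreement is 0.5, so kappa corrects the raw agreement downward to 0.5.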
id |
RCAP_bdc8cc0728067adf04b2d8c30a30b7d7 |
---|---|
oai_identifier_str |
oai:ojs3.ojs.apl.pt:article/42 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Error annotation in the COPLE2 corpus · Anotação de erros no corpus COPLE2 · corpus de aprendentes · anotação do erro · processamento de língua natural · aquisição de L2 · learner corpus · error annotation · L2 acquisition · natural language processing · We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured in a two-level system: first, a fully manual token-based and coarse-grained annotation is applied and produces a rough classification of the errors in three categories, paired with multi-level information for POS and lemma; second, a multi-word and fine-grained annotation in standoff is then semi-automatically produced based on the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS, and lemma to automatically generate the fine-grained error tags by applying conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. As the first learner corpus of Portuguese with error annotation, COPLE2 is expected to support new research in different fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer-Assisted Learning. · Associação Portuguesa de Linguística · 2018-10-15 · info:eu-repo/semantics/publishedVersion · info:eu-repo/semantics/article · application/pdf · https://doi.org/10.26334/2183-9077/rapln4ano2018a42 · https://doi.org/10.26334/2183-9077/rapln4ano2018a42 · Revista da Associação Portuguesa de Linguística; No. 4 (2018): Journal of the Portuguese Linguistics Association; 225-239 · Revista da Associação Portuguesa de Linguística; N.º 4 (2018): Revista da Associação Portuguesa de Linguística; 225-239 · 2183-9077 · reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) · instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação · instacron:RCAAP · por · https://ojs.apl.pt/index.php/rapl/article/view/42 · https://ojs.apl.pt/index.php/rapl/article/view/42/44 · Direitos de Autor (c) 2018 Iria del Rio, Amália Mendes · info:eu-repo/semantics/openAccess · del Rio, Iria · Mendes, Amália · 2023-12-09T10:16:09Z · oai:ojs3.ojs.apl.pt:article/42 · Portal Agregador · ONG · https://www.rcaap.pt/oai/openaire · opendoar:7160 · 2024-03-19T20:35:57.997652 · Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação · false |
dc.title.none.fl_str_mv |
Error annotation in the COPLE2 corpus; Anotação de erros no corpus COPLE2 |
title |
Error annotation in the COPLE2 corpus |
spellingShingle |
Error annotation in the COPLE2 corpus; del Rio, Iria; corpus de aprendentes; anotação do erro; processamento de língua natural; aquisição de L2; learner corpus; error annotation; L2 acquisition; natural language processing |
title_short |
Error annotation in the COPLE2 corpus |
title_full |
Error annotation in the COPLE2 corpus |
title_fullStr |
Error annotation in the COPLE2 corpus |
title_full_unstemmed |
Error annotation in the COPLE2 corpus |
title_sort |
Error annotation in the COPLE2 corpus |
author |
del Rio, Iria |
author_facet |
del Rio, Iria; Mendes, Amália |
author_role |
author |
author2 |
Mendes, Amália |
author2_role |
author |
dc.contributor.author.fl_str_mv |
del Rio, Iria; Mendes, Amália |
dc.subject.por.fl_str_mv |
corpus de aprendentes; anotação do erro; processamento de língua natural; aquisição de L2; learner corpus; error annotation; L2 acquisition; natural language processing |
topic |
corpus de aprendentes; anotação do erro; processamento de língua natural; aquisição de L2; learner corpus; error annotation; L2 acquisition; natural language processing |
description |
We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured in a two-level system: first, a fully manual token-based and coarse-grained annotation is applied and produces a rough classification of the errors in three categories, paired with multi-level information for POS and lemma; second, a multi-word and fine-grained annotation in standoff is then semi-automatically produced based on the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS, and lemma to automatically generate the fine-grained error tags by applying conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. As the first learner corpus of Portuguese with error annotation, COPLE2 is expected to support new research in different fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer-Assisted Learning. |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-10-15 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.26334/2183-9077/rapln4ano2018a42 |
url |
https://doi.org/10.26334/2183-9077/rapln4ano2018a42 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.none.fl_str_mv |
https://ojs.apl.pt/index.php/rapl/article/view/42; https://ojs.apl.pt/index.php/rapl/article/view/42/44 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2018 Iria del Rio, Amália Mendes; info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2018 Iria del Rio, Amália Mendes |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Associação Portuguesa de Linguística |
publisher.none.fl_str_mv |
Associação Portuguesa de Linguística |
dc.source.none.fl_str_mv |
Revista da Associação Portuguesa de Linguística; No. 4 (2018): Journal of the Portuguese Linguistics Association; 225-239 · Revista da Associação Portuguesa de Linguística; N.º 4 (2018): Revista da Associação Portuguesa de Linguística; 225-239 · 2183-9077 · reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) · instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação · instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799133623136288768 |