Multiword expression tagging of Spanish native and non-native speakers' written essays in a grammar and composition developmental course

Bibliographic details
Main author: Da Corte, Miguel
Publication date: 2023
Other authors: Baptista, Jorge
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.1/20105
Abstract: The literature on second language learning posits that native speakers (NS) and non-native speakers (NNS) differ significantly in their use of multiword expressions (MWE). It further holds that levels of language proficiency can be estimated from the use of these expressions. This paper analyses the written production in a corpus of essays by native (16 essays, 5839 words) and non-native Spanish speakers (25 essays, 7767 words) enrolled in a course focused on developing orthographic, grammatical, lexical, semantic, and discursive skills in Spanish. In the educational setting where the study took place, this course is required for students pursuing a certification in Translating or Interpreting (Spanish/English). The corpus was manually tagged by two linguists, using a classification scheme inspired by other schemes in the literature that were built for similar purposes. The results show that, overall, the distributions of MWE types in the NS and NNS partitions of the corpus did not differ greatly (Pearson correlation: 0.894). However, notable differences were found in the categories of verbal idioms and noun constructions. Although the corpus is too small to support firmer conclusions, the data indicate that different types of MWE are unevenly distributed between the native speakers' and non-native learners' written production, and that some categories may be clearer indicators of near-native-speaker proficiency.
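The abstract summarizes how similar the NS and NNS distributions of MWE types are with a single Pearson correlation coefficient (0.894). The sketch below shows how such a coefficient can be computed from per-category counts for the two partitions. It is an illustration under stated assumptions, not the authors' procedure: apart from "verbal idiom" and "noun construction", the category names are hypothetical, and all counts are invented placeholders, so the output will not reproduce 0.894. The word totals (5839 and 7767) are the ones reported in the abstract.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# HYPOTHETICAL counts of MWE types per partition (placeholders, not the paper's data).
NS_COUNTS = {"verbal idiom": 40, "noun construction": 25,
             "prepositional (hypothetical)": 60, "discourse marker (hypothetical)": 30}
NNS_COUNTS = {"verbal idiom": 20, "noun construction": 45,
              "prepositional (hypothetical)": 70, "discourse marker (hypothetical)": 35}

# Word totals reported in the abstract.
NS_WORDS, NNS_WORDS = 5839, 7767

# Align both partitions on the same category order.
categories = sorted(NS_COUNTS)
ns_raw = [NS_COUNTS[c] for c in categories]
nns_raw = [NNS_COUNTS[c] for c in categories]

# Optional normalization to occurrences per 1,000 words.
ns_per_k = [1000 * v / NS_WORDS for v in ns_raw]
nns_per_k = [1000 * v / NNS_WORDS for v in nns_raw]

r_raw = pearson(ns_raw, nns_raw)
r_norm = pearson(ns_per_k, nns_per_k)
print(f"Pearson r (raw counts):      {r_raw:.3f}")
print(f"Pearson r (per 1,000 words): {r_norm:.3f}")
```

Because Pearson's r is unchanged when either vector is multiplied by a positive constant, normalizing by partition size does not affect the coefficient; the two printed values agree. Normalization still matters if the per-category frequencies themselves are compared directly, since the NNS partition is larger.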
id RCAP_51a59a239e63d83516caf7236cb5d6e1
oai_identifier_str oai:sapientia.ualg.pt:10400.1/20105
network_acronym_str RCAP
repository_id_str 7160
dc.contributor.none.fl_str_mv Sapientia
dc.subject.por.fl_str_mv Multiword expressions
Language proficiency
Classification level
Machine learning models
Developmental education courses (in Spanish)
dc.date.none.fl_str_mv 2023-10-30T11:13:49Z
2023-09
2023-09-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 10.5507/ro.2023.003
2571-0966
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Univerzita Palackého v Olomouci
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP