Multiword expression tagging of Spanish native and non-native speakers' written essays in a grammar and composition developmental course
Main author: | Da Corte, Miguel |
---|---|
Publication date: | 2023 |
Other authors: | Baptista, Jorge |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.1/20105 |
Abstract: | The literature on second language learning posits that there are significant differences between the use of multiword expressions (MWE) by native speakers (NS) and non-native speakers (NNS). Furthermore, it considers that levels of language proficiency can be estimated on the basis of the use of these expressions. This paper analyses a corpus of essays written by native (16 essays, 5839 words) and non-native Spanish speakers (25 essays, 7767 words) enrolled in a course focused on the development of orthographic, grammatical, lexical, semantic, and discursive skills in Spanish. This is a required course for students pursuing a certification in Translating or Interpreting (Spanish/English) in the educational setting where the study took place. The corpus was manually tagged by two linguists. The classification scheme used was inspired by other schemes found in the literature and built for similar purposes. The results show that, in general, the distribution of MWE types found in the NS and NNS partitions of the corpus was not very different (Pearson correlation: 0.894). However, interesting differences were found between the categories of verbal idioms and noun constructions. Though the corpus is too small for more significant conclusions to be drawn, it is possible to point out that different types of MWE are unevenly distributed among the native speakers' and non-native learners' written production material, and some categories may be a clearer indicator of near-native-speaker proficiency. |
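
The abstract reports a Pearson correlation of 0.894 between the distributions of MWE types in the NS and NNS partitions. As a rough illustration of how such a figure could be obtained, the sketch below normalises per-category counts by partition size and correlates the two resulting vectors. Only the word totals (5839 and 7767) and the category names "verbal idiom" and "noun construction" come from the record; the counts, the "other" category, and the helper names `pearson` and `per_thousand_words` are hypothetical placeholders, and the record does not say whether the authors normalised their counts or how they computed the coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def per_thousand_words(counts, total_words):
    """Convert raw category counts to rates per 1000 words."""
    return [1000 * c / total_words for c in counts]


# Partition sizes in words, as stated in the abstract.
NS_WORDS, NNS_WORDS = 5839, 7767

# HYPOTHETICAL counts per MWE category, for illustration only.
# These are NOT the study's data.
categories = ["verbal idiom", "noun construction", "other"]
ns_counts = [30, 50, 20]
nns_counts = [20, 80, 25]

ns_rates = per_thousand_words(ns_counts, NS_WORDS)
nns_rates = per_thousand_words(nns_counts, NNS_WORDS)

for name, ns, nns in zip(categories, ns_rates, nns_rates):
    print(f"{name}: NS {ns:.2f} vs NNS {nns:.2f} per 1000 words")
print(f"Pearson r = {pearson(ns_rates, nns_rates):.3f}")
```

Dividing each partition by its own word total rescales that vector by a constant, which leaves the Pearson coefficient unchanged. The normalisation matters only for comparing the per-category rates directly, since the two partitions differ in size.
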
id |
RCAP_51a59a239e63d83516caf7236cb5d6e1 |
---|---|
oai_identifier_str |
oai:sapientia.ualg.pt:10400.1/20105 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Da Corte, Miguel |
author_facet |
Da Corte, Miguel; Baptista, Jorge |
author_role |
author |
author2 |
Baptista, Jorge |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Sapientia |
dc.contributor.author.fl_str_mv |
Da Corte, Miguel; Baptista, Jorge |
dc.subject.por.fl_str_mv |
Multiword expressions; Language proficiency; Classification level; Machine learning models; Developmental education courses (in Spanish) |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-10-30T11:13:49Z 2023-09 2023-09-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.1/20105 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
DOI: 10.5507/ro.2023.003; ISSN: 2571-0966 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Univerzita Palackého v Olomouci |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |