Detecting translingual plagiarism and the backlash against translation plagiarists

Bibliographic details
Main author: Sousa-Silva, Rui
Publication date: 2017
Document type: Article
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://ojs.letras.up.pt/index.php/LLLD/article/view/2444
Abstract: Plagiarism detection methods have improved significantly over the last decades, and as a result of the advanced research conducted by computational and mostly forensic linguists, simple and sophisticated textual borrowing strategies can now be identified more easily. In particular, simple text comparison algorithms developed by computational linguists allow literal, word-for-word plagiarism (i.e. where identical strings of text are reused across different documents) to be easily detected (semi-)automatically (e.g. Turnitin or SafeAssign), although these methods tend to perform less well when the borrowing is obfuscated by introducing edits to the original text. In this case, more sophisticated linguistic techniques, such as an analysis of lexical overlap (Johnson, 1997), are required to detect the borrowing. However, these have limited applicability in cases of ‘translingual’ plagiarism, where a text is translated and borrowed without acknowledgment from an original in another language. Considering that (a) traditionally non-professional translation (e.g. literal or free machine translation) is the method used to plagiarise; (b) the plagiarist usually edits the text for grammar and syntax, especially when machine-translated; and (c) lexical items are those that tend to be translated more correctly, and carried over to the derivative text, this paper proposes a method for ‘translingual’ plagiarism detection that is grounded on translation and interlanguage theories (Selinker, 1972; Bassnett and Lefevere, 1998), as well as on the principle of ‘linguistic uniqueness’ (Coulthard, 2004). Empirical evidence from the CorRUPT corpus (Corpus of Reused and Plagiarised Texts), a corpus of real academic and non-academic texts that were investigated and accused of plagiarising originals in other languages, is used to illustrate the applicability of the methodology proposed for ‘translingual’ plagiarism detection. Finally, applications of the method as an investigative tool in forensic contexts are discussed.
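Illustrative note: the abstract mentions an analysis of lexical overlap and observes that lexical items tend to survive translation. The sketch below is not the paper's method; it is a minimal example, under stated assumptions, of how overlap between content words might be measured once a suspect text has already been translated into the language of the presumed original. The names `STOPWORDS`, `content_words` and `lexical_overlap` are hypothetical, the stopword list is a tiny English-only placeholder, and the Jaccard coefficient is chosen here only for simplicity.

```python
import re

# Tiny illustrative stopword list (an assumption; a real analysis would use a fuller one).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "with",
}


def content_words(text):
    """Return the set of lowercase word types in `text`, minus stopwords.

    Only ASCII letters (optionally hyphenated) are matched, so this sketch
    assumes English-language input.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def lexical_overlap(suspect, source):
    """Jaccard overlap between the content-word sets of two same-language texts.

    Returns a float in [0.0, 1.0]; 0.0 when neither text has content words.
    """
    a, b = content_words(suspect), content_words(source)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

In use, the suspect text would first be translated (by any means) into the source language and then passed with the presumed original to `lexical_overlap`. A high score only flags a pair for closer human inspection; it is not evidence of plagiarism on its own.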
id RCAP_c60ed30f9331d92db2f04dcaf1fc3188
oai_identifier_str oai:ojs.pkp.sfu.ca:article/2444
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
spelling Detecting translingual plagiarism and the backlash against translation plagiarists
Artigos/Articles
Faculdade de Letras da Universidade do Porto
2017-05-30T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
https://ojs.letras.up.pt/index.php/LLLD/article/view/2444
por
2183-3745
Sousa-Silva, Rui
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2022-09-21T15:48:18Z
oai:ojs.pkp.sfu.ca:article/2444
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T15:56:36.816980
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Detecting translingual plagiarism and the backlash against translation plagiarists
author Sousa-Silva, Rui
author_facet Sousa-Silva, Rui
author_role author
dc.contributor.author.fl_str_mv Sousa-Silva, Rui
dc.subject.por.fl_str_mv Artigos/Articles
topic Artigos/Articles
publishDate 2017
dc.date.none.fl_str_mv 2017-05-30T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://ojs.letras.up.pt/index.php/LLLD/article/view/2444
url https://ojs.letras.up.pt/index.php/LLLD/article/view/2444
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv 2183-3745
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Faculdade de Letras da Universidade do Porto
publisher.none.fl_str_mv Faculdade de Letras da Universidade do Porto
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799130434975563776