Detecting translingual plagiarism and the backlash against translation plagiarists
Main author: | Sousa-Silva, Rui |
---|---|
Publication date: | 2014 |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/82748 |
Abstract: | Plagiarism detection methods have improved significantly over the last decades, and as a result of the advanced research conducted by computational and mostly forensic linguists, simple and sophisticated textual borrowing strategies can now be identified more easily. In particular, simple text comparison algorithms developed by computational linguists allow literal, word-for-word plagiarism (i.e. where identical strings of text are reused across different documents) to be easily detected (semi-)automatically (e.g. Turnitin or SafeAssign), although these methods tend to perform less well when the borrowing is obfuscated by introducing edits to the original text. In this case, more sophisticated linguistic techniques, such as an analysis of lexical overlap (Johnson, 1997), are required to detect the borrowing. However, these have limited applicability in cases of translingual plagiarism, where a text is translated and borrowed without acknowledgment from an original in another language. Considering that (a) traditionally non-professional translation (e.g. literal or free machine translation) is the method used to plagiarise; (b) the plagiarist usually edits the text for grammar and syntax, especially when machine-translated; and (c) lexical items are those that tend to be translated more correctly, and carried over to the derivative text, this paper proposes a method for translingual plagiarism detection that is grounded on translation and interlanguage theories (Selinker, 1972; Bassnett and Lefevere, 1998), as well as on the principle of linguistic uniqueness (Coulthard, 2004). Empirical evidence from the CorRUPT corpus (Corpus of Reused and Plagiarised Texts), a corpus of real academic and non-academic texts that were investigated and accused of plagiarising originals in other languages, is used to illustrate the applicability of the methodology proposed for translingual plagiarism detection. Finally, applications of the method as an investigative tool in forensic contexts are discussed. |
id |
RCAP_473fe6c42ed085a6c3bf64923f184172 |
---|---|
oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/82748 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
2014; 2014-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; https://hdl.handle.net/10216/82748; eng; Sousa-Silva, Rui; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-11-29T12:43:35Z; oai:repositorio-aberto.up.pt:10216/82748; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T23:25:31.835727; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Detecting translingual plagiarism and the backlash against translation plagiarists |
author |
Sousa-Silva, Rui |
author_role |
author |
publishDate |
2014 |
dc.date.none.fl_str_mv |
2014 2014-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/82748 |
url |
https://hdl.handle.net/10216/82748 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799135560192753664 |