Trustful Test Suites for Natural Language Processing
Main author: | Cabeça, Mariana |
---|---|
Publication date: | 2023 |
Other authors: | Buchicchio, Marianna; Moniz, Helena |
Document type: | Article |
Language: | Portuguese (por) |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://doi.org/10.26334/2183-9077/rapln10ano2023a4 |
Abstract: | Machine Translation (MT) research has witnessed continuous growth, accompanied by an increasing demand for automated error detection and correction in textual content. In response, Unbabel has developed a hybrid approach that combines machine translation with human editors in post-edition (PE) to provide high-quality translations. To support post-editors in their tasks, Unbabel created a proprietary error detection tool named Smartcheck, designed to identify errors and suggest corrections. Traditionally, the evaluation of translation errors relies on carefully curated annotated texts, categorized by error type, which serve as the evaluation standard, or Test Suites, for assessing the accuracy of machine translation systems. However, the quality of these evaluation sets significantly affects evaluation outcomes: if they do not accurately represent the content or contain inherent flaws, decisions based on them may inadvertently yield undesired effects. It is therefore essential to employ suitable datasets containing data representative of the structures each system needs, including Smartcheck. In this paper, we present the methodology developed and implemented to create reliable, revised Test Suites specifically designed for evaluating MT systems and error detection tools. By using these carefully curated Test Suites to evaluate proprietary systems and tools, we can trust the conclusions and decisions derived from the evaluations. This methodology enabled robust identification of problematic error types, grammar-checking rules, and language- and/or register-specific issues, leading to the adoption of effective production measures. With the integration of Smartcheck's reliable and accurate correction suggestions and the improvements made to the post-edition revision process, the work presented herein led to a noticeable improvement in the translation quality delivered to customers. |
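The abstract describes evaluating an error-detection tool against curated Test Suites of error-annotated translations. A minimal sketch of how such an evaluation could be scored, as span-level precision and recall against gold annotations, follows; note that the `ErrorSpan`/`TestCase` format and the `detector` callable are hypothetical illustrations for this record, not Smartcheck's actual (proprietary) interface or Unbabel's annotation schema.

```python
# Hypothetical sketch: scoring an error-detection tool against a curated,
# error-annotated Test Suite. The data model below is an assumption made
# for illustration; the paper's real tool (Smartcheck) is proprietary.
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorSpan:
    start: int          # character offset where the error begins
    end: int            # character offset where the error ends
    error_type: str     # e.g. "agreement", "orthography", "register"

@dataclass
class TestCase:
    source: str              # source-language segment
    translation: str         # MT output to be checked
    gold_errors: frozenset   # curated annotations: frozenset[ErrorSpan]

def evaluate(detector, suite):
    """Aggregate span-level precision/recall of `detector` over the suite.

    `detector` is any callable mapping a translation string to an
    iterable of ErrorSpan predictions.
    """
    tp = fp = fn = 0
    for case in suite:
        predicted = set(detector(case.translation))
        gold = set(case.gold_errors)
        tp += len(predicted & gold)   # detected and annotated
        fp += len(predicted - gold)   # detected but not annotated
        fn += len(gold - predicted)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Scoring against a fixed, revised suite in this way is what makes comparisons trustworthy: if the gold annotations are unrepresentative or flawed, the precision/recall numbers, and any production decision based on them, inherit those flaws.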
id: | RCAP_54d206fd8dc948b959c25f0016f41a7a |
---|---|
oai_identifier_str: | oai:ojs3.ojs.apl.pt:article/184 |
network_acronym_str: | RCAP |
network_name_str: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str: | 7160 |
spelling |
Trustful Test Suites for Natural Language Processing (Portuguese title: Corpus de testes fiáveis para o processamento de linguagem natural). Keywords: Sistemas de Deteção Automática de Erros; Avaliação de desempenho; Corpus de teste; Avaliação de Sistemas de PLN; Grammar Error Detection; Performance assessment; Test Suites; NLP systems evaluation. English abstract as given above; the record also carries a Portuguese translation of the same abstract. Authors: Cabeça, Mariana; Buchicchio, Marianna; Moniz, Helena. Publisher: Associação Portuguesa de Linguística, 2023-10-22; published version; article; application/pdf. DOI: https://doi.org/10.26334/2183-9077/rapln10ano2023a4. Source: Revista da Associação Portuguesa de Linguística, No. 10 (2023): Journal of the Portuguese Linguistics Association, pp. 58–79; ISSN 2183-9077. Article pages: https://ojs.apl.pt/index.php/rapl/article/view/184 (PDF: https://ojs.apl.pt/index.php/rapl/article/view/184/220). Copyright (c) 2023 Marianna Buchicchio, Mariana Cabeça, Helena Moniz; open access. reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP. Record updated 2023-12-09T10:16:25Z; harvested 2024-03-19. |