Trustful Test Suites for Natural Language Processing

Bibliographic details
Main author: Cabeça, Mariana
Publication date: 2023
Other authors: Buchicchio, Marianna; Moniz, Helena
Document type: Article
Language: Portuguese (por)
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.26334/2183-9077/rapln10ano2023a4
Abstract: Machine Translation (MT) research has witnessed continuous growth, accompanied by an increasing demand for automated error detection and correction in textual content. In response, Unbabel has developed a hybrid approach that combines machine translation with human post-editing (PE) to provide high-quality translations. To facilitate the post-editors' tasks, Unbabel has created a proprietary error detection tool named Smartcheck, designed to identify errors and provide correction suggestions. Traditionally, the evaluation of translation errors relies on carefully curated annotated texts, categorized by error type, which serve as the evaluation standard, or Test Suites, for assessing the accuracy of machine translation systems. However, the quality of these evaluation sets can significantly affect evaluation outcomes: if they do not accurately represent the content or contain inherent flaws, decisions based on them may inadvertently yield undesired effects. Hence, it is essential to employ suitable datasets containing data representative of the structures each system needs, including Smartcheck. In this paper we present the methodology developed and implemented to create reliable, revised Test Suites specifically designed for evaluating MT systems and error detection tools. By using these meticulously curated Test Suites to evaluate proprietary systems and tools, we can ensure the trustworthiness of the conclusions and decisions derived from the evaluations. This methodology enabled robust identification of problematic error types, grammar-checking rules, and language- and/or register-specific issues, leading to the adoption of effective production measures. With the integration of Smartcheck’s reliable and accurate correction suggestions and the improvements made to the post-editing revision process, the work presented herein led to a noticeable improvement in the translation quality delivered to customers.
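The evaluation setup described in the abstract, annotated segments grouped by error type serving as a gold standard against which an error-detection tool is scored, can be sketched in a few lines of code. The snippet below is a minimal illustration only, assuming a hypothetical `TestCase` structure and a stand-in `detect_errors` function; it is not Smartcheck's API nor the paper's actual methodology, just the general idea of computing per-error-type precision and recall over a Test Suite.

```python
# Minimal, hypothetical sketch of scoring an error-detection tool against a
# Test Suite of annotated segments. All names here are illustrative; none come
# from the paper or from Smartcheck.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TestCase:
    source: str        # source-language segment
    translation: str   # translated segment to be checked
    gold_errors: set   # annotated error categories, e.g. {"agreement"}


def evaluate(test_suite, detect_errors):
    """Compute per-category precision and recall for a detector that
    returns a set of error-category labels for each translation."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for case in test_suite:
        predicted = detect_errors(case.translation)
        for cat in predicted & case.gold_errors:
            tp[cat] += 1
        for cat in predicted - case.gold_errors:
            fp[cat] += 1
        for cat in case.gold_errors - predicted:
            fn[cat] += 1
    report = {}
    for cat in set(tp) | set(fp) | set(fn):
        precision = tp[cat] / (tp[cat] + fp[cat]) if (tp[cat] + fp[cat]) else 0.0
        recall = tp[cat] / (tp[cat] + fn[cat]) if (tp[cat] + fn[cat]) else 0.0
        report[cat] = {"precision": precision, "recall": recall}
    return report


# Toy usage with a naive stand-in detector.
suite = [
    TestCase("Eles têm razão.", "They has a point.", {"agreement"}),
    TestCase("Obrigado!", "Thank you!", set()),
]
naive_detector = lambda text: {"agreement"} if "They has" in text else set()
print(evaluate(suite, naive_detector))
# -> {'agreement': {'precision': 1.0, 'recall': 1.0}}
```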
id RCAP_54d206fd8dc948b959c25f0016f41a7a
oai_identifier_str oai:ojs3.ojs.apl.pt:article/184
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Trustful Test Suites for Natural Language Processing
Corpus de testes fiáveis para o processamento de linguagem natural [Reliable test suites for natural language processing]
title Trustful Test Suites for Natural Language Processing
author Cabeça, Mariana
author_facet Cabeça, Mariana
Buchicchio, Marianna
Moniz, Helena
author_role author
author2 Buchicchio, Marianna
Moniz, Helena
author2_role author
author
dc.contributor.author.fl_str_mv Cabeça, Mariana
Buchicchio, Marianna
Moniz, Helena
dc.subject.por.fl_str_mv Sistemas de Deteção Automática de Erros
Avaliação de desempenho
Corpus de teste
Avaliação de Sistemas de PLN
Grammar Error Detection
Performance assessment
Test Suites
NLP systems evaluation
publishDate 2023
dc.date.none.fl_str_mv 2023-10-22
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.26334/2183-9077/rapln10ano2023a4
https://doi.org/10.26334/2183-9077/rapln10ano2023a4
url https://doi.org/10.26334/2183-9077/rapln10ano2023a4
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://ojs.apl.pt/index.php/rapl/article/view/184
https://ojs.apl.pt/index.php/rapl/article/view/184/220
dc.rights.driver.fl_str_mv Copyright (c) 2023 Marianna Buchicchio, Mariana Cabeça, Helena Moniz
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2023 Marianna Buchicchio, Mariana Cabeça, Helena Moniz
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Associação Portuguesa de Linguística
publisher.none.fl_str_mv Associação Portuguesa de Linguística
dc.source.none.fl_str_mv Revista da Associação Portuguesa de Linguística; No. 10 (2023): Journal of the Portuguese Linguistics Association; 58–79
Revista da Associação Portuguesa de Linguística; N.º 10 (2023): Revista da Associação Portuguesa de Linguística; 58–79
2183-9077
10.26334/2183-9077/rapln10ano2023td
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134142527438848