Trustful Test Suites for Natural Language Processing

Bibliographic details
Main author: Cabeça, Mariana
Publication date: 2023
Other authors: Buchicchio, Marianna; Moniz, Helena
Document type: Article
Language: Portuguese (por)
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.26334/2183-9077/rapln10ano2023a4
Abstract: Machine Translation (MT) research has witnessed continuous growth, accompanied by an increasing demand for automated error detection and correction in textual content. In response, Unbabel has developed a hybrid approach that combines machine translation with human post-editing (PE) to provide high-quality translations. To facilitate the post-editors' tasks, Unbabel has created a proprietary error detection tool named Smartcheck, designed to identify errors and provide correction suggestions. Traditionally, the evaluation of translation errors relies on carefully curated annotated texts, categorized by error type, which serve as the evaluation standard, or Test Suites, for assessing the accuracy of machine translation systems. However, the quality of these evaluation sets can significantly affect evaluation outcomes: if they do not accurately represent the content or contain inherent flaws, decisions based on them may inadvertently yield undesired effects. Hence, it is essential to employ suitable datasets containing data representative of the structures each system needs, including Smartcheck. In this paper we present the methodology developed and implemented to create reliable, revised Test Suites specifically designed for evaluating MT systems and error detection tools. By using these meticulously curated Test Suites to evaluate proprietary systems and tools, we can ensure the trustworthiness of the conclusions and decisions derived from the evaluations. This methodology enabled robust identification of problematic error types, grammar-checking rules, and language- and/or register-specific issues, leading to the adoption of effective production measures. With the integration of Smartcheck’s reliable and accurate correction suggestions and the improvements made to the post-editing revision process, the work presented herein led to a noticeable improvement in the translation quality delivered to customers.
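The evaluation setup described in the abstract, annotated segments grouped by error type serving as a gold standard against which an error-detection tool is scored, can be sketched in a few lines of code. The snippet below is a minimal illustration only, assuming a hypothetical `TestCase` structure and a stand-in `detect_errors` function; it is not Smartcheck's API nor the paper's actual methodology, just the general idea of computing per-error-type precision and recall over a Test Suite.

```python
# Minimal, hypothetical sketch of scoring an error-detection tool against a
# Test Suite of annotated segments. All names here are illustrative; none come
# from the paper or from Smartcheck.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TestCase:
    source: str        # source-language segment
    translation: str   # translated segment to be checked
    gold_errors: set   # annotated error categories, e.g. {"agreement"}


def evaluate(test_suite, detect_errors):
    """Compute per-category precision and recall for a detector that
    returns a set of error-category labels for each translation."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for case in test_suite:
        predicted = detect_errors(case.translation)
        for cat in predicted & case.gold_errors:
            tp[cat] += 1
        for cat in predicted - case.gold_errors:
            fp[cat] += 1
        for cat in case.gold_errors - predicted:
            fn[cat] += 1
    report = {}
    for cat in set(tp) | set(fp) | set(fn):
        precision = tp[cat] / (tp[cat] + fp[cat]) if (tp[cat] + fp[cat]) else 0.0
        recall = tp[cat] / (tp[cat] + fn[cat]) if (tp[cat] + fn[cat]) else 0.0
        report[cat] = {"precision": precision, "recall": recall}
    return report


# Toy usage with a naive stand-in detector.
suite = [
    TestCase("Eles têm razão.", "They has a point.", {"agreement"}),
    TestCase("Obrigado!", "Thank you!", set()),
]
naive_detector = lambda text: {"agreement"} if "They has" in text else set()
print(evaluate(suite, naive_detector))
# -> {'agreement': {'precision': 1.0, 'recall': 1.0}}
```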
id RCAP_54d206fd8dc948b959c25f0016f41a7a
oai_identifier_str oai:ojs3.ojs.apl.pt:article/184
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Trustful Test Suites for Natural Language Processing
Corpus de testes fiáveis para o processamento de linguagem natural [Reliable test suites for natural language processing]
title Trustful Test Suites for Natural Language Processing
author Cabeça, Mariana
author_facet Cabeça, Mariana
Buchicchio, Marianna
Moniz, Helena
author_role author
author2 Buchicchio, Marianna
Moniz, Helena
author2_role author
author
dc.contributor.author.fl_str_mv Cabeça, Mariana
Buchicchio, Marianna
Moniz, Helena
dc.subject.por.fl_str_mv Sistemas de Deteção Automática de Erros
Avaliação de desempenho
Corpus de teste
Avaliação de Sistemas de PLN
Grammar Error Detection
Performance assessment
Test Suites
NLP systems evaluation
publishDate 2023
dc.date.none.fl_str_mv 2023-10-22
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.26334/2183-9077/rapln10ano2023a4
https://doi.org/10.26334/2183-9077/rapln10ano2023a4
url https://doi.org/10.26334/2183-9077/rapln10ano2023a4
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://ojs.apl.pt/index.php/rapl/article/view/184
https://ojs.apl.pt/index.php/rapl/article/view/184/220
dc.rights.driver.fl_str_mv Copyright (c) 2023 Marianna Buchicchio, Mariana Cabeça, Helena Moniz
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2023 Marianna Buchicchio, Mariana Cabeça, Helena Moniz
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Associação Portuguesa de Linguística
publisher.none.fl_str_mv Associação Portuguesa de Linguística
dc.source.none.fl_str_mv Revista da Associação Portuguesa de Linguística; No. 10 (2023): Journal of the Portuguese Linguistics Association; 58–79
Revista da Associação Portuguesa de Linguística; N.º 10 (2023): Revista da Associação Portuguesa de Linguística; 58–79
2183-9077
10.26334/2183-9077/rapln10ano2023td
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134142527438848