Using noise to detect test flakiness

SILVA, Denini Gabriel


Bibliographic details
Main author:	SILVA, Denini Gabriel
Publication date:	2022
Document type:	Master's dissertation
Language:	eng
Source title:	Repositório Institucional da UFPE
dARK ID:	ark:/64986/001300000q102
Full text:	https://repositorio.ufpe.br/handle/123456789/44567
Abstract:	A test is said to be flaky when it non-deterministically passes or fails in different runs on the same configuration (e.g., code). Test flakiness negatively affects regression testing because a failure is not necessarily an indication of a bug in the program. Static and dynamic techniques for detecting flaky tests have been proposed in the literature, but they are limited. Prior studies have shown that test flakiness is mostly caused by concurrent behavior. Based on that observation, we hypothesize that adding noise to the environment (stress tests consuming machine resources such as CPU and memory) can interfere with the ordering of program events and, consequently, influence the test outputs. We propose Shaker, a practical technique to detect flaky tests by comparing the outputs of multiple test runs in noisy environments. Compared with a regular test run, one test run with Shaker is slower because the environment is loaded, i.e., the process that runs a given test competes for resources with stressor tasks that Shaker creates. However, we conjecture that Shaker pays off by detecting flakiness in fewer runs than the alternative of running the test suite multiple times in a regular (non-noisy) environment. We evaluated Shaker using a public benchmark of flaky tests, obtaining encouraging results. For example, we found that (1) Shaker is 96% precise, almost as precise as ReRun, which by definition does not report false positives; (2) Shaker’s recall is much higher than ReRun’s (95% versus 65%); and (3) Shaker detects flaky tests much more efficiently than ReRun, despite the execution overhead associated with noise introduction. To sum up, the results indicate that noise is a promising approach to detect flakiness.
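
The idea in the abstract (run a test several times while stressor tasks load the machine, then compare the outcomes) can be illustrated with a minimal Python sketch. This is not Shaker's implementation: the names `run_under_noise`, `is_flaky`, and `_busy_loop` are hypothetical, and the stressors here are plain busy-loop threads assumed only for illustration, where a real setup would more likely use separate stressor processes that also consume memory.

```python
import threading
from typing import Callable, List


def _busy_loop(stop: threading.Event) -> None:
    # Hypothetical stressor: spins until told to stop, competing for CPU time.
    while not stop.is_set():
        pass


def run_under_noise(test: Callable[[], None], runs: int = 5,
                    stressors: int = 2) -> List[bool]:
    """Run `test` `runs` times while stressor threads load the CPU.

    Returns one boolean per run: True if the test passed,
    False if it raised AssertionError.
    """
    stop = threading.Event()
    threads = [
        threading.Thread(target=_busy_loop, args=(stop,), daemon=True)
        for _ in range(stressors)
    ]
    for t in threads:
        t.start()
    try:
        outcomes: List[bool] = []
        for _ in range(runs):
            try:
                test()
                outcomes.append(True)
            except AssertionError:
                outcomes.append(False)
        return outcomes
    finally:
        # Always stop the stressors, even if the test raised something unexpected.
        stop.set()
        for t in threads:
            t.join()


def is_flaky(outcomes: List[bool]) -> bool:
    # A test is reported as flaky when its runs disagree (some pass, some fail).
    return len(set(outcomes)) > 1


if __name__ == "__main__":
    def stable_test() -> None:
        assert 1 + 1 == 2

    print(is_flaky(run_under_noise(stable_test)))  # prints False
```

Comparing outcomes across runs is what separates this from a single noisy run: a test that fails every time is failing, not flaky, and only disagreement between runs is reported.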

Item metadata

id	UFPE_1bfe6804f1a5f71fffe719e7ad046698
oai_identifier_str	oai:repositorio.ufpe.br:123456789/44567
network_acronym_str	UFPE
network_name_str	Repositório Institucional da UFPE
repository_id_str	2221
dc.title.pt_BR.fl_str_mv	Using noise to detect test flakiness
title	Using noise to detect test flakiness
spellingShingle	Using noise to detect test flakiness; SILVA, Denini Gabriel; Software engineering and programming languages; Android; Software testing; Debugging; Software evolution
title_short	Using noise to detect test flakiness
title_full	Using noise to detect test flakiness
title_fullStr	Using noise to detect test flakiness
title_full_unstemmed	Using noise to detect test flakiness
title_sort	Using noise to detect test flakiness
author	SILVA, Denini Gabriel
author_facet	SILVA, Denini Gabriel
author_role	author
dc.contributor.authorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/2453726460754742
dc.contributor.advisorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/3762670242328435
dc.contributor.advisor-coLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/0311224988123909
dc.contributor.author.fl_str_mv	SILVA, Denini Gabriel
dc.contributor.advisor1.fl_str_mv	D'AMORIM, Marcelo Bezerra
dc.contributor.advisor-co1.fl_str_mv	MIRANDA, Breno Alexandro Ferreira de
contributor_str_mv	D'AMORIM, Marcelo Bezerra MIRANDA, Breno Alexandro Ferreira de
dc.subject.por.fl_str_mv	Software engineering and programming languages; Android; Software testing; Debugging; Software evolution
topic	Software engineering and programming languages; Android; Software testing; Debugging; Software evolution
publishDate	2022
dc.date.accessioned.fl_str_mv	2022-05-25T16:56:01Z
dc.date.available.fl_str_mv	2022-05-25T16:56:01Z
dc.date.issued.fl_str_mv	2022-02-25
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	SILVA, Denini Gabriel. Using noise to detect test flakiness. 2022. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2022.
dc.identifier.uri.fl_str_mv	https://repositorio.ufpe.br/handle/123456789/44567
dc.identifier.dark.fl_str_mv	ark:/64986/001300000q102
identifier_str_mv	SILVA, Denini Gabriel. Using noise to detect test flakiness. 2022. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2022. ark:/64986/001300000q102
url	https://repositorio.ufpe.br/handle/123456789/44567
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv	Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv	UFPE
dc.publisher.country.fl_str_mv	Brazil
publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE
instname_str	Universidade Federal de Pernambuco (UFPE)
instacron_str	UFPE
institution	UFPE
reponame_str	Repositório Institucional da UFPE
collection	Repositório Institucional da UFPE
bitstream.url.fl_str_mv	https://repositorio.ufpe.br/bitstream/123456789/44567/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/44567/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/44567/1/DISSERTA%c3%87%c3%83O%20Denini%20Gabriel%20Silva.pdf https://repositorio.ufpe.br/bitstream/123456789/44567/4/DISSERTA%c3%87%c3%83O%20Denini%20Gabriel%20Silva.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/44567/5/DISSERTA%c3%87%c3%83O%20Denini%20Gabriel%20Silva.pdf.jpg
bitstream.checksum.fl_str_mv	e39d27027a6cc9cb039ad269a5db8e34 6928b9260b07fb2755249a5ca9903395 56cccd8e4e8ae63814b5c8ba3c80ef58 9aa2b6448ea40d88819d855191096227 050212295cdbcc375119a5aa13dda18e
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv	attena@ufpe.br
_version_	1815172881525506048
