On the use of control- and data-ow in fault localization

Ribeiro, Henrique Lemos

On the use of control- and data-ow in fault localization

Detalhes bibliográficos
Autor(a) principal:	Ribeiro, Henrique Lemos
Data de Publicação:	2016
Tipo de documento:	Dissertação
Idioma:	eng
Título da fonte:	Biblioteca Digital de Teses e Dissertações da USP
Texto Completo:	http://www.teses.usp.br/teses/disponiveis/100/100131/tde-18102016-092654/
Resumo:	Testing and debugging are key tasks during the development cycle. However, they are among the most expensive activities during the development process. To improve the productivity of developers during the debugging process various fault localization techniques have been proposed, being Spectrum-based Fault Localization (SFL), or Coverage-based Fault Localization (CBFL), one of the most promising. SFL techniques pinpoints program elements (e.g., statements, branches, and definition-use associations), sorting them by their suspiciousness. Heuristics are used to rank the most suspicious program elements which are then mapped into lines to be inspected by developers. Although data-flow spectra (definition-use associations) has been shown to perform better than control-flow spectra (statements and branches) to locate the bug site, the high overhead to collect data-flow spectra has prevented their use on industry-level code. A data-flow coverage tool was recently implemented presenting on average 38% run-time overhead for large programs. Such a fairly modest overhead motivates the study of SFL techniques using data-flow information in programs similar to those developed in the industry. To achieve such a goal, we implemented Jaguar (JAva coveraGe faUlt locAlization Ranking), a tool that employ control-flow and data-flow coverage on SFL techniques. The effectiveness and efficiency of both coverages are compared using 173 faulty versions with sizes varying from 10 to 96 KLOC. Ten known SFL heuristics to rank the most suspicious lines are utilized. The results show that the behavior of the heuristics are similar both to control- and data-flow coverage: Kulczynski2 and Mccon perform better for small number of lines investigated (from 5 to 30 lines) while Ochiai performs better when more lines are inspected (30 to 100 lines). The comparison between control- and data-flow coverages shows that data-flow locates more defects in the range of 10 to 50 inspected lines, being up to 22% more effective. Moreover, in the range of 20 and 100 lines, data-flow ranks the bug better than control-flow with statistical significance. However, data-flow is still more expensive than control-flow: it takes from 23% to 245% longer to obtain the most suspicious lines; on average data-flow is 129% more costly. Therefore, our results suggest that data-flow is more effective in locating faults because it tracks more relationships during the program execution. On the other hand, SFL techniques supported by data-flow coverage needs to be improved for practical use at industrial settings

Metadados do item

id	USP_61c3889a7af827ad05da13ec94c20beb
oai_identifier_str	oai:teses.usp.br:tde-18102016-092654
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
spelling	On the use of control- and data-ow in fault localizationSobre o uso de fluxo de controle e de dados para a localizao de defeitosControl-flowData-flowEngenharia de softwareFault localizationFluxo de controleFluxo de dadosLocalização de defeitosSoftware engineeringTesting and debugging are key tasks during the development cycle. However, they are among the most expensive activities during the development process. To improve the productivity of developers during the debugging process various fault localization techniques have been proposed, being Spectrum-based Fault Localization (SFL), or Coverage-based Fault Localization (CBFL), one of the most promising. SFL techniques pinpoints program elements (e.g., statements, branches, and definition-use associations), sorting them by their suspiciousness. Heuristics are used to rank the most suspicious program elements which are then mapped into lines to be inspected by developers. Although data-flow spectra (definition-use associations) has been shown to perform better than control-flow spectra (statements and branches) to locate the bug site, the high overhead to collect data-flow spectra has prevented their use on industry-level code. A data-flow coverage tool was recently implemented presenting on average 38% run-time overhead for large programs. Such a fairly modest overhead motivates the study of SFL techniques using data-flow information in programs similar to those developed in the industry. To achieve such a goal, we implemented Jaguar (JAva coveraGe faUlt locAlization Ranking), a tool that employ control-flow and data-flow coverage on SFL techniques. The effectiveness and efficiency of both coverages are compared using 173 faulty versions with sizes varying from 10 to 96 KLOC. Ten known SFL heuristics to rank the most suspicious lines are utilized. The results show that the behavior of the heuristics are similar both to control- and data-flow coverage: Kulczynski2 and Mccon perform better for small number of lines investigated (from 5 to 30 lines) while Ochiai performs better when more lines are inspected (30 to 100 lines). The comparison between control- and data-flow coverages shows that data-flow locates more defects in the range of 10 to 50 inspected lines, being up to 22% more effective. Moreover, in the range of 20 and 100 lines, data-flow ranks the bug better than control-flow with statistical significance. However, data-flow is still more expensive than control-flow: it takes from 23% to 245% longer to obtain the most suspicious lines; on average data-flow is 129% more costly. Therefore, our results suggest that data-flow is more effective in locating faults because it tracks more relationships during the program execution. On the other hand, SFL techniques supported by data-flow coverage needs to be improved for practical use at industrial settingsTeste e depuração são tarefas importantes durante o ciclo de desenvolvimento de programas, no entanto, estão entre as atividades mais caras do processo de desenvolvimento. Diversas técnicas de localização de defeitos têm sido propostas a fim de melhorar a produtividade dos desenvolvedores durante o processo de depuração, sendo a localização de defeitos baseados em cobertura de código (Spectrum-based Fault Localization (SFL) uma das mais promissoras. A técnica SFL aponta os elementos de programas (e.g., comandos, ramos e associações definição-uso), ordenando-os por valor de suspeição. Heursticas são usadas para ordenar os elementos mais suspeitos de um programa, que então são mapeados em linhas de código a serem inspecionadas pelos desenvolvedores. Embora informações de fluxo de dados (associações definição-uso) tenham mostrado desempenho melhor do que informações de fluxo de controle (comandos e ramos) para localizar defeitos, o alto custo para coletar cobertura de fluxo de dados tem impedido a sua utilização na prática. Uma ferramenta de cobertura de fluxo de dados foi recentemente implementada apresentando, em média, 38% de sobrecarga em tempo de execução em programas similares aos desenvolvidos na indústria. Tal sobrecarga, bastante modesta, motiva o estudo de SFL usando informações de fluxo de dados. Para atingir esse objetivo, Jaguar (Java coveraGe faUlt locAlization Ranking), uma ferramenta que usa técnicas SFL com cobertura de fluxo de controle e de dados, foi implementada. A eficiência e eficácia de ambos os tipos de coberturas foram comparados usando 173 versões com defeitos de programas com tamanhos variando de 10 a 96 KLOC. Foram usadas dez heursticas conhecidas para ordenar as linhas mais suspeitas. Os resultados mostram que o comportamento das heursticas são similares para fluxo de controle e de dados: Kulczyski2 e Mccon obtêm melhores resultados para números menores de linhas investigadas (de 5 a 30), enquanto Ochiai é melhor quando mais linhas são inspecionadas (de 30 a 100). A comparação entre os dois tipos de cobertura mostra que fluxo de dados localiza mais defeitos em uma variação de 10 a 50 linhas inspecionadas, sendo até 22% mais eficaz. Além disso, na faixa entre 20 e 100 linhas, fluxo de dados classifica com significância estatstica melhor os defeitos. No entanto, fluxo de dados é mais caro do que fluxo de controle: leva de 23% a 245% mais tempo para obter os resultados; fluxo de dados é em média 129% mais custoso. Portanto, os resultados indicam que fluxo de dados é mais eficaz para localizar os defeitos pois rastreia mais relacionamentos durante a execução do programa. Por outro lado, técnicas SFL apoiadas por cobertura de fluxo de dados precisam ser mais eficientes para utilização prática na indústriaBiblioteca Digitais de Teses e Dissertações da USPChaim, Marcos LordelloRibeiro, Henrique Lemos2016-08-19info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/100/100131/tde-18102016-092654/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2024-10-09T13:16:04Zoai:teses.usp.br:tde-18102016-092654Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.bropendoar:27212024-10-09T13:16:04Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv	On the use of control- and data-ow in fault localization Sobre o uso de fluxo de controle e de dados para a localizao de defeitos
title	On the use of control- and data-ow in fault localization
spellingShingle	On the use of control- and data-ow in fault localization Ribeiro, Henrique Lemos Control-flow Data-flow Engenharia de software Fault localization Fluxo de controle Fluxo de dados Localização de defeitos Software engineering
title_short	On the use of control- and data-ow in fault localization
title_full	On the use of control- and data-ow in fault localization
title_fullStr	On the use of control- and data-ow in fault localization
title_full_unstemmed	On the use of control- and data-ow in fault localization
title_sort	On the use of control- and data-ow in fault localization
author	Ribeiro, Henrique Lemos
author_facet	Ribeiro, Henrique Lemos
author_role	author
dc.contributor.none.fl_str_mv	Chaim, Marcos Lordello
dc.contributor.author.fl_str_mv	Ribeiro, Henrique Lemos
dc.subject.por.fl_str_mv	Control-flow Data-flow Engenharia de software Fault localization Fluxo de controle Fluxo de dados Localização de defeitos Software engineering
topic	Control-flow Data-flow Engenharia de software Fault localization Fluxo de controle Fluxo de dados Localização de defeitos Software engineering
description	Testing and debugging are key tasks during the development cycle. However, they are among the most expensive activities during the development process. To improve the productivity of developers during the debugging process various fault localization techniques have been proposed, being Spectrum-based Fault Localization (SFL), or Coverage-based Fault Localization (CBFL), one of the most promising. SFL techniques pinpoints program elements (e.g., statements, branches, and definition-use associations), sorting them by their suspiciousness. Heuristics are used to rank the most suspicious program elements which are then mapped into lines to be inspected by developers. Although data-flow spectra (definition-use associations) has been shown to perform better than control-flow spectra (statements and branches) to locate the bug site, the high overhead to collect data-flow spectra has prevented their use on industry-level code. A data-flow coverage tool was recently implemented presenting on average 38% run-time overhead for large programs. Such a fairly modest overhead motivates the study of SFL techniques using data-flow information in programs similar to those developed in the industry. To achieve such a goal, we implemented Jaguar (JAva coveraGe faUlt locAlization Ranking), a tool that employ control-flow and data-flow coverage on SFL techniques. The effectiveness and efficiency of both coverages are compared using 173 faulty versions with sizes varying from 10 to 96 KLOC. Ten known SFL heuristics to rank the most suspicious lines are utilized. The results show that the behavior of the heuristics are similar both to control- and data-flow coverage: Kulczynski2 and Mccon perform better for small number of lines investigated (from 5 to 30 lines) while Ochiai performs better when more lines are inspected (30 to 100 lines). The comparison between control- and data-flow coverages shows that data-flow locates more defects in the range of 10 to 50 inspected lines, being up to 22% more effective. Moreover, in the range of 20 and 100 lines, data-flow ranks the bug better than control-flow with statistical significance. However, data-flow is still more expensive than control-flow: it takes from 23% to 245% longer to obtain the most suspicious lines; on average data-flow is 129% more costly. Therefore, our results suggest that data-flow is more effective in locating faults because it tracks more relationships during the program execution. On the other hand, SFL techniques supported by data-flow coverage needs to be improved for practical use at industrial settings
publishDate	2016
dc.date.none.fl_str_mv	2016-08-19
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://www.teses.usp.br/teses/disponiveis/100/100131/tde-18102016-092654/
url	http://www.teses.usp.br/teses/disponiveis/100/100131/tde-18102016-092654/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Liberar o conteúdo para acesso público. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Liberar o conteúdo para acesso público.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.br
_version_	1815256518047563776

On the use of control- and data-ow in fault localization

Registros relacionados