Understanding and classifying code harmfulness

Bibliographic details
Main author: Lima, Rodrigo dos Santos
Publication date: 2020
Document type: Master's thesis
Language: Portuguese (por)
Source title: Repositório Institucional da Universidade Federal de Alagoas (UFAL)
Full text: http://www.repositorio.ufal.br/handle/riufal/6966
Abstract: Code smells typically indicate poor implementation choices that may degrade software quality, so they need to be detected carefully to avoid such degradation. Some studies try to understand the impact of code smells on software quality, while others propose rule-based or machine learning-based techniques to detect them. However, none of those studies or techniques focuses on code snippets that are actually harmful to software quality. Our study aims to understand and classify code harmfulness. We analyze harmfulness in terms of CLEAN, SMELLY, BUGGY, and HARMFUL code. By harmful code, we mean code that has already harmed software quality and is still prone to harm. We perform our study with 22 smell types, 803 versions of 12 open-source projects, 40,340 bugs, and 132,219 code smells. The results show that, despite the high number of code smells, only 0.07% of those smells are harmful. Abstract Function Call From Constructor is the smell type most related to harmful code. To cross-validate our results, we also survey 77 developers. Most of them (90.4%) consider code smells harmful to software, and 84.6% of those developers believe that code smell detection tools are important. However, those developers are not concerned with selecting tools that can detect harmful code. We also evaluate machine learning techniques for classifying code harmfulness: they reach an effectiveness of at least 97% in classifying harmful code. While Random Forest is effective in classifying both smelly and harmful code, Gaussian Naive Bayes is the least effective technique. Our results also suggest that both software and developer metrics are important for classifying harmful code.
id UFAL_8c67206416e5513f2d2b4f80f23f1235
oai_identifier_str oai:www.repositorio.ufal.br:riufal/6966
network_acronym_str UFAL
network_name_str Repositório Institucional da Universidade Federal de Alagoas (UFAL)
repository_id_str
Funding agency: FAPEAL - Fundação de Amparo à Pesquisa do Estado de Alagoas
dc.title.none.fl_str_mv Understanding and classifying code harmfulness
Entendendo e reconhecendo códigos prejudiciais
dc.contributor.none.fl_str_mv Santos Neto, Baldoino Fonseca dos
http://lattes.cnpq.br/0306751604362704
Ribeiro, Márcio de Medeiros
http://lattes.cnpq.br/9300936571715992
Teixeira, Leopoldo Motta
http://lattes.cnpq.br/2117651910340729
dc.contributor.author.fl_str_mv Lima, Rodrigo dos Santos
dc.subject.por.fl_str_mv Code smells
Software – Qualidade
Aprendizagem de máquina
Code Smells
Software Quality
Machine Learning
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2020
dc.date.none.fl_str_mv 2020-05-20T21:18:41Z
2020-05-18
2020-05-20T21:18:41Z
2020-02-28
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv LIMA, Rodrigo dos Santos. Understanding and classifying code harmfulness. 2020. 54 f. Dissertação (Mestrado em Informática) - Instituto de Computação, Programa de Pós-Graduação em Informática, Universidade Federal de Alagoas, Maceió, 2020.
http://www.repositorio.ufal.br/handle/riufal/6966
dc.language.iso.fl_str_mv por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Alagoas
Brasil
Programa de Pós-Graduação em Informática
UFAL
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal de Alagoas (UFAL)
instname:Universidade Federal de Alagoas (UFAL)
instacron:UFAL
repository.name.fl_str_mv Repositório Institucional da Universidade Federal de Alagoas (UFAL) - Universidade Federal de Alagoas (UFAL)
repository.mail.fl_str_mv ri@sibi.ufal.br