Understanding and classifying code harmfulness
Main author: | Lima, Rodrigo dos Santos |
---|---|
Publication date: | 2020 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da Universidade Federal de Alagoas (UFAL) |
Full text: | http://www.repositorio.ufal.br/handle/riufal/6966 |
Abstract: | Code smells typically indicate poor implementation choices that may degrade software quality, so they need to be detected carefully to avoid such degradation. Some studies try to understand the impact of code smells on software quality, while others propose rule-based or machine learning-based techniques to detect them. However, none of those studies or techniques focuses on code snippets that are actually harmful to software quality. Our study aims to understand and classify code harmfulness. We analyze harmfulness in terms of CLEAN, SMELLY, BUGGY, and HARMFUL code. By harmful code, we mean code that has already harmed software quality and is still prone to harm. We perform our study with 22 smell types, 803 versions of 12 open-source projects, 40,340 bugs, and 132,219 code smells. The results show that, despite the high number of code smells, only 0.07% of them are harmful. Abstract Function Call From Constructor is the smell type most closely related to harmful code. To cross-validate our results, we also survey 77 developers. Most of them (90.4%) consider code smells harmful to the software, and 84.6% of those developers believe that code smell detection tools are important. However, those developers are not concerned about selecting tools that are able to detect harmful code. We also evaluate machine learning techniques for classifying code harmfulness: they reach an effectiveness of at least 97% in classifying harmful code. While Random Forest is effective in classifying both smelly and harmful code, Gaussian Naive Bayes is the least effective technique. Our results also suggest that both software metrics and developer metrics are important for classifying harmful code. |
oai_identifier_str |
oai:www.repositorio.ufal.br:riufal/6966 |
funder |
FAPEAL - Fundação de Amparo à Pesquisa do Estado de Alagoas |
dc.title.none.fl_str_mv |
Understanding and classifying code harmfulness / Entendendo e reconhecendo códigos prejudiciais |
dc.contributor.none.fl_str_mv |
Santos Neto, Baldoino Fonseca dos (http://lattes.cnpq.br/0306751604362704); Ribeiro, Márcio de Medeiros (http://lattes.cnpq.br/9300936571715992); Teixeira, Leopoldo Motta (http://lattes.cnpq.br/2117651910340729) |
dc.contributor.author.fl_str_mv |
Lima, Rodrigo dos Santos |
dc.subject.por.fl_str_mv |
Code smells; Software quality; Machine learning; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
dc.date.none.fl_str_mv |
2020-05-20T21:18:41Z; 2020-05-18; 2020-05-20T21:18:41Z; 2020-02-28 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
LIMA, Rodrigo dos Santos. Understanding and classifying code harmfulness. 2020. 54 f. Dissertação (Mestrado em Informática) - Instituto de Computação, Programa de Pós-Graduação em Informática, Universidade Federal de Alagoas, Maceió, 2020. http://www.repositorio.ufal.br/handle/riufal/6966 |
dc.language.iso.fl_str_mv |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de Alagoas; Brazil; Programa de Pós-Graduação em Informática; UFAL |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da Universidade Federal de Alagoas (UFAL); instname:Universidade Federal de Alagoas (UFAL); instacron:UFAL |
repository.name.fl_str_mv |
Repositório Institucional da Universidade Federal de Alagoas (UFAL) - Universidade Federal de Alagoas (UFAL) |
repository.mail.fl_str_mv |
ri@sibi.ufal.br |
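
The abstract names four categories (CLEAN, SMELLY, BUGGY, HARMFUL) and reports that only 0.07% of 132,219 code smells are harmful. The sketch below is one possible reading of that scheme, not the thesis's method: it assumes a snippet counts as harmful when it both carries a smell and is linked to a reported bug. The names `Harmfulness`, `label_snippet`, `has_smell`, and `has_bug` are hypothetical and do not come from the record. The last lines work out the rough number of harmful smells that the rounded percentage implies.

```python
from enum import Enum


class Harmfulness(Enum):
    """The four categories named in the abstract."""
    CLEAN = "clean"
    SMELLY = "smelly"
    BUGGY = "buggy"
    HARMFUL = "harmful"


def label_snippet(has_smell: bool, has_bug: bool) -> Harmfulness:
    """Assumed rule: smell + bug = harmful. The thesis may define this differently."""
    if has_smell and has_bug:
        return Harmfulness.HARMFUL
    if has_smell:
        return Harmfulness.SMELLY
    if has_bug:
        return Harmfulness.BUGGY
    return Harmfulness.CLEAN


# Figures quoted in the abstract.
TOTAL_SMELLS = 132_219
HARMFUL_SHARE = 0.0007  # 0.07%, a rounded value

# Approximate count only, because the published percentage is rounded.
approx_harmful_smells = round(TOTAL_SMELLS * HARMFUL_SHARE)

if __name__ == "__main__":
    print(label_snippet(has_smell=True, has_bug=True).name)
    print(approx_harmful_smells)
```

Under this assumption, the reported share corresponds to roughly 93 harmful smells out of 132,219, which illustrates how rare the HARMFUL category is compared with the SMELLY one.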