Streamlining code smells: Using collective intelligence and visualization

Detalhes bibliográficos
Autor(a) principal: Reis, José Vicente Pereira dos
Data de Publicação: 2022
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10071/26465
Resumo: Context. Code smells are seen as major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), code smells detection scaffolding has still many limitations. Researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigate the problem of smells-infected code. Objective. This thesis presents a new approach to code smells detection that we have called CrowdSmelling and the results of a validation experiment for this approach. The latter is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. Method. In the context of three consecutive years of a Software Engineering course, a total “crowd” of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forrest). Conclusions. Obtained results suggest that Crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity
id RCAP_2d018a7e7dce531cb8cde53d1831f83d
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/26465
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Streamlining code smells: Using collective intelligence and visualizationCrowdsmellingCode smellsCrowdsourcingCode smells detectionMachine learningSoftware qualityCheiros de códigoDeteção de cheiros de códigoAprendizagem automáticaQualidade do softwareContext. Code smells are seen as major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), code smells detection scaffolding has still many limitations. Researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigate the problem of smells-infected code. Objective. This thesis presents a new approach to code smells detection that we have called CrowdSmelling and the results of a validation experiment for this approach. The latter is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. Method. In the context of three consecutive years of a Software Engineering course, a total “crowd” of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forrest). Conclusions. Obtained results suggest that Crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validityContexto. Os cheiros de código são a principal causa de dívida técnica (technical debt), como tal, devem ser detectados e removidos. Os cheiros de código já foram há muito tempo catalogados juntamente com as correspondentes soluções mitigadoras chamadas operações de refabricação (refactoring). No entanto, embora estas últimas sejam suportadas nas IDEs actuais (por exemplo, Eclipse), a deteção de cheiros de código têm ainda muitas limitações. Os investigadores argumentam que a subjectividade do processo de deteção de cheiros de código é um dos principais obstáculo à mitigação do problema da qualidade do código. Objectivo. Esta tese apresenta uma nova abordagem à detecção de cheiros de código, a que chamámos CrowdSmelling, e os resultados de uma experiência de validação para esta abordagem. A nossa abordagem de CrowdSmelling baseia-se em técnicas de aprendizagem automática supervisionada, onde a sabedoria da multidão (dos programadores de software) é utilizada para calibrar colectivamente algoritmos de detecção de cheiros de código, diminuindo assim a questão da subjectividade. Método. Em três anos consecutivos, no âmbito da Unidade Curricular de Engenharia de Software, uma "multidão", num total de cerca de uma centena de equipas, com uma média de três membros cada, classificou a presença de 3 cheiros de código (Long Method, God Class, and Feature Envy) em código fonte Java. Estas classificações foram a base dos oráculos utilizados para o treino de seis algoritmos de aprendizagem automática. Mais de cem modelos foram gerados e avaliados para determinar quais os algoritmos de aprendizagem de máquinas com melhor desempenho na detecção de cada um dos cheiros de código acima mencionados. Resultados. Foram obtidos bons desempenhos na detecção do God Class (ROC=0,896 para Naive Bayes) e na detecção do Long Method (ROC=0,870 para AdaBoostM1), mas muito mais baixos para Feature Envy (ROC=0,570 para Random Forrest). Conclusões. Os resultados obtidos sugerem que o Crowdsmelling é uma abordagem viável para a detecção de cheiros de código, mas são necessárias mais experiências de validação para cobrir mais cheiros de código e para aumentar a validade externa.2022-11-22T13:16:34Z2022-09-09T00:00:00Z2022-09-092022-06doctoral thesisinfo:eu-repo/semantics/publishedVersionapplication/pdfapplication/octet-streamhttp://hdl.handle.net/10071/26465TID:101551657eng978-989-781-725-0Reis, José Vicente Pereira dosinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-07-07T02:39:25Zoai:repositorio.iscte-iul.pt:10071/26465Portal AgregadorONGhttps://www.rcaap.pt/oai/openairemluisa.alvim@gmail.comopendoar:71602024-07-07T02:39:25Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Streamlining code smells: Using collective intelligence and visualization
title Streamlining code smells: Using collective intelligence and visualization
spellingShingle Streamlining code smells: Using collective intelligence and visualization
Reis, José Vicente Pereira dos
Crowdsmelling
Code smells
Crowdsourcing
Code smells detection
Machine learning
Software quality
Cheiros de código
Deteção de cheiros de código
Aprendizagem automática
Qualidade do software
title_short Streamlining code smells: Using collective intelligence and visualization
title_full Streamlining code smells: Using collective intelligence and visualization
title_fullStr Streamlining code smells: Using collective intelligence and visualization
title_full_unstemmed Streamlining code smells: Using collective intelligence and visualization
title_sort Streamlining code smells: Using collective intelligence and visualization
author Reis, José Vicente Pereira dos
author_facet Reis, José Vicente Pereira dos
author_role author
dc.contributor.author.fl_str_mv Reis, José Vicente Pereira dos
dc.subject.por.fl_str_mv Crowdsmelling
Code smells
Crowdsourcing
Code smells detection
Machine learning
Software quality
Cheiros de código
Deteção de cheiros de código
Aprendizagem automática
Qualidade do software
topic Crowdsmelling
Code smells
Crowdsourcing
Code smells detection
Machine learning
Software quality
Cheiros de código
Deteção de cheiros de código
Aprendizagem automática
Qualidade do software
description Context. Code smells are seen as major source of technical debt and, as such, should be detected and removed. Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in current IDEs (e.g., Eclipse), code smells detection scaffolding has still many limitations. Researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigate the problem of smells-infected code. Objective. This thesis presents a new approach to code smells detection that we have called CrowdSmelling and the results of a validation experiment for this approach. The latter is based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. Method. In the context of three consecutive years of a Software Engineering course, a total “crowd” of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Results. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forrest). Conclusions. Obtained results suggest that Crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity
publishDate 2022
dc.date.none.fl_str_mv 2022-11-22T13:16:34Z
2022-09-09T00:00:00Z
2022-09-09
2022-06
dc.type.driver.fl_str_mv doctoral thesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/26465
TID:101551657
url http://hdl.handle.net/10071/26465
identifier_str_mv TID:101551657
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 978-989-781-725-0
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
application/octet-stream
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv mluisa.alvim@gmail.com
_version_ 1817546287637069824