Technical debt prioritization: methods, techniques, and a large exploratory study
Main author: | Pina, Diogo de Jesus |
---|---|
Publication date: | 2023 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/45/45134/tde-11092023-221224/ |
Abstract: | Software development teams need to prioritize the payment of technical debt items to improve software quality and sustain the pace of new feature development and code maintenance. Identification tools can find thousands of technical debt items in a code repository, and paying off all of them is infeasible because it would take months or even years. The team must therefore decide which items to pay off and when. We performed a mapping review to identify studies that assist in the technical debt prioritization process. We found papers that conceptualized the process, developed prioritization frameworks, and applied various methods to accomplish prioritization. Despite these efforts, a prioritization method that considers the software development context, works for several programming languages, covers different types of technical debt, and is integrated into a tool for practical use still needs to be developed. Based on the mapping review, our motivation for this research is to understand how developers prioritize technical debt items in real software projects. We also apply machine learning methods to automate the prioritization process. To support our studies, we developed the Sonarlizer Xplorer tool to mine and analyze public projects hosted on GitHub. Applying the tool yields a list of technical debt items and code metrics for many software projects. We used a questionnaire to collect data from public Java projects to understand which criteria software developers use to prioritize code technical debt in real projects. We analyzed the data using Straussian Grounded Theory. We grouped the criteria into fifteen categories and divided them into two super-categories related to technical debt payment and three related to non-payment. We found that when developers decide to pay off a technical debt item, they want to pay it off soon. When they decide not to pay, it is usually because the debt was acquired intentionally and is related to design decisions. When developers used similar criteria, the payment priority levels they assigned were similar. Finally, we note that each software project needs its own specific rules to identify its technical debt items. We also studied the application of machine learning methods to prioritize technical debt items in real software projects. We applied the same questionnaire as in the previous study and obtained 2,616 responses. From the responses, we created a dataset using three labeling strategies: "pay or not", 3-classes, and priority. We applied nine well-known machine learning methods to 27 code metrics to build models for deciding whether a technical debt item should be paid (mean accuracy of 0.79 and mean F1 of around 0.86) and when to pay it; for the latter, four approaches achieved an accuracy of 0.57 using traditional analysis and 0.81 using tuned analysis. |
id |
USP_94dc9fa1182dfd5458336847835da76c |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-11092023-221224 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Technical debt prioritization: methods, techniques, and a large exploratory study / Priorização de dívida técnica: métodos, técnicas e um estudo exploratório |
title |
Technical debt prioritization: methods, techniques, and a large exploratory study |
author |
Pina, Diogo de Jesus |
author_facet |
Pina, Diogo de Jesus |
author_role |
author |
dc.contributor.none.fl_str_mv |
Lejbman, Alfredo Goldman Vel |
dc.contributor.author.fl_str_mv |
Pina, Diogo de Jesus |
dc.subject.por.fl_str_mv |
Aprendizado de máquina; Artificial intelligence; Dívida técnica; Gerenciamento de dívida técnica; Inteligência artificial; Machine learning; Priorização de dívida técnica; Technical debt; Technical debt management; Technical debt prioritization |
topic |
Aprendizado de máquina; Artificial intelligence; Dívida técnica; Gerenciamento de dívida técnica; Inteligência artificial; Machine learning; Priorização de dívida técnica; Technical debt; Technical debt management; Technical debt prioritization |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-08-24 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/45/45134/tde-11092023-221224/ |
url |
https://www.teses.usp.br/teses/disponiveis/45/45134/tde-11092023-221224/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
_version_ |
1815256874786750464 |