Studying the prevalence of Atoms of Confusion in long-lived Java libraries

Bibliographic details
Main author: Mendes, Wendell Militão Fernandes
Publication date: 2022
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da Universidade Federal do Ceará (UFC)
Full text: http://www.repositorio.ufc.br/handle/riufc/69632
Abstract: Program comprehension is a fundamental activity in software maintenance and evolution, affecting tasks such as bug fixing, code reuse, and the implementation of new features. An Atom of Confusion (AC) is considered the smallest piece of code that can confuse programmers, hindering the correct understanding of the source code under consideration. Previous studies have shown that these atoms can significantly affect the presence of bugs in C/C++ programs and increase the time and effort needed to understand code in C/C++ and Java programs. To gather more evidence about the diffusion of ACs in the Java ecosystem, we conducted a study analyzing the prevalence, co-occurrence (at the class level), and evolution of ACs in 27 long-lived Java libraries. To support our investigation, we developed an automatic AC search tool called BOHR. This tool aims to: (i) aid in the identification of ACs in Java systems; (ii) provide prevalence reports for these ACs; and (iii) provide an API for developing new custom finders that capture new ACs, as well as for improving the AC detection already implemented. BOHR is able to detect 10 of the 14 types of ACs identified by Langhout and Aniche (LANGHOUT; ANICHE, 2021). We also provide a manually annotated dataset used to validate BOHR's accuracy. Using BOHR, we found 11,404 occurrences in the studied libraries. The Conditional Operator and Logic as Control Flow ACs were the most prevalent among the 10 types of ACs assessed. Our findings show that Conditional Operator and Logic as Control Flow were more likely to co-occur in the same class. Finally, we observed that the prevalence of ACs did not decrease over time. On the contrary, in 13 libraries, the number of ACs grew proportionally more than the size of the library in lines of code. Furthermore, in 15 libraries, the fraction of Java classes containing at least one AC also increased over time.
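For readers unfamiliar with the two atoms the abstract reports as most prevalent, the following Java sketch may help. It is not taken from the thesis or from BOHR; the class name `AtomExamples` and all method names are hypothetical, and the patterns follow how these atoms are commonly described in the Atoms of Confusion literature. Each atom is paired with a behaviorally equivalent version in which the control flow is written out explicitly.

```java
import java.util.Arrays;

// Illustrative only: class and method names are hypothetical,
// not part of the thesis or of the BOHR tool.
class AtomExamples {

    // Conditional Operator atom: the branch is packed into a ternary expression.
    static int conditionalAtom(int a) {
        return (a == 3) ? 2 : 1;
    }

    // Same behavior with an explicit if/else.
    static int conditionalClear(int a) {
        if (a == 3) {
            return 2;
        }
        return 1;
    }

    // Logic as Control Flow atom: short-circuit || decides whether ++v2 runs.
    static int[] logicAtom(int v1, int v2) {
        if (++v1 > 0 || ++v2 > 0) {
            v1 = v1 * 2;
            v2 = v2 * 2;
        }
        return new int[] {v1, v2};
    }

    // Same behavior with each step of the control flow spelled out.
    static int[] logicClear(int v1, int v2) {
        v1 = v1 + 1;
        boolean enter = v1 > 0;
        if (!enter) {
            // Only reached when the first test fails, mirroring short-circuit ||.
            v2 = v2 + 1;
            enter = v2 > 0;
        }
        if (enter) {
            v1 = v1 * 2;
            v2 = v2 * 2;
        }
        return new int[] {v1, v2};
    }

    public static void main(String[] args) {
        System.out.println(conditionalAtom(3) + " " + conditionalClear(3));
        System.out.println(Arrays.toString(logicAtom(1, 5)));
        System.out.println(Arrays.toString(logicClear(1, 5)));
    }
}
```

In `logicAtom`, whether `v2` is incremented depends on the value of `v1`, which is easy to miss when reading the condition; the explicit version makes that dependency visible at the cost of a few more lines.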
id UFC-7_df618d47de8c22e93745fc6d9aba0a13
oai_identifier_str oai:repositorio.ufc.br:riufc/69632
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
author Mendes, Wendell Militão Fernandes
author_facet Mendes, Wendell Militão Fernandes
author_role author
dc.contributor.none.fl_str_mv Carvalho, Windson Viana de
Rocha, Lincoln Souza
dc.contributor.author.fl_str_mv Mendes, Wendell Militão Fernandes
dc.subject.por.fl_str_mv Empirical study
Program comprehension
Atoms of Confusion
Long-lived Java projects
publishDate 2022
dc.date.none.fl_str_mv 2022-12-05T11:36:40Z
2022-12-05T11:36:40Z
2022
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv MENDES, Wendell Militão Fernandes. Studying the prevalence of Atoms of Confusion in long-lived Java libraries. 2022. 69 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2022.
http://www.repositorio.ufc.br/handle/riufc/69632
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1809935787092869120