An initial investigation of ChatGPT unit test generation capability
Main author: | Guilherme, Vitor Hugo |
---|---|
Publication date: | 2023 |
Document type: | Undergraduate thesis (Trabalho de Conclusão de Curso) |
Language: | Portuguese (por) |
Source: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/18502 https://github.com/aurimrv/initial-investigation-chatgpt-unit-tests.git |
Abstract: | Software testing plays a crucial role in ensuring software quality, but developers often disregard it. Automated test generation is pursued to reduce the consequences of overlooked test cases in a software project. Problem: In the context of Java programs, several tools can fully automate the generation of unit test sets, and studies have been conducted to provide evidence of the quality of the generated sets. These tools, however, rely on machine learning and other AI algorithms rather than on recent advances in Large Language Models (LLMs). Solution: This work evaluates the quality of Java unit tests generated by an OpenAI LLM, using metrics such as code coverage and mutation score. Method: 33 programs used by other researchers in the field of automated test generation were selected, establishing a baseline for comparison. For each program, 33 unit test sets were generated automatically, without human interference, by varying OpenAI API parameters (illustrative sketches of this generation-and-measurement loop follow the record below). After executing each test set, code coverage, mutation score, and the success rate of test execution were collected to evaluate the efficiency and effectiveness of each set. Summary of Results: The OpenAI LLM test sets performed comparably, on all evaluated aspects, to the traditional automated Java test generation tools used in previous research. These results are particularly remarkable given the simplicity of the experiment and the fact that the generated test code underwent no human analysis. |
Alternative title: | Uma investigação inicial da capacidade de geração de teste unitário do ChatGPT |
Author identifiers: | Lattes: http://lattes.cnpq.br/0080411869714148; ORCID: https://orcid.org/0009-0005-5868-290X |
Advisor: | Vincenzi, Auri Marcelo Rizzo (Lattes: http://lattes.cnpq.br/0611351138131709; ORCID: https://orcid.org/0000-0001-5902-1672) |
Citation: | GUILHERME, Vitor Hugo. An initial investigation of ChatGPT unit test generation capability. 2023. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2023. Disponível em: https://repositorio.ufscar.br/handle/ufscar/18502. |
Date issued: | 2023-08-29 |
Date deposited: | 2023-09-04 |
Publisher: | Universidade Federal de São Carlos (UFSCar), Câmpus São Carlos, Engenharia de Computação - EC |
Funding: | No funding received (author statement); Process nº 2019/23160-0, Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Finance Code 001, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) |
License: | Attribution 3.0 Brazil (http://creativecommons.org/licenses/by/3.0/br/), open access |
Subjects: | Software testing; Experimental software engineering; Automated test generation; Coverage testing; Mutation testing; Testing tools |
CNPq classification: | CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::ENGENHARIA DE SOFTWARE |
Record id: | SCAR_b9c35a2afbab00c8c899a2712e90f0b1 |
OAI identifier: | oai:repositorio.ufscar.br:ufscar/18502 (network SCAR, OpenDOAR 4322; record last updated 2024-05-14) |
Files: | TCC-final-Vitor-Guilherme.pdf (application/pdf, 1,104,872 bytes, MD5 a3a3b709bcc19b606b294c13676c92f8), https://repositorio.ufscar.br/bitstream/ufscar/18502/1/TCC-final-Vitor-Guilherme.pdf; license_rdf (application/rdf+xml, 913 bytes, MD5 3185b4de2190c2d366d1d324db01f8b8), https://repositorio.ufscar.br/bitstream/ufscar/18502/2/license_rdf; TCC-final-Vitor-Guilherme.pdf.txt (extracted text, text/plain, 54,077 bytes, MD5 a44c641046f5985c31db105b9b0d43ee), https://repositorio.ufscar.br/bitstream/ufscar/18502/3/TCC-final-Vitor-Guilherme.pdf.txt |
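The generation step described in the abstract, producing 33 test sets per program by varying OpenAI API parameters, can be pictured with a short script. The following is a minimal sketch, not the study's actual harness (which lives in the linked GitHub repository): the model name, prompt wording, temperature schedule, and file paths are all assumptions for illustration.

```python
# Hedged sketch (assumed, not the thesis' code): request JUnit test
# classes from the OpenAI chat completions endpoint, varying the
# temperature parameter to produce 33 distinct test sets per program.
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def generate_test_set(java_source: str, temperature: float) -> str:
    """Ask the model for one compilable JUnit test class."""
    body = {
        "model": "gpt-3.5-turbo",  # assumed; the study's model may differ
        "temperature": temperature,
        "messages": [{
            "role": "user",
            "content": ("Write a JUnit test class for the following Java "
                        "class. Return only compilable Java code.\n\n"
                        + java_source),
        }],
    }
    resp = requests.post(API_URL, headers=HEADERS, json=body, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Placeholder paths; 33 temperature steps spanning the API's 0-2 range.
    os.makedirs("generated", exist_ok=True)
    with open("subject/src/main/java/Subject.java") as src:
        source = src.read()
    for i in range(33):
        test_code = generate_test_set(source, temperature=i / 16)
        with open(f"generated/SubjectTest_{i:02d}.java", "w") as out:
            out.write(test_code)
```

Because no human inspects the output, some generated sets may fail to compile or to pass, which is why the study also records the success rate of test execution.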
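For the measurement step, the abstract reports code coverage, mutation score, and test-execution success rate per generated set. As a rough illustration of how such metrics are commonly collected for Java projects, a driver might look like the sketch below; it assumes a Maven-based subject project with the JaCoCo plugin configured and uses the standard PIT Maven goal, and it is not the study's actual tooling.

```python
# Hedged sketch: run a generated test set through a Maven build and
# collect coverage (JaCoCo) and mutation score (PIT). Assumes the
# subject project is Maven-based with jacoco-maven-plugin configured;
# the thesis' real scripts are in the linked GitHub repository.
import subprocess

def collect_metrics(project_dir: str) -> bool:
    """Return True if the test set executed without failures."""
    # Run the tests with JaCoCo instrumentation and produce its report.
    tests = subprocess.run(["mvn", "clean", "test", "jacoco:report"],
                           cwd=project_dir)
    # Mutation testing via the PIT Maven plugin; the mutation score is
    # read from target/pit-reports afterwards.
    subprocess.run(["mvn", "org.pitest:pitest-maven:mutationCoverage"],
                   cwd=project_dir)
    # A non-zero exit code counts against the execution success rate,
    # since generated tests may fail or not compile at all.
    return tests.returncode == 0
```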