Mining Software Model Repositories

Bibliographic details
Main author: Lacão, Guilherme Ferreira
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/163569
Abstract: Modelling languages in software development are crucial for capturing requirements and representing software designs, architectures, and implementations. This dissertation focuses on UML class diagrams, a modelling language widely adopted in object-oriented software development. The quality of UML class diagram models can significantly impact the quality of the system they represent. Defects present in these models can hinder stakeholder understanding, introduce unnecessary complexity, and propagate to the developed system, leading to increased costs. Therefore, understanding the most common defects present in these diagrams is crucial. Further, with the growth of publicly available repositories, a wealth of valuable information, including UML class diagrams, is accessible. This presents an opportunity to study a large number of models extracted from these repositories. In this dissertation, we present an automated evaluation tool to assess a dataset consisting of 103,103 UML class diagrams to identify the defects present in these diagrams. The creation of this dataset involved the development of a web scraping tool designed to extract UML class diagrams from public repository projects. The principles of the Physics of Notations proposed by Moody and the principles of diagram size and diagram flaws proposed by Störrle are incorporated into the automated evaluation tool to identify defects. This allowed us to analyse how UML class diagrams available in public repositories are built "in the wild", and to detect which are the most frequent violations of the modelling principles proposed by Moody and Störrle.
id RCAP_414eba2c74e18bbeab0e33a95cfcde26
oai_identifier_str oai:run.unl.pt:10362/163569
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Mining Software Model Repositories
title Mining Software Model Repositories
spellingShingle Mining Software Model Repositories
Lacão, Guilherme Ferreira
UML Class Diagram
Model Quality Factors
Physics of Notations
Mining Repositories
Web Scraping
Modelling
Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
title_short Mining Software Model Repositories
title_full Mining Software Model Repositories
title_fullStr Mining Software Model Repositories
title_full_unstemmed Mining Software Model Repositories
title_sort Mining Software Model Repositories
author Lacão, Guilherme Ferreira
author_facet Lacão, Guilherme Ferreira
author_role author
dc.contributor.none.fl_str_mv Goulão, Miguel
RUN
dc.contributor.author.fl_str_mv Lacão, Guilherme Ferreira
dc.subject.por.fl_str_mv UML Class Diagram
Model Quality Factors
Physics of Notations
Mining Repositories
Web Scraping
Modelling
Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
topic UML Class Diagram
Model Quality Factors
Physics of Notations
Mining Repositories
Web Scraping
Modelling
Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
description Modelling languages in software development are crucial for capturing requirements and representing software designs, architectures, and implementations. This dissertation focuses on UML class diagrams, a modelling language widely adopted in object-oriented software development. The quality of UML class diagram models can significantly impact the quality of the system they represent. Defects present in these models can hinder stakeholder understanding, introduce unnecessary complexity, and propagate to the developed system, leading to increased costs. Therefore, understanding the most common defects present in these diagrams is crucial. Further, with the growth of publicly available repositories, a wealth of valuable information, including UML class diagrams, is accessible. This presents an opportunity to study a large number of models extracted from these repositories. In this dissertation, we present an automated evaluation tool to assess a dataset consisting of 103,103 UML class diagrams to identify the defects present in these diagrams. The creation of this dataset involved the development of a web scraping tool designed to extract UML class diagrams from public repository projects. The principles of the Physics of Notations proposed by Moody and the principles of diagram size and diagram flaws proposed by Störrle are incorporated into the automated evaluation tool to identify defects. This allowed us to analyse how UML class diagrams available in public repositories are built "in the wild", and to detect which are the most frequent violations of the modelling principles proposed by Moody and Störrle.
publishDate 2023
dc.date.none.fl_str_mv 2023-12
2023-12-01T00:00:00Z
2024-02-15T14:26:28Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/163569
url http://hdl.handle.net/10362/163569
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138174345150464