Mining Software Model Repositories
Main author: | Lacão, Guilherme Ferreira |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/163569 |
Abstract: | Modelling languages in software development are crucial for capturing requirements and representing software designs, architectures, and implementations. This dissertation focuses on UML class diagrams, a modelling language widely adopted in object-oriented software development. The quality of UML class diagram models can significantly impact the quality of the system they represent. Defects present in these models can hinder stakeholder understanding, introduce unnecessary complexity, and propagate to the developed system, leading to increased costs. Therefore, understanding the most common defects present in these diagrams is crucial. Further, with the growth of publicly available repositories, a wealth of valuable information, including UML class diagrams, is accessible. This presents an opportunity to study a large number of models extracted from these repositories. In this dissertation, we present an automated evaluation tool to assess a dataset consisting of 103,103 UML class diagrams to identify the defects present in these diagrams. The creation of this dataset involved the development of a web scraping tool designed to extract UML class diagrams from public repository projects. The principles of the Physics of Notations proposed by Moody and the principles of diagram size and diagram flaws proposed by Störrle are incorporated into the automated evaluation tool to identify defects. This allowed us to analyse how UML class diagrams available in public repositories are built "in the wild", and to detect which are the most frequent violations of the modelling principles proposed by Moody and Störrle. |
id |
RCAP_414eba2c74e18bbeab0e33a95cfcde26 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/163569 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Mining Software Model Repositories |
title |
Mining Software Model Repositories |
author |
Lacão, Guilherme Ferreira |
author_facet |
Lacão, Guilherme Ferreira |
author_role |
author |
dc.contributor.none.fl_str_mv |
Goulão, Miguel; RUN |
dc.contributor.author.fl_str_mv |
Lacão, Guilherme Ferreira |
dc.subject.por.fl_str_mv |
UML Class Diagram; Model Quality Factors; Physics of Notations; Mining Repositories; Web Scraping; Modelling; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
topic |
UML Class Diagram; Model Quality Factors; Physics of Notations; Mining Repositories; Web Scraping; Modelling; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-12; 2023-12-01T00:00:00Z; 2024-02-15T14:26:28Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/163569 |
url |
http://hdl.handle.net/10362/163569 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138174345150464 |
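
The `oai_identifier_str` value in this record follows the usual OAI identifier layout `oai:<namespace>:<local-id>`. The sketch below is only an illustration of how such a value could be split; the function name `parse_oai_identifier` is hypothetical, and the observation that the local part matches the handle in the `url` field holds for this record but is not guaranteed for others.

```python
def parse_oai_identifier(identifier: str) -> dict:
    """Split an identifier of the form oai:<namespace>:<local-id>.

    Hypothetical helper for illustration; not part of any repository API.
    """
    # Split on the first two colons only, so the local id may contain ':'.
    scheme, namespace, local_id = identifier.split(":", 2)
    if scheme != "oai" or not namespace or not local_id:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return {"namespace": namespace, "local_id": local_id}


# Value copied from the oai_identifier_str field of this record.
record = parse_oai_identifier("oai:run.unl.pt:10362/163569")

# Assumption: for this record the local id is also the handle suffix.
handle_url = "http://hdl.handle.net/" + record["local_id"]
```

Here `record["namespace"]` is the repository host and `handle_url` reproduces the address given in the `url` field above.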