Tackling Version Management and Reproducibility in MLOps
Main author: | Priscilla Dias Melin |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/152181 |
Abstract: | The growing adoption of machine learning solutions requires advances in the practices used to maintain artificial intelligence systems in production. Machine Learning Operations (MLOps) incorporates DevOps principles into machine learning development, promoting automation, continuous delivery, monitoring, and training capabilities. Due to factors such as the experimental nature of the machine learning process and the need for model optimizations driven by changing business needs, data scientists are expected to run multiple experiments to develop a model or predictor that satisfactorily addresses the main challenges of a given problem. Since models must be re-evaluated constantly, these experiment runs continually produce metadata. This metadata is known as ML artifacts or assets. Proper lineage between these artifacts enables environment recreation, which facilitates model reproducibility. Linking information from experiments, models, datasets, configurations, and code changes requires proper organization, tracking, maintenance, and version control of these artifacts. This work investigates the best practices, current issues, and open challenges related to artifact versioning and management, and applies this knowledge to develop an ML workflow that supports ML engineering and operationalization, following MLOps principles that facilitate model reproducibility. Scenarios covering data preparation, model generation, comparison between model versions, deployment, monitoring, debugging, and retraining demonstrate how the selected frameworks and tools can be integrated to achieve that goal. |
id | RCAP_46069993f5cc6e413471b65d407ed7b5 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/152181 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
author | Priscilla Dias Melin |
author_facet | Priscilla Dias Melin |
author_role | author |
dc.contributor.author.fl_str_mv | Priscilla Dias Melin |
dc.subject.por.fl_str_mv | Outras ciências da engenharia e tecnologias; Other engineering and technologies |
topic | Outras ciências da engenharia e tecnologias; Other engineering and technologies |
publishDate | 2023 |
dc.date.none.fl_str_mv | 2023-07-20 2023-07-20T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10216/152181 TID:203420853 |
url | https://hdl.handle.net/10216/152181 |
identifier_str_mv | TID:203420853 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799135647197298688 |