Monitoring the impact of data and model quality in machine learning

Bibliographic details
Main author: Songa, Amós Kelvin José
Publication date: 2024
Document type: Dissertation
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/42953
Abstract: Given how machine learning algorithms have evolved and how widely organizations now use them in day-to-day operations, it has become necessary to monitor and evaluate their performance in production environments. This dissertation aims to contribute to the existing body of knowledge by offering a perspective focused on monitoring machine learning models during their operational phase. The research approach combines a theoretical exploration with the simulation of various errors that cause model degradation in production. In this way, we identify several factors that may go unnoticed when models are in production, such as model bias, data drift, and concept drift, among others, and we demonstrate ways to detect them. We conclude that it is imperative to have processes in place for monitoring data and models in production, and we highlight Machine Learning Operations (MLOps) as a solution for streamlining the deployment, monitoring, and maintenance of models in production.
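The abstract mentions detecting data drift without naming a specific method. As an illustration only, and not necessarily the approach used in the dissertation, one common way to flag drift in a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of a reference sample (for example, training data) with that of recent production inputs. This is a minimal sketch under stated assumptions: the function name `population_stability_index`, its `n_bins` and `eps` parameters, the equal-width binning over the reference range, and the synthetic data are all hypothetical choices made for this example.

```python
import math
import random
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: PSI between a reference sample (e.g. training
    data) and a current sample (e.g. recent production inputs) of one
    numeric feature. Larger values mean a larger distribution shift."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")

    # Equal-width bins spanning the reference range (an assumption; quantile
    # bins are another common choice).
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant feature

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the
            # first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Usage on synthetic data: a stable feature vs. one whose mean has shifted.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
same = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]

print(f"PSI, same distribution: {population_stability_index(train, same):.3f}")
print(f"PSI, mean shifted by 1: {population_stability_index(train, shifted):.3f}")
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift; these thresholds are industry heuristics and would normally be tuned per feature. PSI only detects changes in the input distribution (data drift); concept drift, a change in the relationship between inputs and the target, generally calls for tracking model performance against ground-truth labels as they become available.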
oai_identifier_str oai:ria.ua.pt:10773/42953
dc.subject.por.fl_str_mv Detection
Drift
Machine learning
MLOps
Monitor
Performance
Pipeline
System
Tools
dc.date.none.fl_str_mv 2024-11-28T15:35:42Z
2024-07-11T00:00:00Z
2024-07-11
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP