Monitoring the impact of data and model quality in machine learning
Main author: | Songa, Amós Kelvin José |
---|---|
Publication date: | 2024 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/42953 |
Abstract: | Considering the evolution of machine learning algorithms and their use in the day-to-day operations of organizations, it has become necessary to monitor and evaluate their performance in production environments. This dissertation aims to contribute to the existing body of knowledge by offering a perspective focused on monitoring machine learning models during their operational phase. The research approach involves a theoretical exploration followed by the simulation of various errors that cause model degradation in production. In this way, we identify several factors that may go unnoticed when models are in production, such as model bias, data drift, and concept drift, and we demonstrate ways to detect them. We conclude that it is imperative to have processes in place for monitoring data and models in production, and we highlight Machine Learning Operations (MLOps) as a solution to streamline the deployment, monitoring, and maintenance of a model in production. |
id |
RCAP_1e526c345d7df179c5267f51c9f6e881 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/42953 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Monitoring the impact of data and model quality in machine learning |
title |
Monitoring the impact of data and model quality in machine learning |
author |
Songa, Amós Kelvin José |
author_facet |
Songa, Amós Kelvin José |
author_role |
author |
dc.contributor.author.fl_str_mv |
Songa, Amós Kelvin José |
dc.subject.por.fl_str_mv |
Detection; Drift; Machine learning; MLOps; Monitor; Performance; Pipeline; System; Tools |
topic |
Detection; Drift; Machine learning; MLOps; Monitor; Performance; Pipeline; System; Tools |
description |
Considering the evolution of machine learning algorithms and their use in the day-to-day operations of organizations, it has become necessary to monitor and evaluate their performance in production environments. This dissertation aims to contribute to the existing body of knowledge by offering a perspective focused on monitoring machine learning models during their operational phase. The research approach involves a theoretical exploration followed by the simulation of various errors that cause model degradation in production. In this way, we identify several factors that may go unnoticed when models are in production, such as model bias, data drift, and concept drift, and we demonstrate ways to detect them. We conclude that it is imperative to have processes in place for monitoring data and models in production, and we highlight Machine Learning Operations (MLOps) as a solution to streamline the deployment, monitoring, and maintenance of a model in production. |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-11-28T15:35:42Z 2024-07-11T00:00:00Z 2024-07-11 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/42953 |
url |
http://hdl.handle.net/10773/42953 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817549887386943488 |