Data and computer center prediction of usage and cost: an interpretable machine learning approach
Main author: | Mateus, Gonçalo Furtado |
---|---|
Publication date: | 2023 |
Document type: | Master's dissertation |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/160465 |
Abstract: | In recent years, cloud computing usage has increased considerably, and it is now the backbone of many emerging applications. However, behind the cloud lie physical infrastructures (data centers) that are difficult to manage because of unpredictable utilization patterns. To address the limitations of reactive auto-scaling, data centers are widely adopting predictive resource management mechanisms. However, predictive methods are trained on application workloads and are typically pre-optimized for specific patterns, which can cause under- or over-provisioning of resources. Accurate workload forecasts are necessary to gain efficiency, save money, and provide clients with better and faster services. Working with real data from a Portuguese bank, we propose the Ensemble Adaptive Model with Drift detector (EAMDrift). This novel method combines forecasts from multiple individual predictors by weighting each model's prediction according to a performance metric. EAMDrift automatically retrains when needed and, through interpretable mechanisms, identifies the most appropriate models to use at each moment. We tested this methodology on a real-data problem by studying the influence of external signals (mass and social media) on data center workloads. Because the data come from a bank, we hypothesize that users may increase or decrease their use of some applications in response to external factors such as controversies or economic news. For this study, EAMDrift was designed to support multiple past covariates. We evaluated EAMDrift on different workloads and compared the results with several baseline models. The experimental evaluation shows that EAMDrift outperforms individual baseline models by 15% to 25%. Compared with the best black-box ensemble model, our model has a comparable error (1% to 3% higher).
Thus, this work suggests that interpretable models are a viable solution for data center workload prediction. |
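The abstract says EAMDrift combines individual forecasts by weighting each model according to a performance metric and identifies the most appropriate models at each moment. The record does not specify the metric or the weighting rule, so the following Python sketch assumes inverse mean absolute error (MAE) weights computed over a recent window, plus an optional top-k filter. The function names and the `top_k` parameter are hypothetical. Because the weights are plain per-model numbers, they can be inspected directly, which is one simple way a combination like this can stay interpretable.

```python
from typing import Dict, List, Optional, Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over a recent evaluation window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(
    errors: Dict[str, float], top_k: Optional[int] = None, eps: float = 1e-9
) -> Dict[str, float]:
    """Assumed rule: lower recent error -> larger weight; weights sum to 1.

    If top_k is given, only the k models with the lowest error are kept
    (a hypothetical stand-in for "selecting the most appropriate models").
    """
    ranked = sorted(errors.items(), key=lambda item: item[1])
    if top_k is not None:
        ranked = ranked[:top_k]
    inverse = {name: 1.0 / (err + eps) for name, err in ranked}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_forecast(
    forecasts: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of per-model forecasts; models without a weight are ignored."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(w * forecasts[name][t] for name, w in weights.items())
        for t in range(horizon)
    ]


if __name__ == "__main__":
    recent_actual = [10.0, 12.0, 11.0]
    recent_preds = {"model_a": [10.0, 12.0, 12.0], "model_b": [12.0, 14.0, 13.0]}
    errors = {name: mae(recent_actual, p) for name, p in recent_preds.items()}
    weights = inverse_error_weights(errors)
    # The weights themselves are the interpretable output: which model dominates.
    print(weights)  # roughly {'model_a': 0.857, 'model_b': 0.143}
    next_preds = {"model_a": [20.0, 21.0], "model_b": [27.0, 28.0]}
    print(ensemble_forecast(next_preds, weights))  # approximately [21.0, 22.0]
```

Inverse-error weighting is only one option; a softmax over negative errors or a learned meta-model would also fit the description, at some cost in transparency.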
id |
RCAP_56f17eaee71dcb1bcf95a548742e9d46 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/160465 |
network_acronym_str |
RCAP |
repository_id_str |
7160 |
Abstract (translated from the Portuguese version) |
In recent years, cloud computing has grown considerably and can now be seen as the backbone of many emerging applications. However, behind the familiar clouds there are physical structures (data centers) whose management has proven quite difficult because service usage is unpredictable. To deal with the limitations of reactive auto-scaling, data center management mechanisms have begun to adopt predictive algorithms. However, predictive algorithms are trained on application workloads and are generally not optimized for all patterns, causing under- or over-provisioning of resources. Using real data from the data center of a Portuguese bank, we propose the Ensemble Adaptive Model with Drift detector (EAMDrift). This new method combines forecasts from several individual models by means of a performance metric. EAMDrift has interpretable mechanisms that identify the best models for each forecast and detect when it should be retrained. Our methodology was tested on a real-data problem, and we studied the influence of external factors (news related to the bank) on its usage. Since these data come from a bank, users may increase or decrease their use of some applications depending on external factors (controversies or news about the economy). To account for this, EAMDrift supports additional variables (covariates). The model proposed in this work was evaluated on different datasets, and the results were compared against several baseline models. EAMDrift outperformed all baseline models by 15% to 25%. Compared with the best model that also combines several forecasts, but in a non-interpretable way, our model achieved a comparable error (1% to 3% higher). Thus, this work suggests that interpretable models can be a viable solution for data center management. |
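The abstract also states that EAMDrift detects when it should be retrained, which implies monitoring the forecasts for concept drift. The record does not name the detector, so the sketch below substitutes the classic Page-Hinkley test applied to a stream of absolute forecast errors. The class name, the `delta`, `threshold`, and `min_samples` values, and the choice to reset the detector after each retraining are assumptions for illustration only.

```python
from typing import Iterable, List


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Used here as a stand-in drift detector on forecast errors; the detector
    actually used by EAMDrift is not specified in this record.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0, min_samples: int = 30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm threshold (lambda)
        self.min_samples = min_samples  # no alarms before this many samples
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0      # m_t = sum of (x_i - mean_i - delta)
        self.cumulative_min = 0.0  # M_t = min over m_0..m_t, with m_0 = 0

    def update(self, x: float) -> bool:
        """Add one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.cumulative_min = min(self.cumulative_min, self.cumulative)
        if self.n < self.min_samples:
            return False
        return self.cumulative - self.cumulative_min > self.threshold


def retraining_points(errors: Iterable[float], detector: PageHinkley) -> List[int]:
    """Indices where drift fires; a real pipeline would retrain the models there."""
    points = []
    for i, err in enumerate(errors):
        if detector.update(err):
            points.append(i)
            # Hypothetical hook: retrain or reselect the ensemble members here,
            # then restart monitoring from a clean state.
            detector.reset()
    return points


if __name__ == "__main__":
    # Synthetic errors: stable at 1.0, then a sustained jump to 5.0.
    stream = [1.0] * 50 + [5.0] * 50
    print(retraining_points(stream, PageHinkley(threshold=10.0)))  # [52]
```

Resetting after an alarm keeps one sustained shift from triggering repeated retraining; the threshold trades detection delay against false alarms.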
author |
Mateus, Gonçalo Furtado |
author_role |
author |
dc.contributor.none.fl_str_mv |
Soares, Cláudia; Leitão, João; Rodrigues, António; RUN |
dc.subject.por.fl_str_mv |
Data center management; Interpretable machine learning; Dynamic prediction model; Natural language processing; Feature extraction; Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-11-24T19:04:05Z; 2023-05; 2023-05-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/160465 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |