Predicting model training time to optimize distributed machine learning applications
Main author: | Guimarães, Miguel |
---|---|
Publication date: | 2023 |
Other authors: | Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/1822/85498 |
Abstract: | Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs—a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for an efficient management of the cluster’s computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data. |
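The approach the abstract describes — predicting a task's training time from the characteristics of the model and of the data — can be illustrated with a minimal meta-learning sketch. Everything below is an assumption for illustration only: the chosen meta-features (`n_rows`, `n_features`, `max_depth`), the random-forest meta-model, and the synthetic timing data are not the paper's actual setup.

```python
# Illustrative sketch of training-time prediction via meta-learning.
# Meta-features, regressor choice, and timing data are all hypothetical,
# not the CEDEs system's actual implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic meta-dataset: each row describes a past training task as
# (n_rows, n_features, max_depth); the target is its observed training
# time in seconds (fabricated here with a simple cost model plus noise).
X_meta = rng.uniform([1e3, 5, 2], [1e6, 100, 20], size=(200, 3))
y_time = 1e-6 * X_meta[:, 0] * X_meta[:, 1] * np.log(X_meta[:, 2]) \
         + rng.normal(0.0, 0.5, 200)

# Meta-model: maps task characteristics to a predicted duration, so a
# scheduler could estimate each base model's cost before dispatching it.
meta_model = RandomForestRegressor(n_estimators=100, random_state=0)
meta_model.fit(X_meta, y_time)

# Predict the duration of a new, unseen training task.
new_task = np.array([[5e5, 40, 10]])  # 500k rows, 40 features, depth 10
predicted_seconds = meta_model.predict(new_task)[0]
print(f"predicted training time: {predicted_seconds:.2f} s")
```

In a cluster setting, such per-task estimates would feed the scheduler that assigns base-model training jobs to nodes so as to minimize the ensemble's overall makespan.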
id |
RCAP_7d84f981c877333a3f81a7b1cd485e51 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/85498 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
spelling |
Title: Predicting model training time to optimize distributed machine learning applications |
Keywords: Meta-learning; Machine learning; Distributed learning; Training time; Optimization; Science & Technology |
Abstract: as given in the "description" field below |
Funding: This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021, and CPCA-IAC/AV/475278/2022 |
Publisher: Multidisciplinary Digital Publishing Institute |
Affiliation: Universidade do Minho |
Authors: Guimarães, Miguel; Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo |
Dates: 2023-02-08; 2023-02-08T00:00:00Z |
Type: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article |
Format: application/pdf |
URL: https://hdl.handle.net/1822/85498 |
Language: eng |
Citation: Guimarães, M.; Carneiro, D.; Palumbo, G.; Oliveira, F.; Oliveira, Ó.; Alves, V.; Novais, P. Predicting Model Training Time to Optimize Distributed Machine Learning Applications. Electronics 2023, 12, 871. https://doi.org/10.3390/electronics12040871 |
ISSN: 2079-9292; DOI: 10.3390/electronics12040871; https://www.mdpi.com/2079-9292/12/4/871 |
Access: info:eu-repo/semantics/openAccess |
Source: reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
Harvested: 2023-12-23T01:36:26Z; oai:repositorium.sdum.uminho.pt:1822/85498; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T19:47:10.571155; Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Predicting model training time to optimize distributed machine learning applications |
title |
Predicting model training time to optimize distributed machine learning applications |
spellingShingle |
Predicting model training time to optimize distributed machine learning applications; Guimarães, Miguel; Meta-learning; Machine learning; Distributed learning; Training time; Optimization; Science & Technology |
title_short |
Predicting model training time to optimize distributed machine learning applications |
title_full |
Predicting model training time to optimize distributed machine learning applications |
title_fullStr |
Predicting model training time to optimize distributed machine learning applications |
title_full_unstemmed |
Predicting model training time to optimize distributed machine learning applications |
title_sort |
Predicting model training time to optimize distributed machine learning applications |
author |
Guimarães, Miguel |
author_facet |
Guimarães, Miguel; Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo |
author_role |
author |
author2 |
Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo |
author2_role |
author; author; author; author; author; author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Guimarães, Miguel; Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo |
dc.subject.por.fl_str_mv |
Meta-learning; Machine learning; Distributed learning; Training time; Optimization; Science & Technology |
topic |
Meta-learning; Machine learning; Distributed learning; Training time; Optimization; Science & Technology |
description |
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs—a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for an efficient management of the cluster’s computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-02-08; 2023-02-08T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/1822/85498 |
url |
https://hdl.handle.net/1822/85498 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Guimarães, M.; Carneiro, D.; Palumbo, G.; Oliveira, F.; Oliveira, Ó.; Alves, V.; Novais, P. Predicting Model Training Time to Optimize Distributed Machine Learning Applications. Electronics 2023, 12, 871. https://doi.org/10.3390/electronics12040871 |
2079-9292 |
10.3390/electronics12040871 |
https://www.mdpi.com/2079-9292/12/4/871 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Multidisciplinary Digital Publishing Institute |
publisher.none.fl_str_mv |
Multidisciplinary Digital Publishing Institute |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799133044560363520 |