nodeML - Towards reproducible ML in federated environments

Bibliographic details
Main author: Silva, Edgar Simão da Mota e
Publication date: 2022
Document type: Dissertation
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.22/21438
Abstract: Advances in, and growing interest in, AI (Artificial Intelligence) in the field of health have raised new issues, namely the explainability and reproducibility of ML (Machine Learning) models. In addition, while ML models have traditionally been trained centrally, scalability and privacy concerns seem to point towards a distributed approach. The latter poses challenges to ML algorithms and to the efficacy of learning itself. Reproducing ML models is difficult because of the intrinsic variability of the models themselves and of the environments in which they are trained. This problem is aggravated by a lack of standardization and common terminology. The main goal of this work is to conceptualize and prototype a framework to train, evaluate and describe ML models in a decentralized way over immunogenetics datasets. This framework will promote the reproducibility and comparability of models, as well as their adaptability. The work will start by implementing a federated/decentralized training framework on top of existing ML pipelines. It will then be possible to list and select potential dataset sources, aiming to provide an easy path to model adaptation and optimization.
Contributors: Faria, Luiz Felipe Rocha de; Repositório Científico do Instituto Politécnico do Porto
Keywords: Federated learning; Decentralization; Machine Learning; Immunology; Immunotherapy; Genetics
Status: Published version (master's thesis)
Date available: 2023-01-11
Format: application/pdf
Access rights: Open access
Identifiers: http://hdl.handle.net/10400.22/21438; TID:203112628; oai:recipp.ipp.pt:10400.22/21438
Aggregator: RCAAP, Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação