COLLAPSE: Collaborative full-stack platform for data science development

Bibliographic details
Main author: Joaquim Antero Pavão dos Santos
Publication date: 2021
Document type: Dissertation (master's thesis)
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/136622
Abstract: Data science projects differ from typical software development projects because of their focus on experimentation. Whether trying different datasets, features, pre-processing tasks, algorithms or hyperparameters, a data science project relies on a methodology for continuously improving the results. In this kind of project it is essential to be able to compare experiments and to reproduce the results associated with each one, which raises challenges in data versioning, experiment tracking and guaranteeing the reproducibility of results. There is also a need for data annotation, to facilitate the development of supervised approaches, and for deploying models in diverse environments. Several tools and platforms currently in development tackle some of these challenges, but no single one aims to solve them all. The objective is therefore to develop an integrated solution that provides data science teams with data annotation, collaboration, data versioning, experiment tracking, result reproducibility and model deployment. This will be achieved by building on existing software and developing a middle layer between the user and selected features of these platforms, with the final goal of resolving all the aforementioned challenges. Such a platform will allow data science teams to cooperate more easily and to develop projects more efficiently. The platform will be validated by testing its performance on a real project.
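The abstract names data versioning, experiment tracking and result reproducibility as the core challenges without showing what they involve in practice. The following is a minimal Python sketch of the general idea, not the thesis's implementation and not the API of any platform it builds on: each experiment records a content hash of the exact dataset used plus its parameters and metrics, so runs can be compared and a rerun can be matched to its original. The names `fingerprint`, `Experiment`, `run_id` and `best` are hypothetical, chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    """Hypothetical helper: content hash of a dataset snapshot (identical bytes -> identical version id)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Experiment:
    """Hypothetical experiment record: which data, which settings, which results."""
    dataset_version: str            # fingerprint of the exact data used
    params: dict                    # pre-processing choices, algorithm, hyperparameters
    metrics: dict = field(default_factory=dict)

    def run_id(self) -> str:
        # Same data + same parameters -> same id, regardless of dict key order,
        # so a reproduced run can be matched to the experiment it reproduces.
        payload = json.dumps(
            {"data": self.dataset_version, "params": self.params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def best(experiments: list, metric: str) -> Experiment:
    """Compare experiments: return the one with the highest value for `metric`."""
    return max(experiments, key=lambda e: e.metrics[metric])


if __name__ == "__main__":
    data = b"id,label\n1,cat\n2,dog\n"
    v = fingerprint(data)
    a = Experiment(v, {"model": "logreg", "C": 1.0}, {"accuracy": 0.81})
    b = Experiment(v, {"model": "logreg", "C": 0.1}, {"accuracy": 0.84})
    print(best([a, b], "accuracy").params)  # {'model': 'logreg', 'C': 0.1}
    print(a.run_id() == Experiment(v, {"C": 1.0, "model": "logreg"}).run_id())  # True
```

Hashing the data content rather than a file name is the design choice that ties results to a dataset version: if the data changes, the fingerprint and therefore the run id change with it.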
id RCAP_c2298aea837a001f0646248516e94e34
oai_identifier_str oai:repositorio-aberto.up.pt:10216/136622
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv COLLAPSE: Collaborative full-stack platform for data science development
author Joaquim Antero Pavão dos Santos
author_role author
dc.contributor.author.fl_str_mv Joaquim Antero Pavão dos Santos
dc.subject.por.fl_str_mv Electrical engineering, Electronic engineering, Information engineering
topic Electrical engineering, Electronic engineering, Information engineering
publishDate 2021
dc.date.none.fl_str_mv 2021-07-22
2021-07-22T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/136622
TID:202821668
url https://hdl.handle.net/10216/136622
identifier_str_mv TID:202821668
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação