COLLAPSE: Collaborative full-stack platform for data science development
Main author: | Joaquim Antero Pavão dos Santos |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/10216/136622 |
Abstract: | Data science projects differ from typical software development projects in their focus on experimentation. By trying different datasets, features, pre-processing tasks, algorithms and hyperparameters, a data science project relies on a methodology for continuously improving its results. In this kind of project it is essential to be able to compare experiments and to reproduce the results associated with each one, which raises challenges in data versioning, experiment tracking and result reproducibility. There is also a need for data annotation to facilitate the development of supervised approaches and the deployment of models in diverse environments. Several tools and platforms currently in development tackle some of these challenges; however, no single one aims to solve them all. The objective is therefore to develop an integrated solution that provides data science teams with data annotation, collaboration, data versioning, experiment tracking, result reproducibility and model deployment. This will be achieved by drawing on existing software and developing a middle layer between the user and selected features of these platforms. Such a platform will allow data science teams to cooperate more easily and to develop projects more efficiently. The platform will be validated by testing its performance on a real project. |
id | RCAP_c2298aea837a001f0646248516e94e34 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/136622 |
network_acronym_str | RCAP |
repository_id_str | 7160 |
author | Joaquim Antero Pavão dos Santos |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
dc.date.none.fl_str_mv | 2021-07-22 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10216/136622; TID:202821668 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |