Integration of heterogeneous data: a multi-omics application
Main author: | Vasconcelos, Ana Gabriela Pereira de |
---|---|
Publication date: | 2020 |
Document type: | Master's thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/45/45133/tde-01092020-164939/ |
Abstract: | Nowadays, a huge amount of data is being collected in different research areas, such as public health, agriculture, and marketing, so high-dimensional datasets are becoming increasingly common. More specifically, with advances in technology, a great deal of biological information is now available at low cost: data from the genome, miRNA (microRNA), mRNA (messenger RNA), gene expression, proteins, methylation, lipids, metabolism, phenotypes, and so on. Many studies have analyzed each type of data individually, but more recently there has been increasing interest in integrating different data types to gather more information. However, many classical methodologies used to this end assume that the data matrix is complete and numerical. Therefore, the heterogeneity of datasets with different variable types is not considered. Alternatively, Generalized Low Rank Models (GLRM) are a tool capable of dealing with large datasets of heterogeneous data. Although the method is intended for a single dataset, this project shows that it is flexible enough to handle abstract data from different sources by using different loss functions, each suited to its variable type. GLRM is a powerful method that can handle problems of different natures, but it is very recent, so its potential for multi-omics data is still being explored. In this context, the present work introduces GLRM and explores its possibilities for dimensionality reduction in supervised and unsupervised analyses using simulated and real multi-omics datasets. |
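The idea of pairing each variable type with its own loss function can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the function name `glrm_fit`, the choice of quadratic loss for numeric columns and logistic loss for binary columns, the plain gradient-descent solver, and the simulated data are all assumptions made for illustration. Missing entries (`np.nan`) are simply left out of the objective, so the data matrix does not need to be complete.

```python
import numpy as np


def glrm_fit(A, is_binary, k=2, steps=1000, lr=0.005, reg=0.1, seed=0):
    """Approximate A by X @ Y using a different loss per column type.

    A         : (n, m) array; np.nan marks missing entries,
                binary columns are coded as -1 / +1.
    is_binary : (m,) boolean array; True -> logistic loss,
                False -> quadratic loss.
    Returns X (n, k), Y (k, m) and the objective value at every step.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    observed = ~np.isnan(A)
    A0 = np.where(observed, A, 0.0)  # placeholder value, masked out below
    X = 0.1 * rng.standard_normal((n, k))
    Y = 0.1 * rng.standard_normal((k, m))
    history = []
    for _ in range(steps):
        Z = X @ Y
        # Per-entry loss, selected column by column.
        quad = (Z - A0) ** 2
        logi = np.logaddexp(0.0, -A0 * Z)  # log(1 + exp(-a * z))
        loss = np.where(is_binary, logi, quad)
        obj = loss[observed].sum() + reg * ((X ** 2).sum() + (Y ** 2).sum())
        history.append(obj)
        # Derivative of each loss with respect to Z.
        g_quad = 2.0 * (Z - A0)
        # -a / (1 + exp(a * z)), written with tanh for numerical stability.
        g_logi = -A0 * 0.5 * (1.0 - np.tanh(0.5 * A0 * Z))
        G = np.where(is_binary, g_logi, g_quad) * observed
        grad_X = G @ Y.T + 2.0 * reg * X
        grad_Y = X.T @ G + 2.0 * reg * Y
        X -= lr * grad_X
        Y -= lr * grad_Y
    return X, Y, history


# Simulated mixed-type data (hypothetical): 3 numeric and 2 binary columns
# that share the same 2-dimensional latent structure.
rng = np.random.default_rng(1)
latent = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 5))
numeric = latent[:, :3]
numeric = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)
binary = np.where(latent[:, 3:] >= 0, 1.0, -1.0)
A = np.hstack([numeric, binary])
A[0, 1] = np.nan  # one missing entry, ignored by the objective
is_binary = np.array([False, False, False, True, True])

X, Y, history = glrm_fit(A, is_binary)
```

In this sketch the rows of `X` play the role of the reduced-dimension representation of the samples, while each column of `Y` describes one variable. Integrating several data sources would amount to stacking their columns side by side and flagging each column with the loss that suits its type.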
id |
USP_2f5ccc1fc448c33d9a4a8cd386f3baa1 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-01092020-164939 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Integration of heterogeneous data: a multi-omics application / Integração de dados heterogêneos: uma aplicação em dados multi-ômicos |
title |
Integration of heterogeneous data: a multi-omics application |
author |
Vasconcelos, Ana Gabriela Pereira de |
author_facet |
Vasconcelos, Ana Gabriela Pereira de |
author_role |
author |
dc.contributor.none.fl_str_mv |
Soler, Julia Maria Pavan |
dc.contributor.author.fl_str_mv |
Vasconcelos, Ana Gabriela Pereira de |
dc.subject.por.fl_str_mv |
Análise multivariada; Dados multi-ômicos; Fatorização de matrizes; Generalized low rank models; Matrix factorization; Multi-omics; Multivariate analysis |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-08-25 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/45/45133/tde-01092020-164939/ |
url |
https://www.teses.usp.br/teses/disponiveis/45/45133/tde-01092020-164939/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
_version_ |
1815256586655891456 |