Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data
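Main author: | Nascimento, Moysés |
---|---|
Publication date: | 2017 |
Other authors: | Silva, Fabyano Fonseca e; Sáfadi, Thelma; Nascimento, Ana Carolina Campana; Ferreira, Talles Eduardo Maciel; Barroso, Laís Mayara Azevedo; Azevedo, Camila Ferreira; Guimarães, Simone Eliza Faccione; Serão, Nick Vergara Lopes |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFLA |
Full text: | http://repositorio.ufla.br/jspui/handle/1/36752 |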
Abstract: | Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Because GETS analysis evaluates a large number of genes, a useful strategy for summarizing biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional clustering methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering of short series of distribution-free data while accounting for the dependence between subsequent time points. Using simulated and real temporal RNA-seq data sets (the real data being four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers ICA and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained with K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together. |
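
The two-step idea summarized in the abstract (an ICA transformation followed by hierarchical clustering, compared against K-means) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' ICAclust implementation: the simulated patterns, the skewed noise model, the number of components, the Ward linkage, and the use of scikit-learn (version 1.1 or later assumed for `whiten="unit-variance"`) are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import FastICA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Hypothetical simulated data: 3 temporal patterns over 4 time points
# (the real data in the record also have 4 time points), 40 genes each.
n_per_cluster = 40
patterns = np.array([
    [1.0, 2.0, 3.0, 4.0],  # increasing expression
    [4.0, 3.0, 2.0, 1.0],  # decreasing expression
    [1.0, 4.0, 4.0, 1.0],  # peak at the middle time points
])
true_labels = np.repeat(np.arange(3), n_per_cluster)

# Centered exponential noise, chosen here only to mimic non-normal data.
noise = rng.exponential(scale=0.3, size=(3 * n_per_cluster, 4)) - 0.3
X = patterns[true_labels] + noise  # rows are genes, columns are time points

# Step 1 (assumed form): transform each gene's series into independent components.
ica = FastICA(n_components=3, whiten="unit-variance", random_state=0, max_iter=1000)
S = ica.fit_transform(X)  # shape: genes x components

# Step 2 (assumed form): hierarchical clustering on the component scores.
ica_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(S)

# Baseline: K-means applied directly to the raw series.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agreement with the simulated ground truth (1.0 means perfect recovery).
ari_ica = adjusted_rand_score(true_labels, ica_labels)
ari_km = adjusted_rand_score(true_labels, km_labels)
print(f"ICA + hierarchical ARI: {ari_ica:.3f}")
print(f"K-means ARI:            {ari_km:.3f}")
```

The adjusted Rand index is used here only as a convenient agreement score for simulated labels; the abstract does not state which measure underlies the reported gains of 5.15% and 84.85%.
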
id |
UFLA_e18ec00d45df3fcbda42430f632d92f6 |
---|---|
oai_identifier_str |
oai:localhost:1/36752 |
network_acronym_str |
UFLA |
network_name_str |
Repositório Institucional da UFLA |
repository_id_str |
|
author |
Nascimento, Moysés |
author_role |
author |
author2 |
Silva, Fabyano Fonseca e; Sáfadi, Thelma; Nascimento, Ana Carolina Campana; Ferreira, Talles Eduardo Maciel; Barroso, Laís Mayara Azevedo; Azevedo, Camila Ferreira; Guimarães, Simone Eliza Faccione; Serão, Nick Vergara Lopes |
author2_role |
author author author author author author author author |
dc.subject.por.fl_str_mv |
Gene expression; Simulation; Modeling; Clustering algorithms; Statistical data; RNA sequencing; Principal component analysis; RNA synthesis; Swine; Independent component analysis |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-07-17 2019-09-09T19:11:19Z 2019-09-09T19:11:19Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
NASCIMENTO, M. et al. Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data. PLoS One, [S.l.], v. 12, n. 7, 2017. DOI: 10.1371/journal.pone.0181195. http://repositorio.ufla.br/jspui/handle/1/36752 |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/4.0/ |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
PLOS |
dc.source.none.fl_str_mv |
PLoS One reponame:Repositório Institucional da UFLA instname:Universidade Federal de Lavras (UFLA) instacron:UFLA |
repository.name.fl_str_mv |
Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
nivaldo@ufla.br || repositorio.biblioteca@ufla.br |
_version_ |
1784550117988106240 |