Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data

Bibliographic details
Main author: Nascimento, Moysés
Publication date: 2017
Other authors: Silva, Fabyano Fonseca e, Sáfadi, Thelma, Nascimento, Ana Carolina Campana, Ferreira, Talles Eduardo Maciel, Barroso, Laís Mayara Azevedo, Azevedo, Camila Ferreira, Guimarães, Simone Eliza Faccione, Serão, Nick Vergara Lopes
Document type: Article
Language: eng
Source title: Repositório Institucional da UFLA
Full text: http://repositorio.ufla.br/jspui/handle/1/36752
Abstract: Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Because GETS analysis evaluates a large number of genes, a useful strategy for summarizing biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional clustering methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering of short series of distribution-free data while accounting for the dependence between subsequent time points. Using simulated and real temporal RNA-seq data sets (the real data being four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers ICA and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained by K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
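The abstract describes a two-step approach in which a hierarchical method groups the profiles after the ICA transformation. As a rough illustration of that second step only, the Python sketch below clusters a few short temporal profiles with agglomerative clustering. The abstract does not state the distance or linkage used, so Euclidean distance and average linkage are assumptions here; the toy profiles and the function names are invented for illustration, and the ICA transformation itself is omitted (in practice the profiles would first be transformed with an ICA implementation).

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage (an assumed choice).

    Starts with one cluster per profile and repeatedly merges the two
    clusters with the smallest mean pairwise distance until n_clusters
    remain. Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                mean_dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy profiles observed at four time points
# (standing in for, e.g., 21, 40, 70 and 90 days of gestation).
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [1.1, 2.1, 2.9, 4.2],  # rising
    [4.0, 3.0, 2.0, 1.0],  # falling
    [4.2, 2.9, 2.1, 0.9],  # falling
    [2.0, 4.0, 4.0, 2.0],  # mid-series peak
    [2.1, 3.9, 4.1, 1.9],  # mid-series peak
]

clusters = average_linkage(profiles, 3)
print(sorted(clusters))  # [[0, 1], [2, 3], [4, 5]]
```

With only four time points per gene, each profile is a very short vector, which is the setting the abstract highlights. This sketch makes no attempt to reproduce the reported gains over K-means.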
id UFLA_e18ec00d45df3fcbda42430f632d92f6
oai_identifier_str oai:localhost:1/36752
network_acronym_str UFLA
network_name_str Repositório Institucional da UFLA
repository_id_str
dc.title.none.fl_str_mv Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data
dc.contributor.author.fl_str_mv Nascimento, Moysés
Silva, Fabyano Fonseca e
Sáfadi, Thelma
Nascimento, Ana Carolina Campana
Ferreira, Talles Eduardo Maciel
Barroso, Laís Mayara Azevedo
Azevedo, Camila Ferreira
Guimarães, Simone Eliza Faccione
Serão, Nick Vergara Lopes
dc.subject.por.fl_str_mv Gene expression
Simulation
Modeling
Clustering algorithms
Statistical data
RNA sequencing
Principal component analysis
RNA synthesis
Swine
Independent component analysis
publishDate 2017
dc.date.none.fl_str_mv 2017-07-17
2019-09-09T19:11:19Z
2019-09-09T19:11:19Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv NASCIMENTO, M. et al. Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data. PLoS One, [S.l.], v. 12, n. 7, 2017. DOI: 10.1371/journal.pone.0181195.
http://repositorio.ufla.br/jspui/handle/1/36752
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv PLOS
dc.source.none.fl_str_mv PLoS One
reponame:Repositório Institucional da UFLA
instname:Universidade Federal de Lavras (UFLA)
instacron:UFLA
repository.name.fl_str_mv Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_ 1784550117988106240