SLFM: an R package to evaluate coherent patterns in microarray data via factor analysis

Bibliographic details
Main author: João Daniel Nunes Duarte
Publication date: 2019
Other authors: Vinícius Diniz Mayrink
Document type: Article
Language: eng
Source title: Repositório Institucional da UFMG
Full text: https://doi.org/10.18637/jss.v090.i09
http://hdl.handle.net/1843/56441
https://orcid.org/0000-0002-5683-8326
Abstract: The development of simulation-based methods, such as Markov chain Monte Carlo (MCMC), has contributed to an increased interest in the Bayesian framework as an alternative to deal with factor models. Many studies have used Bayesian factor analysis to explore gene expression data. We are particularly interested in the application of a sparse latent factor model (SLFM) based on sparsity priors (mixtures) to assess the significance of factors. The SLFM measures how strong the observed coherent expression pattern is in the data, which is an important source of information to evaluate gene activity. In the literature, this type of model has shown better results than other approaches intended for identification of patterns and metagene groups related to the underlying biology. However, a full Bayesian factor model relying on MCMC algorithms has an expensive computational cost, which makes it unattractive for general users. In this paper, we present the package slfm which uses C++ implementation via Rcpp to improve the computational performance of the SLFM within the widely used statistical tool R. We investigate real and simulated microarray data related to breast cancer.
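Illustration (not part of the original record): the abstract describes a sparse latent factor model with mixture ("sparsity") priors. The sketch below is a minimal, assumption-laden example of what data from such a model might look like. It simulates a one-factor model in which each loading is exactly zero with some probability and otherwise drawn from a normal distribution. The function name, the parameters, and the specific mixture form are hypothetical choices made for illustration; they are not taken from the slfm package or from the paper, and the code performs simulation only, not MCMC inference.

```python
import random


def simulate_sparse_factor_data(n_probes=20, n_samples=10, p_nonzero=0.5,
                                slab_sd=2.0, noise_sd=0.5, seed=1):
    """Simulate x[i][j] = alpha[i] * lam[j] + noise from a one-factor model.

    All names and defaults are hypothetical; this is not the slfm API.
    """
    rng = random.Random(seed)
    # Factor scores, one per sample, drawn from a standard normal.
    lam = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Loadings from a two-component mixture: exactly zero with
    # probability 1 - p_nonzero, otherwise drawn from a normal "slab".
    alpha = [rng.gauss(0.0, slab_sd) if rng.random() < p_nonzero else 0.0
             for _ in range(n_probes)]
    # Observed matrix: rows are probes, columns are samples.
    x = [[alpha[i] * lam[j] + rng.gauss(0.0, noise_sd)
          for j in range(n_samples)]
         for i in range(n_probes)]
    return x, alpha, lam


x, alpha, lam = simulate_sparse_factor_data()
n_zero = sum(1 for a in alpha if a == 0.0)
print(f"{len(x)} x {len(x[0])} matrix, {n_zero} loadings set to zero")
```

Rows whose loading is zero contain only noise, while rows with a nonzero loading share the pattern carried by the factor scores. Under this simplified reading, that shared pattern is the "coherent expression pattern" whose strength a sparse factor model would assess.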
id UFMG_2c05e22fb2d7458448aee8661d387a34
oai_identifier_str oai:repositorio.ufmg.br:1843/56441
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
spelling Sponsor: FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Journal page: https://www.jstatsoft.org/article/view/v090i09
dc.title.pt_BR.fl_str_mv SLFM: an R package to evaluate coherent patterns in microarray data via factor analysis
dc.title.alternative.pt_BR.fl_str_mv SLFM: um pacote R para avaliar padrões coerentes em dados de microarray via análise fatorial
topic Factor model
Bayesian inference
Gene expression
Sparsity priors
Rcpp
SLFM
Statistics
Probabilities
Bayesian statistical decision theory
C++ (Computer programming language)
R (Computer programming language)
publishDate 2019
dc.date.issued.fl_str_mv 2019-07-31
dc.date.accessioned.fl_str_mv 2023-07-17T18:52:39Z
dc.date.available.fl_str_mv 2023-07-17T18:52:39Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/56441
dc.identifier.doi.pt_BR.fl_str_mv https://doi.org/10.18637/jss.v090.i09
dc.identifier.issn.pt_BR.fl_str_mv 1548-7660
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0002-5683-8326
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartof.pt_BR.fl_str_mv Journal of Statistical Software
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brazil
dc.publisher.department.fl_str_mv ICX - DEPARTAMENTO DE ESTATÍSTICA
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/56441/1/License.txt
https://repositorio.ufmg.br/bitstream/1843/56441/2/SLFM%20an%20R%20package%20to%20evaluate%20coherent%20patterns%20in%20microarray%20data%20via%20factor%20analysis.pdf
bitstream.checksum.fl_str_mv fa505098d172de0bc8864fc1287ffe22
19a8f37cdd96601f2e6c9333ba321f5b
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
_version_ 1803589567458050048