SNP calling from RNA-seq data without a reference genome: Identification, quantification, differential analysis and impact on the protein sequence

Bibliographic details
Main author: Lopez-Maestre, Hélène
Publication date: 2016
Other authors: Brinza, Lilia; Marchet, Camille; Kielbassa, Janice; Bastien, Sylvère; Boutigny, Mathilde; Monnin, David; Filali, Adil El; Carareto, Claudia Marcia [UNESP]; Vieira, Cristina; Picard, Franck; Kremer, Natacha; Vavre, Fabrice; Sagot, Marie-France; Lacroix, Vincent
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1093/nar/gkw655
http://hdl.handle.net/11449/173790
Abstract: SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable testing for the association of the identified SNPs with a phenotype of interest.
id UNSP_6381dce8375f410bddf8bb85f4d8547a
oai_identifier_str oai:repositorio.unesp.br:11449/173790
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling European Research Council
Agence Nationale de la Recherche
Université de Lyon
Université Lyon 1 CNRS UMR5558 Laboratoire de Biométrie et Biologie Evolutive
EPI ERABLE - Inria Grenoble
PT Génomique et Transcriptomique BIOASTER
Université de Rennes
Équipe GenScale IRISA
Synergie-Lyon-Cancer Universite Lyon 1 Centre Leon Berard
Department of Biology UNESP São Paulo State University, São José do Rio Preto
European Research Council: 247073]10
Agence Nationale de la Recherche: ANR-11-BINF-0001-06
Agence Nationale de la Recherche: ANR-12-BS02-0008
Agence Nationale de la Recherche: ANR-2010-BLAN-170101
European Research Council: FP7 /2007-2013
dc.title.none.fl_str_mv SNP calling from RNA-seq data without a reference genome: Identification, quantification, differential analysis and impact on the protein sequence
dc.contributor.none.fl_str_mv Université de Lyon
Laboratoire de Biométrie et Biologie Evolutive
EPI ERABLE - Inria Grenoble
BIOASTER
Université de Rennes
IRISA
Centre Leon Berard
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Lopez-Maestre, Hélène
Brinza, Lilia
Marchet, Camille
Kielbassa, Janice
Bastien, Sylvère
Boutigny, Mathilde
Monnin, David
Filali, Adil El
Carareto, Claudia Marcia [UNESP]
Vieira, Cristina
Picard, Franck
Kremer, Natacha
Vavre, Fabrice
Sagot, Marie-France
Lacroix, Vincent
publishDate 2016
dc.date.none.fl_str_mv 2016-11-02
2018-12-11T17:07:46Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1093/nar/gkw655
Nucleic Acids Research, v. 44, n. 19, 2016.
1362-4962
0305-1048
http://hdl.handle.net/11449/173790
10.1093/nar/gkw655
2-s2.0-84994908315
2-s2.0-84994908315.pdf
3425772998319216
0000-0002-0298-1354
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Nucleic Acids Research
9,025
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)