Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets

Bibliographic details
Main author: De‐Kayne, Rishi
Publication date: 2020
Other authors: Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10451/45862
Abstract: Technological advances in DNA sequencing over the last decade now permit the production and curation of large genomic data sets in an increasing number of nonmodel species. Additionally, these new data provide the opportunity for combining data sets, resulting in larger studies with a broader taxonomic range. Whilst the development of new sequencing platforms has been beneficial, resulting in a higher throughput of data at a lower per-base cost, shifts in sequencing technology can also pose challenges for those wishing to combine new sequencing data with data sequenced on older platforms. Here, we outline the types of studies where the use of curated data might be beneficial, and highlight potential biases that might be introduced by combining data from different sequencing platforms. As an example of the challenges associated with combining data across sequencing platforms, we focus on the impact of the shift in Illumina's base calling technology from a four-channel system to a two-channel system. We caution that when data are combined from these two systems, erroneous guanine base calls that result from the two-channel chemistry can make their way through a bioinformatic pipeline, eventually leading to inaccurate and potentially misleading conclusions. We also suggest solutions for dealing with such potential artefacts, which make samples sequenced on different sequencing platforms appear more differentiated from one another than they really are. Finally, we stress the importance of archiving tissue samples and the associated sequences for the continued reproducibility and reusability of sequencing data in the face of ever-changing sequencing platform technology.
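To illustrate the kind of artefact the abstract describes: in two-channel chemistry a guanine is registered when neither channel gives a signal, so reads that lose signal can end in a long run of G calls. The sketch below shows one simple way to trim such a tail from a read before combining data sets. It is a minimal example under stated assumptions, not the authors' recommended solution; the function name `trim_poly_g_tail` and the `min_run` threshold of 10 are hypothetical choices made for illustration.

```python
from typing import Tuple


def trim_poly_g_tail(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Trim a trailing run of G from a read if the run is at least min_run long.

    seq and qual are the base and quality strings of one FASTQ record.
    min_run is an illustrative threshold (an assumption, not from the paper).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")

    # Remove every G at the 3' end, then measure how many were removed.
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)

    # Only trim when the run is long enough to look like a signal-loss artefact;
    # short runs of G are left alone because they are likely real sequence.
    if run_length >= min_run:
        return stripped, qual[: len(stripped)]
    return seq, qual


if __name__ == "__main__":
    read = "ACGTA" + "G" * 12
    quals = "I" * len(read)
    print(trim_poly_g_tail(read, quals))
```

Trimming only at the 3' end and only above a length threshold keeps genuine short G runs intact. A real pipeline would more likely rely on an established read-trimming tool than on a hand-written function like this one.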
id RCAP_9345b431c750f50687f6de4cfc89a9f5
oai_identifier_str oai:repositorio.ul.pt:10451/45862
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.date.none.fl_str_mv 2020-12
2020-12-01T00:00:00Z
2021-12-01T01:30:34Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/45862
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv De-Kayne R, Frei D, Greenway R, Mendes SL, Retel C, Feulner PG. Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets. Mol Ecol Resour. 2020;00:1–8. https://doi.org/10.1111/1755-0998.13309
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Wiley
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP