Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets
Main author: | De‐Kayne, Rishi |
---|---|
Publication date: | 2020 |
Other authors: | Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/45862 |
Abstract: | Technological advances in DNA sequencing over the last decade now permit the production and curation of large genomic data sets in an increasing number of nonmodel species. Additionally, these new data provide the opportunity for combining data sets, resulting in larger studies with a broader taxonomic range. Whilst the development of new sequencing platforms has been beneficial, resulting in a higher throughput of data at a lower per-base cost, shifts in sequencing technology can also pose challenges for those wishing to combine new sequencing data with data sequenced on older platforms. Here, we outline the types of studies where the use of curated data might be beneficial, and highlight potential biases that might be introduced by combining data from different sequencing platforms. As an example of the challenges associated with combining data across sequencing platforms, we focus on the impact of the shift in Illumina's base calling technology from a four-channel system to a two-channel system. We caution that when data are combined from these two systems, erroneous guanine base calls that result from the two-channel chemistry can make their way through a bioinformatic pipeline, eventually leading to inaccurate and potentially misleading conclusions. We also suggest solutions for dealing with such potential artefacts, which make samples sequenced on different sequencing platforms appear more differentiated from one another than they really are. Finally, we stress the importance of archiving tissue samples and the associated sequences for the continued reproducibility and reusability of sequencing data in the face of ever-changing sequencing platform technology. |
id | RCAP_9345b431c750f50687f6de4cfc89a9f5 |
---|---|
oai_identifier_str | oai:repositorio.ul.pt:10451/45862 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets |
author | De‐Kayne, Rishi |
author_facet | De‐Kayne, Rishi; Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D. |
author_role | author |
author2 | Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D. |
author2_role | author author author author author |
dc.contributor.none.fl_str_mv | Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv | De‐Kayne, Rishi; Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D. |
publishDate | 2020 |
dc.date.none.fl_str_mv | 2020-12; 2020-12-01T00:00:00Z; 2021-12-01T01:30:34Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10451/45862 |
url | http://hdl.handle.net/10451/45862 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | De-Kayne R, Frei D, Greenway R, Mendes SL, Retel C, Feulner PG. Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets. Mol Ecol Resour. 2020;00:1–8. https://doi.org/10.1111/1755-0998.13309 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Wiley |
publisher.none.fl_str_mv | Wiley |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799134527626412032 |