Revision and annotation of DNA barcode records for marine invertebrates: Report of the 8th iBOL conference hackathon

Bibliographic details
Main author: Radulovici, Adriana E.
Publication date: 2021
Other authors: Vieira, Pedro Emanuel Ferreira Reis; Duarte, Sofia Alexandra Ferreira; Teixeira, Marcos André Machado Lima; Borges, Luisa M.S.; Deagle, Bruce E.; Majaneva, Sanna; Redmond, Niamh; Schultz, Jessica A.; Costa, Filipe O.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/80004
Abstract: The accuracy of specimen identification through DNA barcoding and metabarcoding relies on reference libraries containing records with reliable taxonomy and sequence quality. The considerable growth in barcode data requires stringent data curation, especially in taxonomically difficult groups such as marine invertebrates. A major effort in curating marine barcode data in the Barcode of Life Data Systems (BOLD) was undertaken during the 8th International Barcode of Life Conference (Trondheim, Norway, 2019). Major taxonomic groups (crustaceans, echinoderms, molluscs, and polychaetes) were reviewed to identify those which had disagreement between Linnaean names and Barcode Index Numbers (BINs). The records with disagreement were annotated with four tags: a) MIS-ID (misidentified, mislabeled, or contaminated records), b) AMBIG (ambiguous records unresolved with the existing data), c) COMPLEX (species names occurring in multiple BINs), and d) SHARE (barcodes shared between species). A total of 83,712 specimen records corresponding to 7,576 species were reviewed and 39% of the species were tagged (7% MIS-ID, 17% AMBIG, 14% COMPLEX, and 1% SHARE). High percentages (>50%) of AMBIG tags were recorded in gastropods, whereas COMPLEX tags dominated in crustaceans and polychaetes. The high proportion of tagged species reflects either flaws in the barcoding workflow (e.g., misidentification, cross-contamination) or taxonomic difficulties (e.g., synonyms, undescribed species). Although data curation is essential for barcode applications, such manual attempts to examine large datasets are unsustainable and automated solutions are extremely desirable.
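The name/BIN disagreement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function name, the input format (pairs of species name and BIN identifier), and the sample records are all hypothetical. The sketch only lists candidates; a species found in more than one BIN is a candidate for the COMPLEX tag, while a BIN holding more than one species name still needs manual review to decide between MIS-ID, AMBIG, and SHARE.

```python
from collections import defaultdict


def find_discordance(records):
    """Group (species_name, bin_id) pairs and report name/BIN disagreement.

    The input format is an assumption made for illustration only.
    Returns two dicts:
      - species assigned to more than one BIN (COMPLEX candidates)
      - BINs containing more than one species name (need manual review)
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)

    multi_bin_species = {
        s: sorted(b) for s, b in bins_by_species.items() if len(b) > 1
    }
    multi_species_bins = {
        b: sorted(s) for b, s in species_by_bin.items() if len(s) > 1
    }
    return multi_bin_species, multi_species_bins


# Made-up records; names and BIN identifiers are placeholders, not real data.
records = [
    ("Species alpha", "BOLD:AAA0001"),
    ("Species alpha", "BOLD:AAA0002"),
    ("Species beta", "BOLD:AAA0003"),
    ("Species gamma", "BOLD:AAA0003"),
    ("Species delta", "BOLD:AAA0004"),
]

multi_bin_species, multi_species_bins = find_discordance(records)
print("Species in multiple BINs:", multi_bin_species)
print("BINs with multiple species:", multi_species_bins)
```

Species with a single name in a single BIN (here "Species delta") are concordant and drop out of both lists, which matches the abstract's focus on records with disagreement.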
id RCAP_da03cdfba34e9594d948eeef8d9a7903
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/80004
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling The hackathon was organized with financial support from the European Union COST Action DNAqua-Net (CA 15219, https://dnaqua.net/) in the scope of the 8th International Barcode of Life Conference in Trondheim, Norway on 16 June 2019. DNAqua-Net is acknowledged for the funding provided and the local conference organizers for all the logistical support that ensured a successful event. Tyler Elliot and the rest of the BOLD team are acknowledged for their help with data queries and analytics. The authors also thank the hackathon participants for vibrant discussions during and after the event: Berry van der Hoorn, Katrine Konsghavn, Guy Paz, Mouna Rifi, Malin Strand, Anne Helene Tandberg, Adam Wall, and Endre Willassen. Marcos A. L. Teixeira was supported by a PhD grant from the Portuguese Foundation for Science and Technology (FCT I.P.) co-financed by ESF (SFRH/BD/131527/2017). Financial support granted by FCT to Sofia Duarte (CEECIND/00667/2017) and to Pedro E. Vieira (project NIS-DNA, PTDC/BIA-BMA/29754/2017) is also acknowledged. Sanna Majaneva was financially supported by the Norwegian Taxonomy Initiative (project no. 70184235). The authors thank the five reviewers who provided valuable input into the earlier version of the manuscript.
dc.title.none.fl_str_mv Revision and annotation of DNA barcode records for marine invertebrates: Report of the 8th iBOL conference hackathon
author Radulovici, Adriana E.
author_role author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Radulovici, Adriana E.
Vieira, Pedro Emanuel Ferreira Reis
Duarte, Sofia Alexandra Ferreira
Teixeira, Marcos André Machado Lima
Borges, Luisa M.S.
Deagle, Bruce E.
Majaneva, Sanna
Redmond, Niamh
Schultz, Jessica A.
Costa, Filipe O.
dc.subject.por.fl_str_mv Annotation
Data curation
DNA barcoding
Marine invertebrates
Metabarcoding
Reference libraries
publishDate 2021
dc.date.none.fl_str_mv 2021
2021-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/80004
url https://hdl.handle.net/1822/80004
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Radulovici AE, Vieira PE, Duarte S, Teixeira MAL, Borges LMS, Deagle BE, Majaneva S, Redmond N, Schultz JA, Costa FO (2021) Revision and annotation of DNA barcode records for marine invertebrates: report of the 8th iBOL conference hackathon. Metabarcoding and Metagenomics 5: e67862. https://doi.org/10.3897/mbmg.5.67862
2534-9708
10.3897/mbmg.5.67862
https://mbmg.pensoft.net/article/67862/
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Pensoft Publishers
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação