Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood

Bibliographic details
Main author: He, Tuo
Publication date: 2019
Other authors: Jiao, Lichao; Wiedenhoeft, Alex C.; Yin, Yafang
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1007/s00425-019-03116-3
http://hdl.handle.net/11449/184449
Abstract: Main conclusion: Machine-learning approaches (MLAs) for DNA barcoding outperform distance- and tree-based methods in identification accuracy and cost-effectiveness for species-level identification of wood. DNA barcoding is a promising tool to combat illegal logging and the associated trade, and reliable, efficient analytical methods are essential for its wider application in the wood trade and in the forensics of natural materials more broadly. In this study, 120 DNA sequences of four barcodes (ITS2, matK, ndhF-rpl32, and rbcL) generated in our previous study and 85 sequences downloaded from the National Center for Biotechnology Information (NCBI) were collected to establish a reference data set for six commercial Pterocarpus woods. MLAs (BLOG, BP-neural network, SMO, and J48) were compared with distance-based (TaxonDNA) and tree-based (NJ tree) methods on identification accuracy and cost-effectiveness across these six species, and were also applied to discriminate the CITES-listed species Pterocarpus santalinus from the anatomically similar P. tinctorius for forensic identification. MLAs provided higher identification accuracy (30.8-100%) than distance-based (15.1-97.4%) and tree-based methods (11.1-87.5%), with SMO performing best among the machine-learning classifiers. The two-locus combination ITS2 + matK with the SMO classifier exhibited the highest resolution (100%) with the fewest barcodes for discriminating the six Pterocarpus species. P. santalinus was discriminated successfully from P. tinctorius using MLAs with a single barcode, ndhF-rpl32. This study shows that MLAs provide higher identification accuracy and cost-effectiveness for forensic application than other analytical methods in DNA barcoding of Pterocarpus wood.
id UNSP_13dbd55c6bf27f89574bb9876815ca8a
oai_identifier_str oai:repositorio.unesp.br:11449/184449
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling National Natural Science Foundation of China: 31600451
National High-level Talent for Special Support Program of China: W02020331
China Scholarship Council: 2017-3109
Chinese Acad Forestry, Chinese Res Inst Wood Ind, Dept Wood Anat & Utilizat, Beijing 100091, Peoples R China
Chinese Acad Forestry, Wood Collect WOODPEDIA, Beijing 100091, Peoples R China
US Forest Serv, Forest Prod Lab, Ctr Wood Anat Res, USDA, Madison, WI 53726 USA
Univ Wisconsin, Dept Bot, Madison, WI 53706 USA
Purdue Univ, Dept Forestry & Natl Resources, W Lafayette, IN 47907 USA
Univ Estadual Paulista, Ciencias Biol Bot, Botucatu, SP, Brazil
2021-10-23T16:09:06Z
http://repositorio.unesp.br/oai/request
opendoar:2946
2024-08-05T13:54:46.197809
dc.title.none.fl_str_mv Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood
title Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood
author He, Tuo
author_facet He, Tuo
Jiao, Lichao
Wiedenhoeft, Alex C.
Yin, Yafang
author_role author
author2 Jiao, Lichao
Wiedenhoeft, Alex C.
Yin, Yafang
author2_role author
author
author
dc.contributor.none.fl_str_mv Chinese Acad Forestry
US Forest Serv
Univ Wisconsin
Purdue Univ
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv He, Tuo
Jiao, Lichao
Wiedenhoeft, Alex C.
Yin, Yafang
dc.subject.por.fl_str_mv DNA barcoding
Forensic wood identification
Identification accuracy
Machine learning approaches (MLAs)
Pterocarpus
SMO classifier
publishDate 2019
dc.date.none.fl_str_mv 2019-10-04T12:13:41Z
2019-10-04T12:13:41Z
2019-05-01
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1007/s00425-019-03116-3
Planta. New York: Springer, v. 249, n. 5, p. 1617-1625, 2019.
0032-0935
http://hdl.handle.net/11449/184449
10.1007/s00425-019-03116-3
WOS:000464898700025
url http://dx.doi.org/10.1007/s00425-019-03116-3
http://hdl.handle.net/11449/184449
identifier_str_mv Planta. New York: Springer, v. 249, n. 5, p. 1617-1625, 2019.
0032-0935
10.1007/s00425-019-03116-3
WOS:000464898700025
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Planta
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 1617-1625
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv Web of Science
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1808128289262272512