MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm

Detalhes bibliográficos
Autor(a) principal: Cerqueira, Fabio Ribeiro
Data de Publicação: 2016
Outros Autores: Ricardo, Adilson Mendes, Oliveira, Alcione de Paiva, Graber, Armin, Baumgartner, Christian
Tipo de documento: Artigo
Idioma: eng
Título da fonte: LOCUS Repositório Institucional da UFV
Texto Completo: https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x
http://www.locus.ufv.br/handle/123456789/12157
Resumo: This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.
id UFV_7cc933adc32f9690b285b59e1c410cbd
oai_identifier_str oai:locus.ufv.br:123456789/12157
network_acronym_str UFV
network_name_str LOCUS Repositório Institucional da UFV
repository_id_str 2145
spelling Cerqueira, Fabio RibeiroRicardo, Adilson MendesOliveira, Alcione de PaivaGraber, ArminBaumgartner, Christian2017-10-18T16:37:07Z2017-10-18T16:37:07Z2016-12-151471-2105https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-xhttp://www.locus.ufv.br/handle/123456789/12157This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.engBMC Bioinformatics17(Suppl 18):472, p.25-38, December 2016Artificial neural networkCost sensitive classificationPeptide/protein identificationPhosphoproteomicsShotgun proteomicsData miningMUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithminfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfinfo:eu-repo/semantics/openAccessreponame:LOCUS Repositório Institucional da UFVinstname:Universidade Federal de Viçosa (UFV)instacron:UFVORIGINALdocument.pdfdocument.pdftexto completoapplication/pdf1981459https://locus.ufv.br//bitstream/123456789/12157/1/document.pdf307e649393d595416bd3d4d352c852f7MD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81748https://locus.ufv.br//bitstream/123456789/12157/2/license.txt8a4605be74aa9ea9d79846c1fba20a33MD52THUMBNAILdocument.pdf.jpgdocument.pdf.jpgIM Thumbnailimage/jpeg5081https://locus.ufv.br//bitstream/123456789/12157/3/document.pdf.jpg55a463934192645121ade29da16f8b6fMD53123456789/121572017-10-18 22:00:33.732oai:locus.ufv.br:123456789/12157Tk9URTogUExBQ0UgWU9VUiBPV04gTElDRU5TRSBIRVJFClRoaXMgc2FtcGxlIGxpY2Vuc2UgaXMgcHJvdmlkZWQgZm9yIGluZm9ybWF0aW9uYWwgcHVycG9zZXMgb25seS4KCk5PTi1FWENMVVNJVkUgRElTVFJJQlVUSU9OIExJQ0VOU0UKCkJ5IHNpZ25pbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbnNlLCB5b3UgKHRoZSBhdXRob3Iocykgb3IgY29weXJpZ2h0Cm93bmVyKSBncmFudHMgdG8gRFNwYWNlIFVuaXZlcnNpdHkgKERTVSkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLAp0cmFuc2xhdGUgKGFzIGRlZmluZWQgYmVsb3cpLCBhbmQvb3IgZGlzdHJpYnV0ZSB5b3VyIHN1Ym1pc3Npb24gKGluY2x1ZGluZwp0aGUgYWJzdHJhY3QpIHdvcmxkd2lkZSBpbiBwcmludCBhbmQgZWxlY3Ryb25pYyBmb3JtYXQgYW5kIGluIGFueSBtZWRpdW0sCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBEU1UgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4KCllvdSBhbHNvIGFncmVlIHRoYXQgRFNVIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIHN1Ym1pc3Npb24gZm9yCnB1cnBvc2VzIG9mIHNlY3VyaXR5LCBiYWNrLXVwIGFuZCBwcmVzZXJ2YXRpb24uCgpZb3UgcmVwcmVzZW50IHRoYXQgdGhlIHN1Ym1pc3Npb24gaXMgeW91ciBvcmlnaW5hbCB3b3JrLCBhbmQgdGhhdCB5b3UgaGF2ZQp0aGUgcmlnaHQgdG8gZ3JhbnQgdGhlIHJpZ2h0cyBjb250YWluZWQgaW4gdGhpcyBsaWNlbnNlLiBZb3UgYWxzbyByZXByZXNlbnQKdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIga25vd2xlZGdlLCBpbmZyaW5nZSB1cG9uCmFueW9uZSdzIGNvcHlyaWdodC4KCklmIHRoZSBzdWJtaXNzaW9uIGNvbnRhaW5zIG1hdGVyaWFsIGZvciB3aGljaCB5b3UgZG8gbm90IGhvbGQgY29weXJpZ2h0LAp5b3UgcmVwcmVzZW50IHRoYXQgeW91IGhhdmUgb2J0YWluZWQgdGhlIHVucmVzdHJpY3RlZCBwZXJtaXNzaW9uIG9mIHRoZQpjb3B5cmlnaHQgb3duZXIgdG8gZ3JhbnQgRFNVIHRoZSByaWdodHMgcmVxdWlyZWQgYnkgdGhpcyBsaWNlbnNlLCBhbmQgdGhhdApzdWNoIHRoaXJkLXBhcnR5IG93bmVkIG1hdGVyaWFsIGlzIGNsZWFybHkgaWRlbnRpZmllZCBhbmQgYWNrbm93bGVkZ2VkCndpdGhpbiB0aGUgdGV4dCBvciBjb250ZW50IG9mIHRoZSBzdWJtaXNzaW9uLgoKSUYgVEhFIFNVQk1JU1NJT04gSVMgQkFTRUQgVVBPTiBXT1JLIFRIQVQgSEFTIEJFRU4gU1BPTlNPUkVEIE9SIFNVUFBPUlRFRApCWSBBTiBBR0VOQ1kgT1IgT1JHQU5JWkFUSU9OIE9USEVSIFRIQU4gRFNVLCBZT1UgUkVQUkVTRU5UIFRIQVQgWU9VIEhBVkUKRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgUkVRVUlSRUQgQlkgU1VDSApDT05UUkFDVCBPUiBBR1JFRU1FTlQuCgpEU1Ugd2lsbCBjbGVhcmx5IGlkZW50aWZ5IHlvdXIgbmFtZShzKSBhcyB0aGUgYXV0aG9yKHMpIG9yIG93bmVyKHMpIG9mIHRoZQpzdWJtaXNzaW9uLCBhbmQgd2lsbCBub3QgbWFrZSBhbnkgYWx0ZXJhdGlvbiwgb3RoZXIgdGhhbiBhcyBhbGxvd2VkIGJ5IHRoaXMKbGljZW5zZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=Repositório InstitucionalPUBhttps://www.locus.ufv.br/oai/requestfabiojreis@ufv.bropendoar:21452017-10-19T01:00:33LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV)false
dc.title.en.fl_str_mv MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
title MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
spellingShingle MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
Cerqueira, Fabio Ribeiro
Artificial neural network
Cost sensitive classification
Peptide/protein identification
Phosphoproteomics
Shotgun proteomics
Data mining
title_short MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
title_full MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
title_fullStr MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
title_full_unstemmed MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
title_sort MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
author Cerqueira, Fabio Ribeiro
author_facet Cerqueira, Fabio Ribeiro
Ricardo, Adilson Mendes
Oliveira, Alcione de Paiva
Graber, Armin
Baumgartner, Christian
author_role author
author2 Ricardo, Adilson Mendes
Oliveira, Alcione de Paiva
Graber, Armin
Baumgartner, Christian
author2_role author
author
author
author
dc.contributor.author.fl_str_mv Cerqueira, Fabio Ribeiro
Ricardo, Adilson Mendes
Oliveira, Alcione de Paiva
Graber, Armin
Baumgartner, Christian
dc.subject.pt-BR.fl_str_mv Artificial neural network
Cost sensitive classification
Peptide/protein identification
Phosphoproteomics
Shotgun proteomics
Data mining
topic Artificial neural network
Cost sensitive classification
Peptide/protein identification
Phosphoproteomics
Shotgun proteomics
Data mining
description This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.
publishDate 2016
dc.date.issued.fl_str_mv 2016-12-15
dc.date.accessioned.fl_str_mv 2017-10-18T16:37:07Z
dc.date.available.fl_str_mv 2017-10-18T16:37:07Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x
http://www.locus.ufv.br/handle/123456789/12157
dc.identifier.issn.none.fl_str_mv 1471-2105
identifier_str_mv 1471-2105
url https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x
http://www.locus.ufv.br/handle/123456789/12157
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartofseries.pt-BR.fl_str_mv 17(Suppl 18):472, p.25-38, December 2016
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv BMC Bioinformatics
publisher.none.fl_str_mv BMC Bioinformatics
dc.source.none.fl_str_mv reponame:LOCUS Repositório Institucional da UFV
instname:Universidade Federal de Viçosa (UFV)
instacron:UFV
instname_str Universidade Federal de Viçosa (UFV)
instacron_str UFV
institution UFV
reponame_str LOCUS Repositório Institucional da UFV
collection LOCUS Repositório Institucional da UFV
bitstream.url.fl_str_mv https://locus.ufv.br//bitstream/123456789/12157/1/document.pdf
https://locus.ufv.br//bitstream/123456789/12157/2/license.txt
https://locus.ufv.br//bitstream/123456789/12157/3/document.pdf.jpg
bitstream.checksum.fl_str_mv 307e649393d595416bd3d4d352c852f7
8a4605be74aa9ea9d79846c1fba20a33
55a463934192645121ade29da16f8b6f
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV)
repository.mail.fl_str_mv fabiojreis@ufv.br
_version_ 1801212861380296704