MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
Autor(a) principal: | |
---|---|
Data de Publicação: | 2016 |
Outros Autores: | , , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | LOCUS Repositório Institucional da UFV |
Texto Completo: | https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x http://www.locus.ufv.br/handle/123456789/12157 |
Resumo: | This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics. |
id |
UFV_7cc933adc32f9690b285b59e1c410cbd |
---|---|
oai_identifier_str |
oai:locus.ufv.br:123456789/12157 |
network_acronym_str |
UFV |
network_name_str |
LOCUS Repositório Institucional da UFV |
repository_id_str |
2145 |
spelling |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithmArtificial neural networkCost sensitive classificationPeptide/protein identificationPhosphoproteomicsShotgun proteomicsData miningThis work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.BMC Bioinformatics2017-10-18T16:37:07Z2017-10-18T16:37:07Z2016-12-15info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articlepdfapplication/pdf1471-2105https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-xhttp://www.locus.ufv.br/handle/123456789/12157eng17(Suppl 18):472, p.25-38, December 2016Cerqueira, Fabio RibeiroRicardo, Adilson MendesOliveira, Alcione de PaivaGraber, ArminBaumgartner, Christianinfo:eu-repo/semantics/openAccessreponame:LOCUS Repositório Institucional da UFVinstname:Universidade Federal de Viçosa (UFV)instacron:UFV2024-07-12T06:16:45Zoai:locus.ufv.br:123456789/12157Repositório InstitucionalPUBhttps://www.locus.ufv.br/oai/requestfabiojreis@ufv.bropendoar:21452024-07-12T06:16:45LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV)false |
dc.title.none.fl_str_mv |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
title |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
spellingShingle |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm Cerqueira, Fabio Ribeiro Artificial neural network Cost sensitive classification Peptide/protein identification Phosphoproteomics Shotgun proteomics Data mining |
title_short |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
title_full |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
title_fullStr |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
title_full_unstemmed |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
title_sort |
MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm |
author |
Cerqueira, Fabio Ribeiro |
author_facet |
Cerqueira, Fabio Ribeiro Ricardo, Adilson Mendes Oliveira, Alcione de Paiva Graber, Armin Baumgartner, Christian |
author_role |
author |
author2 |
Ricardo, Adilson Mendes Oliveira, Alcione de Paiva Graber, Armin Baumgartner, Christian |
author2_role |
author author author author |
dc.contributor.author.fl_str_mv |
Cerqueira, Fabio Ribeiro Ricardo, Adilson Mendes Oliveira, Alcione de Paiva Graber, Armin Baumgartner, Christian |
dc.subject.por.fl_str_mv |
Artificial neural network Cost sensitive classification Peptide/protein identification Phosphoproteomics Shotgun proteomics Data mining |
topic |
Artificial neural network Cost sensitive classification Peptide/protein identification Phosphoproteomics Shotgun proteomics Data mining |
description |
This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics. |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-12-15 2017-10-18T16:37:07Z 2017-10-18T16:37:07Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
1471-2105 https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x http://www.locus.ufv.br/handle/123456789/12157 |
identifier_str_mv |
1471-2105 |
url |
https://doi-org.ez35.periodicos.capes.gov.br/10.1186/s12859-016-1341-x http://www.locus.ufv.br/handle/123456789/12157 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
17(Suppl 18):472, p.25-38, December 2016 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
pdf application/pdf |
dc.publisher.none.fl_str_mv |
BMC Bioinformatics |
publisher.none.fl_str_mv |
BMC Bioinformatics |
dc.source.none.fl_str_mv |
reponame:LOCUS Repositório Institucional da UFV instname:Universidade Federal de Viçosa (UFV) instacron:UFV |
instname_str |
Universidade Federal de Viçosa (UFV) |
instacron_str |
UFV |
institution |
UFV |
reponame_str |
LOCUS Repositório Institucional da UFV |
collection |
LOCUS Repositório Institucional da UFV |
repository.name.fl_str_mv |
LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV) |
repository.mail.fl_str_mv |
fabiojreis@ufv.br |
_version_ |
1817559824209018880 |