Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants
Main author: | Negri, Tatianne da Costa |
---|---|
Publication date: | 2019 |
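Other authors: | Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi |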
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1093/bib/bby034 http://hdl.handle.net/11449/187774 |
Abstract: | MOTIVATION: Long noncoding RNAs (lncRNAs) are a class of eukaryotic noncoding RNAs that has gained great attention in recent years as a higher layer of regulation of gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, which contrasts with the variety of prediction tools available for mammalian lncRNAs. This distinction is not that obvious, given that the biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants. RESULTS: Our feature selection analysis considered 5468 features and retained only 16 of them to robustly identify lncRNAs with the REPTree algorithm. These features were the basis for creating the model and training it with lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). After an extensive comparison with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO), we found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, obtaining the best result in 87.5% of eight tests in eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species. |
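The abstract does not list the 16 selected features, so the sketch below is only an illustration of the kind of sequence-derived features that coding-potential classifiers commonly start from: transcript length, GC content, longest forward-strand open reading frame, and k-mer frequencies. The function names, the choice of features, and the default `k=2` are assumptions made for this example; they are not taken from RNAplonc or from the REPTree model described above.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over the alphabet ACGT."""
    seq = normalize(seq)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def longest_orf_length(seq):
    """Length (nt) of the longest ATG..stop ORF on the forward strand."""
    seq = normalize(seq)
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def extract_features(seq, k=2):
    """Hypothetical feature vector for one transcript (illustrative only)."""
    seq = normalize(seq)
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    features = {
        "length": len(seq),
        "gc_content": gc,
        "longest_orf": longest_orf_length(seq),
    }
    features.update(kmer_frequencies(seq, k))
    return features
```

A feature selection step such as the one reported in the abstract would then rank a much larger pool of candidate features like these and keep only the most informative ones before training a classifier.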
id |
UNSP_ca678d1c3921148426a8e1c068b76b3e |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/187774 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants; bioinformatics; features; long RNAs; machine learning; pattern recognition; tool; Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR Brazil and Informatics and Knowledge Management Graduate Program Universidade Nove de Julho, Campus Cornélio; Informatics and Knowledge Management Graduate Program Universidade Nove de Julho; Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR, Campus Cornélio; Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR Brazil and Department of Botany Institute of Biosciences São Paulo State University UNESP, Campus Cornélio; Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR Brazil and Department of Botany Institute of Biosciences São Paulo State University UNESP, Campus Cornélio; Universidade Nove de Julho; UTFPR; Universidade Estadual Paulista (Unesp); Negri, Tatianne da Costa; Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi; 2019-10-06T15:46:52Z; 2019-10-06T15:46:52Z; 2019-03-25; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 682-689; http://dx.doi.org/10.1093/bib/bby034; Briefings in bioinformatics, v. 20, n. 2, p. 682-689, 2019.; 1477-4054; http://hdl.handle.net/11449/187774; 10.1093/bib/bby034; 2-s2.0-85067536297; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Briefings in bioinformatics; info:eu-repo/semantics/openAccess; 2021-10-22T18:33:34Z; oai:repositorio.unesp.br:11449/187774; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T23:40:14.876857; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants |
title |
Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants |
author |
Negri, Tatianne da Costa |
author_facet |
Negri, Tatianne da Costa; Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi |
author_role |
author |
author2 |
Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
Universidade Nove de Julho; UTFPR; Universidade Estadual Paulista (Unesp) |
dc.contributor.author.fl_str_mv |
Negri, Tatianne da Costa; Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi |
dc.subject.por.fl_str_mv |
bioinformatics; features; long RNAs; machine learning; pattern recognition; tool |
topic |
bioinformatics; features; long RNAs; machine learning; pattern recognition; tool |
description |
MOTIVATION: Long noncoding RNAs (lncRNAs) are a class of eukaryotic noncoding RNAs that has gained great attention in recent years as a higher layer of regulation of gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, which contrasts with the variety of prediction tools available for mammalian lncRNAs. This distinction is not that obvious, given that the biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants. RESULTS: Our feature selection analysis considered 5468 features and retained only 16 of them to robustly identify lncRNAs with the REPTree algorithm. These features were the basis for creating the model and training it with lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). After an extensive comparison with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO), we found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, obtaining the best result in 87.5% of eight tests in eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species. |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-10-06T15:46:52Z 2019-10-06T15:46:52Z 2019-03-25 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1093/bib/bby034; Briefings in bioinformatics, v. 20, n. 2, p. 682-689, 2019.; 1477-4054; http://hdl.handle.net/11449/187774; 10.1093/bib/bby034; 2-s2.0-85067536297 |
url |
http://dx.doi.org/10.1093/bib/bby034 http://hdl.handle.net/11449/187774 |
identifier_str_mv |
Briefings in bioinformatics, v. 20, n. 2, p. 682-689, 2019.; 1477-4054; 10.1093/bib/bby034; 2-s2.0-85067536297 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Briefings in bioinformatics |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
682-689 |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808129541666766848 |