Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants

Bibliographic details
Main author: Negri, Tatianne da Costa
Publication date: 2019
Other authors: Alves, Wonder Alexandre Luz; Bugatti, Pedro Henrique; Saito, Priscila Tiemi Maeda; Domingues, Douglas Silva [UNESP]; Paschoal, Alexandre Rossi
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1093/bib/bby034
http://hdl.handle.net/11449/187774
Abstract: MOTIVATION: Long noncoding RNAs (lncRNAs) are a class of eukaryotic noncoding RNAs that has gained great attention in recent years as a higher layer of regulation of gene expression in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, which contrasts with the variety of prediction tools available for mammalian lncRNAs. This distinction is not trivial, given that the biological features and mechanisms generating lncRNAs in the cell are likely different between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants. RESULTS: Our feature selection analysis considered 5468 features, of which only 16 were used to robustly identify lncRNAs with the REPTree algorithm. These features were the basis for creating the model, which was trained with lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). After an extensive comparison with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO), we found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, with the best result in 87.5% of eight tests in eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species.
id UNSP_ca678d1c3921148426a8e1c068b76b3e
oai_identifier_str oai:repositorio.unesp.br:11449/187774
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
affiliations Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR Brazil and Informatics and Knowledge Management Graduate Program Universidade Nove de Julho, Campus Cornélio
Informatics and Knowledge Management Graduate Program Universidade Nove de Julho
Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR, Campus Cornélio
Department of Computer Science Bioinformatics Graduate Program (PPGBIOINFO) Federal University of Technology - Paraná UTFPR Brazil and Department of Botany Institute of Biosciences São Paulo State University UNESP, Campus Cornélio
dc.contributor.none.fl_str_mv Universidade Nove de Julho
UTFPR
Universidade Estadual Paulista (Unesp)
dc.subject.por.fl_str_mv bioinformatics
features
long RNAs
machine learning
pattern recognition
tool
dc.date.none.fl_str_mv 2019-10-06T15:46:52Z
2019-10-06T15:46:52Z
2019-03-25
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1093/bib/bby034
Briefings in bioinformatics, v. 20, n. 2, p. 682-689, 2019.
1477-4054
http://hdl.handle.net/11449/187774
10.1093/bib/bby034
2-s2.0-85067536297
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Briefings in bioinformatics
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 682-689
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP