BASiNET - Biological Sequences NETwork: a case study on coding and non-coding RNAs identification.
Main author: | ITO, E. A. |
---|---|
Publication date: | 2018 |
Other authors: | KATAHIRA, I.; VICENTE, F. F. da R.; PEREIRA, L. F. P.; LOPES, F. M. |
Document type: | Article |
Language: | English |
Source: | Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
Full text: | http://www.alice.cnptia.embrapa.br/alice/handle/doc/1108754 |
Abstract: | With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been produced rapidly and at relatively low cost. In this context, computational tools are increasingly important for identifying the information needed to understand how organisms function. This work introduces BASiNET, an alignment-free tool that classifies biological sequences using features extracted from complex network measurements. The method first transforms each sequence into a complex network representation. It then extracts topological measures and assembles them into a feature vector that is used to classify the sequences. The method was evaluated on the classification of coding and non-coding RNAs from 13 species and compared with the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods for every organism and dataset adopted. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased toward any particular organism. The proposed method is implemented as open-source software in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET. |
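The abstract describes the pipeline only at a high level: sequence to complex network, network to topological measures, measures to a feature vector, feature vector to a classifier. The Python sketch below illustrates that general idea under assumptions of our own, not details taken from the paper: nodes are overlapping k-mers, an edge joins each k-mer to the next one along the sequence, the six measures are an illustrative choice, and a simple nearest-centroid rule stands in for the classifier. It is not BASiNET's R implementation or API, and the function names `sequence_to_network`, `topological_features`, `train_centroids` and `classify` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def sequence_to_network(seq, k=3):
    """Map a nucleotide sequence to an undirected, weighted k-mer network.

    Assumption (illustrative only, not BASiNET's exact rule): nodes are
    overlapping k-mers and an edge joins each k-mer to the next one along
    the sequence; the edge weight counts how often that adjacency occurs.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    adjacency = defaultdict(dict)  # node -> {neighbour: weight}
    for a, b in zip(kmers, kmers[1:]):
        if a == b:
            continue  # skip self-loops so degree counts distinct neighbours
        adjacency[a][b] = adjacency[a].get(b, 0) + 1
        adjacency[b][a] = adjacency[b].get(a, 0) + 1
    for km in kmers:
        adjacency.setdefault(km, {})  # keep k-mers without neighbours as nodes
    return dict(adjacency)


def topological_features(adjacency):
    """Return an illustrative feature vector of network measurements:
    [nodes, edges, average degree, maximum degree, density, average clustering].
    """
    n = len(adjacency)
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    m = sum(degrees) // 2  # each undirected edge is counted at both ends
    avg_degree = sum(degrees) / n if n else 0.0
    max_degree = max(degrees, default=0)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Local clustering: fraction of a node's neighbour pairs that are linked.
    clustering = []
    for nbrs in adjacency.values():
        d = len(nbrs)
        if d < 2:
            clustering.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        clustering.append(2 * links / (d * (d - 1)))
    avg_clustering = sum(clustering) / n if n else 0.0
    return [n, m, avg_degree, max_degree, density, avg_clustering]


def train_centroids(vectors, labels):
    """Stand-in classifier (assumption): average the feature vectors per class."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def classify(vector, centroids):
    """Assign the label of the nearest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(vector, centroids[lab]))
```

For instance, `topological_features(sequence_to_network("ACGTACGT", k=3))` returns `[4, 4, 2.0, 2, 0.6666666666666666, 0.0]`: the four distinct 3-mers form a single 4-cycle with no triangles. The raw measures have very different scales, so a practical pipeline would normally standardize them (or use a scale-insensitive learner) before relying on a distance-based rule like this one.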
id |
EMBR_7b1564a7b983c355675b54aa51a79367 |
---|---|
oai_identifier_str |
oai:www.alice.cnptia.embrapa.br:doc/1108754 |
network_acronym_str |
EMBR |
network_name_str |
Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) |
repository_id_str |
2154 |
author |
ITO, E. A. |
author_role |
author |
author2 |
KATAHIRA, I. VICENTE, F. F. da R. PEREIRA, L. F. P. LOPES, F. M. |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Eric Augusto Ito, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná; Isaque Katahira, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná; Fábio Fernandes da Rocha Vicente, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná; Luiz Filipe Protasio Pereira, CNPCa; Fabrício Martins Lopes, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná |
dc.subject.por.fl_str_mv |
RNA-seq Neurodegenerative diseases Cardiovascular diseases Epigenetics Nucleotides |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018 2019-05-07T00:49:48Z 2019-05-07T00:49:48Z 2019-05-06 2019-05-07T00:49:48Z |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
Nucleic Acids Research, v. 46, n. 16, p. , 2018 http://www.alice.cnptia.embrapa.br/alice/handle/doc/1108754 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa) instacron:EMBRAPA |
repository.mail.fl_str_mv |
cg-riaa@embrapa.br |