BASiNET - Biological Sequences NETwork: a case study on coding and non-coding RNAs identification.

Bibliographic details
Main author: ITO, E. A.
Publication date: 2018
Other authors: KATAHIRA, I., VICENTE, F. F. da R., PEREIRA, L. F. P., LOPES, F. M.
Document type: Article
Language: eng
Source title: Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
Full text: http://www.alice.cnptia.embrapa.br/alice/handle/doc/1108754
Abstract: With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in identifying the information needed to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method first transforms the sequences and represents them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated on the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.
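The pipeline described in the abstract (sequence to complex network, topological measures, feature vector) can be illustrated with a minimal Python sketch. It is not the BASiNET implementation, which is an R package. The word size, step, edge-weight thresholds, the choice of measures, and all function names below are assumptions made for illustration only; the record does not state them.

```python
from collections import Counter
from itertools import combinations


def sequence_to_network(seq, word=3, step=1):
    """Map a sequence to a weighted undirected graph (assumed construction):
    nodes are words of length `word`, and an edge links consecutive words."""
    seq = seq.upper().replace("T", "U")
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, step)]
    edges = Counter()
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops
            edges[frozenset((a, b))] += 1
    return edges


def topological_features(edges, threshold=0):
    """Keep edges with weight > threshold and return
    (nodes, edges, average degree, density, average clustering)."""
    neighbors = {}
    for pair, weight in edges.items():
        if weight > threshold:
            a, b = tuple(pair)
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    m = sum(len(v) for v in neighbors.values()) // 2
    if n < 2:
        return (n, m, 0.0, 0.0, 0.0)
    clustering = []
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        # Count links among this node's neighbors (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        clustering.append(2 * links / (k * (k - 1)))
    return (n, m, 2 * m / n, 2 * m / (n * (n - 1)), sum(clustering) / n)


def feature_vector(seq, thresholds=(0, 1)):
    """Concatenate the measures obtained at each (hypothetical) threshold."""
    edges = sequence_to_network(seq)
    vector = []
    for t in thresholds:
        vector.extend(topological_features(edges, t))
    return vector
```

A vector produced this way for each labelled sequence could then be passed to any standard classifier; which classifier BASiNET uses is not stated in this record.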
id EMBR_7b1564a7b983c355675b54aa51a79367
oai_identifier_str oai:www.alice.cnptia.embrapa.br:doc/1108754
network_acronym_str EMBR
network_name_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository_id_str 2154
dc.title.none.fl_str_mv BASiNET - Biological Sequences NETwork: a case study on coding and non-coding RNAs identification.
title BASiNET - Biological Sequences NETwork: a case study on coding and non-coding RNAs identification.
author ITO, E. A.
author_facet ITO, E. A.
KATAHIRA, I.
VICENTE, F. F. da R.
PEREIRA, L. F. P.
LOPES, F. M.
author_role author
author2 KATAHIRA, I.
VICENTE, F. F. da R.
PEREIRA, L. F. P.
LOPES, F. M.
author2_role author
author
author
author
dc.contributor.none.fl_str_mv Eric Augusto Ito, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná
Isaque Katahira, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná
Fábio Fernandes da Rocha Vicente, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná
LUIZ FILIPE PROTASIO PEREIRA, CNPCa
Fabrício Martins Lopes, Department of Computer Science, Bioinformatics Graduate Program/Federal University of Technology – Paraná
dc.contributor.author.fl_str_mv ITO, E. A.
KATAHIRA, I.
VICENTE, F. F. da R.
PEREIRA, L. F. P.
LOPES, F. M.
dc.subject.por.fl_str_mv RNA-seq
Neurodegenerative diseases
Cardiovascular diseases
Epigenetics
Nucleotides
publishDate 2018
dc.date.none.fl_str_mv 2018
2019-05-07T00:49:48Z
2019-05-07T00:49:48Z
2019-05-06
2019-05-07T00:49:48Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv Nucleic Acids Research, v. 46, n. 16, p. , 2018
http://www.alice.cnptia.embrapa.br/alice/handle/doc/1108754
identifier_str_mv Nucleic Acids Research, v. 46, n. 16, p. , 2018
url http://www.alice.cnptia.embrapa.br/alice/handle/doc/1108754
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
instname:Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron:EMBRAPA
instname_str Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
instacron_str EMBRAPA
institution EMBRAPA
reponame_str Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
collection Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice)
repository.name.fl_str_mv Repositório Institucional da EMBRAPA (Repository Open Access to Scientific Information from EMBRAPA - Alice) - Empresa Brasileira de Pesquisa Agropecuária (Embrapa)
repository.mail.fl_str_mv cg-riaa@embrapa.br
_version_ 1794503475012304896