A novel decomposing model with evolutionary algorithms for feature selection in long non-coding RNAs

Bonidia, Robson P.; Machida, Jaqueline Sayuri; Negri, Tatianne C.; Alves, Wonder A.L.; Kashiwabara, André Y.; Domingues, Douglas S. [UNESP]; De Carvalho, André; Paschoal, Alexandre R.; Sanches, Danilo S.


Bibliographic details
Main author:	Bonidia, Robson P.
Publication date:	2020
Other authors:	Machida, Jaqueline Sayuri, Negri, Tatianne C., Alves, Wonder A.L., Kashiwabara, André Y., Domingues, Douglas S. [UNESP], De Carvalho, André, Paschoal, Alexandre R., Sanches, Danilo S.
Document type:	Article
Language:	English
Source:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1109/ACCESS.2020.3028039 http://hdl.handle.net/11449/206069
Abstract:	Machine learning algorithms have been applied to numerous transcript datasets to identify long non-coding RNAs (lncRNAs). Before these algorithms can be applied to RNA data, however, features must be extracted from the original sequences. Because many of these features can be redundant or irrelevant, the predictive performance of the algorithms can be improved by performing feature selection. Most current approaches, however, select features independently and ignore possible relations among them. In this paper, we propose a new model that identifies the best feature subsets. It removes unnecessary, irrelevant, and redundant predictive features while taking the importance of their co-occurrence into account. The proposed model is based on decomposing solutions and is called k-rounds of decomposition features. In this model, the least relevant features are suppressed according to their contribution to a classification task. To evaluate our proposal, we extract a set of features from datasets of 5 plant species. These features are based on sequence structure: GC content, k-mers (k = 1 to 6), sequence length, and Open Reading Frame (ORF). Next, we apply 5 metaheuristic approaches (Genetic Algorithm, (μ + λ) Evolutionary Algorithm, Artificial Bee Colony, Ant Colony Optimization, and Particle Swarm Optimization) to select the best feature subsets. The main contribution of this work is the inclusion, in each metaheuristic, of a decomposition model that uses a round and voting scheme. To investigate its relevance, we use the REPTree classifier to assess the predictive capacity of each feature subset selected in 8 plant species. We found that including the proposed decomposition model significantly reduces the dimensionality of the datasets and improves predictive performance, regardless of the metaheuristic. Furthermore, the resulting pipeline was compared with five approaches from the literature for lncRNA identification, and it also showed superior predictive performance. Finally, this study produced a new pipeline for finding a minimal number of features in lncRNAs and other biological sequences.
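The following Python sketch gives a rough illustration of the sequence-structure features listed in the abstract (GC content, k-mers for k = 1 to 6, sequence length and ORF) by computing them for a single transcript. It is not the authors' extraction code. The function names are hypothetical, and two choices are assumptions because the abstract does not give the exact definitions:

- k-mers are represented as relative frequencies.
- The ORF feature is the length of the longest forward-strand ATG-to-stop ORF.

```python
from __future__ import annotations

from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an uppercase DNA sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-mer over {A, C, G, T} (assumed normalization)."""
    counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
    windows = max(len(seq) - k + 1, 0)
    for i in range(windows):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
    return {w: (c / windows if windows else 0.0) for w, c in counts.items()}


def longest_orf_length(seq: str) -> int:
    """Length (nt) of the longest ATG...stop ORF in the three forward frames (assumption)."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def extract_features(seq: str, max_k: int = 6) -> dict[str, float]:
    """Hypothetical feature vector: length, GC content, ORF length and k-mer frequencies."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    features = {
        "length": float(len(seq)),
        "gc": gc_content(seq),
        "orf_length": float(longest_orf_length(seq)),
    }
    for k in range(1, max_k + 1):
        for word, freq in kmer_frequencies(seq, k).items():
            features[f"k{k}_{word}"] = freq
    return features
```

Even this simplified version yields 5,463 features per sequence, 5,460 of which are k-mer frequencies (4 + 16 + 64 + 256 + 1,024 + 4,096). The high dimensionality that the feature-selection step is meant to reduce is therefore already apparent.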

Item metadata

oai_identifier_str	oai:repositorio.unesp.br:11449/206069
affiliations	Department of Computer Science, Bioinformatics Graduate Program, Federal University of Technology-Paraná (UTFPR); Institute of Mathematics and Computer Sciences, University of São Paulo (USP); Universidade Nove de Julho (UNINOVE); Department of Botany, Institute of Biosciences, São Paulo State University (UNESP)
dc.contributor.none.fl_str_mv	Federal University of Technology-Paraná (UTFPR) Universidade de São Paulo (USP) Universidade Nove de Julho (UNINOVE) Universidade Estadual Paulista (Unesp)
dc.subject.por.fl_str_mv	Bioinformatics Feature selection LncRNAs Machine learning Metaheuristic
dc.date.none.fl_str_mv	2020-01-01 2021-06-25T10:26:04Z 2021-06-25T10:26:04Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1109/ACCESS.2020.3028039 IEEE Access, v. 8, p. 181683-181697. 2169-3536 http://hdl.handle.net/11449/206069 10.1109/ACCESS.2020.3028039 2-s2.0-85102773307
dc.relation.none.fl_str_mv	IEEE Access
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	181683-181697
dc.source.none.fl_str_mv	Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP
