TERL: classification of transposable elements by convolutional neural networks
Main author: | da Cruz, Murilo Horacio Pereira |
---|---|
Publication date: | 2021 |
Other authors: | Domingues, Douglas Silva; Saito, Priscila Tiemi Maeda; Paschoal, Alexandre Rossi; Bugatti, Pedro Henrique |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1093/bib/bbaa185 http://hdl.handle.net/11449/221755 |
Abstract: | Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for the order sequences from RepBase, respectively. We also obtained macro mean accuracy and F1-score of 95.0% and 70.6% for classifying sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment on order-level classification of sequences from seven databases, and TERL was by far faster than every other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. Availability: https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br. |
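
The abstract describes turning a one-dimensional sequence into two-dimensional, image-like data before passing it to a convolutional network, but this record does not say how the transformation is done. The following is a minimal sketch of one common way to do it, one-hot encoding with zero padding. The alphabet `ACGTN`, the padding scheme, and the function name `one_hot_2d` are assumptions made for illustration and are not taken from TERL itself.

```python
# Hypothetical sketch: the alphabet, padding scheme, and function name are
# assumptions for illustration, not details confirmed by this record.
ALPHABET = "ACGTN"  # assumed symbol set; N stands for an unknown base


def one_hot_2d(sequence, width):
    """Return a len(ALPHABET) x width matrix (list of rows) for a sequence.

    Each used column holds a single 1 in the row of the symbol found at that
    position. Sequences shorter than `width` are right-padded with all-zero
    columns; longer sequences are truncated to `width`.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for col, symbol in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(symbol)
        if row == -1:
            row = ALPHABET.index("N")  # map unexpected symbols to N
        matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Print one labelled row per symbol for a short example sequence.
    for row_label, row in zip(ALPHABET, one_hot_2d("ACGTAC", 8)):
        print(row_label, row)
```

Padding every sequence to the same width gives all inputs the same shape, which is what a convolutional layer typically expects. The width itself would have to be chosen from the data.
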
id |
UNSP_7c203331836462deca799773117ae366 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/221755 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
TERL: classification of transposable elements by convolutional neural networks; convolutional neural networks; deep learning; representation learning; sequence classification; transposable elements; Federal University of Technology - Parana (UTFPR); Bioinformatics Graduation Program (PPGBIOINFO) Department of Computer Science Federal University of Technology - Parana (UTFPR); São Paulo State University at Botucatu; University of São Paulo; Department of Biodiversity São Paulo State University at Rio Claro; Euripides Soares da Rocha University of Marilia; University of São Paulo (ICMC-USP); University of Campinas (IC-UNICAMP); Department of Computing Federal University of Technology - Parana (UTFPR); São Paulo State University at Botucatu; Department of Biodiversity São Paulo State University at Rio Claro; Federal University of Technology - Parana (UTFPR); Universidade Estadual Paulista (UNESP); Universidade de São Paulo (USP); Euripides Soares da Rocha University of Marilia; Universidade Estadual de Campinas (UNICAMP); da Cruz, Murilo Horacio Pereira; Domingues, Douglas Silva [UNESP]; Saito, Priscila Tiemi Maeda; Paschoal, Alexandre Rossi; Bugatti, Pedro Henrique; 2022-04-28T19:40:16Z; 2022-04-28T19:40:16Z; 2021-05-20; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; http://dx.doi.org/10.1093/bib/bbaa185; Briefings in bioinformatics, v. 22, n. 3, 2021.; 1477-4054; http://hdl.handle.net/11449/221755; 10.1093/bib/bbaa185; 2-s2.0-85106486317; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Briefings in bioinformatics; info:eu-repo/semantics/openAccess; 2022-04-28T19:40:16Z; oai:repositorio.unesp.br:11449/221755; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T20:36:40.338352; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
TERL: classification of transposable elements by convolutional neural networks |
title |
TERL: classification of transposable elements by convolutional neural networks |
spellingShingle |
TERL: classification of transposable elements by convolutional neural networks da Cruz, Murilo Horacio Pereira convolutional neural networks deep learning representation learning sequence classification transposable elements |
title_short |
TERL: classification of transposable elements by convolutional neural networks |
title_full |
TERL: classification of transposable elements by convolutional neural networks |
title_fullStr |
TERL: classification of transposable elements by convolutional neural networks |
title_full_unstemmed |
TERL: classification of transposable elements by convolutional neural networks |
title_sort |
TERL: classification of transposable elements by convolutional neural networks |
author |
da Cruz, Murilo Horacio Pereira |
author_facet |
da Cruz, Murilo Horacio Pereira Domingues, Douglas Silva [UNESP] Saito, Priscila Tiemi Maeda Paschoal, Alexandre Rossi Bugatti, Pedro Henrique |
author_role |
author |
author2 |
Domingues, Douglas Silva [UNESP] Saito, Priscila Tiemi Maeda Paschoal, Alexandre Rossi Bugatti, Pedro Henrique |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Federal University of Technology - Parana (UTFPR) Universidade Estadual Paulista (UNESP) Universidade de São Paulo (USP) Euripides Soares da Rocha University of Marilia Universidade Estadual de Campinas (UNICAMP) |
dc.contributor.author.fl_str_mv |
da Cruz, Murilo Horacio Pereira Domingues, Douglas Silva [UNESP] Saito, Priscila Tiemi Maeda Paschoal, Alexandre Rossi Bugatti, Pedro Henrique |
dc.subject.por.fl_str_mv |
convolutional neural networks deep learning representation learning sequence classification transposable elements |
topic |
convolutional neural networks deep learning representation learning sequence classification transposable elements |
description |
Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for the order sequences from RepBase, respectively. We also obtained macro mean accuracy and F1-score of 95.0% and 70.6% for classifying sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment on order-level classification of sequences from seven databases, and TERL was by far faster than every other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. Availability: https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br. |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-05-20 2022-04-28T19:40:16Z 2022-04-28T19:40:16Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1093/bib/bbaa185 Briefings in bioinformatics, v. 22, n. 3, 2021. 1477-4054 http://hdl.handle.net/11449/221755 10.1093/bib/bbaa185 2-s2.0-85106486317 |
url |
http://dx.doi.org/10.1093/bib/bbaa185 http://hdl.handle.net/11449/221755 |
identifier_str_mv |
Briefings in bioinformatics, v. 22, n. 3, 2021. 1477-4054 10.1093/bib/bbaa185 2-s2.0-85106486317 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Briefings in bioinformatics |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808129226111451136 |