A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern
Main author: | Murugaiah, Muthulakshmi |
---|---|
Publication date: | 2021 |
Other authors: | Ganesan, Murugeswari |
Document type: | Article |
Language: | eng |
Source title: | Brazilian Archives of Biology and Technology |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132021000100429 |
Abstract: | The genome sequence regulates the life of every living organism on earth. Genetic diseases cause genomic disorders, so early prediction of severe genetic diseases is possible through genome sequence analysis. Genomic disorders here refer to mutations, that is, rearrangements of bases in the genome of an organism. Genome sequence analysis and mutation identification can help to classify a diseased genome, and this can be accomplished with Machine Learning techniques. Feature extraction plays a crucial role in classification because it converts genome sequences into a set of quantitative values. In this article, we propose a novel feature extraction technique, called the Frequency based Feature Extraction Technique, which extracts 120 features from genome sequences for classification. COVID-19 is the current pandemic disease, and the coronavirus is its source. In this work, we therefore tested the proposed technique on 1000 genome sequence samples from coronavirus-affected patients across the world. The extracted features were classified using both Machine Learning and Deep Learning techniques. The results show that the proposed technique performs well with a Convolutional Neural Network classifier, giving an accuracy of 97.96%. The proposed technique also helps to find the most frequently repeated patterns in genome sequences. The pattern “TTGTT” was found to be the most frequently repeated pattern in the COVID-19 genome. |
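
The abstract describes turning genome sequences into frequency-based numeric features and finding the most repeated pattern, but this record does not specify how the 120 features are defined. As a rough illustration only, the sketch below counts overlapping fixed-length substrings (k-mers) in a nucleotide string and reports the most frequent one. The function names, the choice of k = 5 (the length of “TTGTT”), and the toy sequence are assumptions made for this example; this is not the authors' method.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers made only of A, C, G and T (hypothetical helper)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def kmer_frequencies(sequence, k=5):
    """Relative frequency of each k-mer: count divided by total valid windows."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def most_repeated(sequence, k=5):
    """Return (k-mer, count) for the most frequent k-mer, or None if there is none."""
    counts = kmer_counts(sequence, k)
    return counts.most_common(1)[0] if counts else None


# Toy sequence invented for illustration; it is not real genome data.
toy = "TTGTTGTTACGTTGTT"
pattern, count = most_repeated(toy, k=5)
freqs = kmer_frequencies(toy, k=5)
```

A fixed-length feature vector for a classifier could then be built by reading the frequencies of a chosen list of k-mers in a fixed order; which patterns the published technique actually uses is not stated in this record.
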
id |
TECPAR-1_bb7dfa1e9135046175610cbed7d773a9 |
---|---|
oai_identifier_str |
oai:scielo:S1516-89132021000100429 |
network_acronym_str |
TECPAR-1 |
network_name_str |
Brazilian Archives of Biology and Technology |
repository_id_str |
|
spelling |
A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern; Genome Sequences; Feature Extraction; Classification; Corona virus; COVID-19; Machine Learning; Instituto de Tecnologia do Paraná - Tecpar; 2021-01-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132021000100429; Brazilian Archives of Biology and Technology v.64 2021; reponame:Brazilian Archives of Biology and Technology; instname:Instituto de Tecnologia do Paraná (Tecpar); instacron:TECPAR; 10.1590/1678-4324-2021210075; info:eu-repo/semantics/openAccess; Murugaiah, Muthulakshmi; Ganesan, Murugeswari; eng; 2022-01-07T00:00:00Z; oai:scielo:S1516-89132021000100429; Journal; https://www.scielo.br/j/babt/; https://old.scielo.br/oai/scielo-oai.php; babt@tecpar.br; 1678-4324; 1516-8913; opendoar:2022-01-07T00:00; Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar); false |
dc.title.none.fl_str_mv |
A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern |
author |
Murugaiah, Muthulakshmi |
author_facet |
Murugaiah, Muthulakshmi; Ganesan, Murugeswari |
author_role |
author |
author2 |
Ganesan, Murugeswari |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Murugaiah, Muthulakshmi; Ganesan, Murugeswari |
dc.subject.por.fl_str_mv |
Genome Sequences; Feature Extraction; Classification; Corona virus; COVID-19; Machine Learning |
topic |
Genome Sequences; Feature Extraction; Classification; Corona virus; COVID-19; Machine Learning |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-01-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132021000100429 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132021000100429 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1590/1678-4324-2021210075 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Instituto de Tecnologia do Paraná - Tecpar |
publisher.none.fl_str_mv |
Instituto de Tecnologia do Paraná - Tecpar |
dc.source.none.fl_str_mv |
Brazilian Archives of Biology and Technology v.64 2021; reponame:Brazilian Archives of Biology and Technology; instname:Instituto de Tecnologia do Paraná (Tecpar); instacron:TECPAR |
instname_str |
Instituto de Tecnologia do Paraná (Tecpar) |
instacron_str |
TECPAR |
institution |
TECPAR |
reponame_str |
Brazilian Archives of Biology and Technology |
collection |
Brazilian Archives of Biology and Technology |
repository.name.fl_str_mv |
Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar) |
repository.mail.fl_str_mv |
babt@tecpar.br |
_version_ |
1750318280579481600 |