A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern

Bibliographic details
Main author: Murugaiah, Muthulakshmi
Publication date: 2021
Other authors: Ganesan, Murugeswari
Document type: Article
Language: English
Source title: Brazilian Archives of Biology and Technology
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132021000100429
Abstract: The genome sequence regulates the life of every living organism on Earth. Genetic diseases are associated with genomic disorders, so the early prediction of severe genetic diseases is quite possible through genome sequence analysis. Genomic disorders refer to mutations, that is, rearrangements of bases in an organism's genome. Genome sequence analysis and mutation identification can help classify diseased genomes, a task that can be accomplished with machine learning techniques. Feature extraction plays a crucial role in classification because it converts genome sequences into sets of quantitative values. In this article, we propose a novel feature extraction technique, the Frequency based Feature Extraction Technique, which extracts 120 features from genome sequences for classification. COVID-19 is currently a pandemic disease, and it is caused by a coronavirus. We therefore tested the proposed technique on 1000 genome sequence samples from coronavirus-infected patients around the world. The extracted features were classified with both machine learning and deep learning techniques. The results show that the proposed technique performs well with a Convolutional Neural Network (CNN) classifier, which achieves an accuracy of 97.96%. The technique also helps identify the most frequently repeated patterns in the genome sequences: the pattern “TTGTT” was found to be the most frequently repeated pattern in the COVID-19 genome.
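The abstract does not say how the 120 frequency features are computed. As a rough illustration only, the sketch below assumes that "frequency-based" features are relative counts of overlapping k-mers (substrings of length k), and that the "most repeated pattern" is the 5-mer with the highest total count across a set of sequences. The function names `kmer_counts`, `frequency_features` and `most_repeated_pattern`, and the choice of k = 1, 2, 3, are hypothetical; with those k values the sketch yields 84 features, not the 120 reported in the article, so it should not be read as the authors' method.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols (e.g. N)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in BASES for base in kmer):
            counts[kmer] += 1
    return counts


def frequency_features(sequence, ks=(1, 2, 3)):
    """Hypothetical feature vector: relative k-mer frequencies for each k.

    Assumption: k = 1, 2, 3 gives 4 + 16 + 64 = 84 features; this is NOT
    the article's 120-feature scheme, which the abstract does not describe.
    """
    features = []
    for k in ks:
        counts = kmer_counts(sequence, k)
        total = sum(counts.values()) or 1  # avoid division by zero
        # Fixed lexicographic order so every sequence maps to the same columns.
        for letters in product(BASES, repeat=k):
            features.append(counts["".join(letters)] / total)
    return features


def most_repeated_pattern(sequences, k=5):
    """Return (k-mer, count) with the highest total count, or None if empty."""
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k))
    return total.most_common(1)[0] if total else None


if __name__ == "__main__":
    print(most_repeated_pattern(["ATTGTTTGTTA"]))
```

For example, `most_repeated_pattern(["ATTGTTTGTTA"])` returns `("TTGTT", 2)` because that 5-mer occurs twice in the toy sequence. Windows containing ambiguous bases such as `N` are skipped rather than counted, and the fixed column order lets the resulting vectors be stacked into a matrix for any classifier.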

Keywords: Genome Sequences; Feature Extraction; Classification; Corona virus; COVID-19; Machine Learning
Publisher: Instituto de Tecnologia do Paraná - Tecpar
Source: Brazilian Archives of Biology and Technology v.64 2021
DOI: 10.1590/1678-4324-2021210075
Format: text/html
Access: Open access