Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment

Bibliographic details
Main author: Ahmad, Wakar
Publication date: 2023
Other authors: Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda
Document type: Article
Language: English
Source title: Brazilian Archives of Biology and Technology
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604
Abstract: DNA methylation and histone modifications are the main mechanisms responsible for the stable maintenance of cellular phenotypes. Abnormalities in these components can lead to cancer and are therefore potentially diagnostic. Epigenomics studies epigenetic modifications, which regulate gene expression, in order to better understand human biology. Epigenomics applications are complex Big Data workflows: data-processing pipelines that automate a very large number of genome-sequencing computations. High-performance computing infrastructure provides heterogeneous computing resources on which such applications can be deployed. Scheduling workflow applications on heterogeneous computing resources is an NP-complete problem, so an efficient scheduling approach is required. This work proposes a list-based scheduling algorithm that minimizes the running time (makespan) of the Epigenomics application. To determine whether clustering and entry-task duplication improve the algorithm's performance, four variants were evaluated experimentally: list-based scheduling with clustering and with duplication (LS-C-D), with clustering and without duplication (LS-C-WD), without clustering and with duplication (LS-WC-D), and without clustering and without duplication (LS-WC-WD). The experimental results show that LS-WC-D is the best choice for scheduling Epigenomics applications, and a comparison of LS-WC-D with state-of-the-art algorithms further demonstrates its significance.
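The abstract describes the method only at a high level. As a rough illustration of what list-based scheduling with optional entry-task duplication involves, the sketch below ranks tasks by a HEFT-style upward rank and greedily places each one on the processor that gives the earliest finish time. It is a minimal sketch under stated assumptions, not the authors' LS-WC-D algorithm: the ranking rule, the non-insertion placement policy, the function names `upward_rank` and `list_schedule`, the toy DAG (`split`, `filter_a`, `filter_b`, `merge`) and all costs are hypothetical, and clustering is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical model (not from the paper):
#   dag[task]     = [(successor, data_transfer_cost), ...]
#   cost[task][p] = execution time of task on processor p
Dag = Dict[str, List[Tuple[str, float]]]
Costs = Dict[str, List[float]]


def upward_rank(dag: Dag, cost: Costs) -> Dict[str, float]:
    """HEFT-style rank: mean execution cost plus the costliest path to an exit task."""
    memo: Dict[str, float] = {}

    def rank(t: str) -> float:
        if t not in memo:
            mean = sum(cost[t]) / len(cost[t])
            memo[t] = mean + max((c + rank(s) for s, c in dag.get(t, [])), default=0.0)
        return memo[t]

    for t in cost:
        rank(t)
    return memo


def list_schedule(dag: Dag, cost: Costs,
                  duplicate_entry: bool = False) -> Tuple[float, Dict[str, List[int]]]:
    """Greedy list scheduling; returns (makespan, {task: processors it ran on})."""
    n_proc = len(next(iter(cost.values())))
    preds: Dict[str, List[Tuple[str, float]]] = {t: [] for t in cost}
    for t, succs in dag.items():
        for s, c in succs:
            preds[s].append((t, c))

    ranks = upward_rank(dag, cost)
    # With strictly positive costs, descending rank is a valid topological order.
    order = sorted(cost, key=lambda t: ranks[t], reverse=True)

    proc_free = [0.0] * n_proc                # time each processor becomes idle
    finish: Dict[str, Dict[int, float]] = {}  # task -> {processor: finish time}

    for t in order:
        if duplicate_entry and not preds[t]:
            # Entry-task duplication (assumed form): run a copy on every processor,
            # so successors can read the entry task's output locally.
            finish[t] = {}
            for p in range(n_proc):
                proc_free[p] += cost[t][p]
                finish[t][p] = proc_free[p]
            continue

        def eft(p: int) -> float:
            # Data-ready time: for each predecessor use its best copy; the transfer
            # cost is zero when that copy ran on the same processor p.
            ready = max(
                (min(f + (0.0 if q == p else c) for q, f in finish[u].items())
                 for u, c in preds[t]),
                default=0.0,
            )
            # Non-insertion policy: the task starts after the processor's last task.
            return max(proc_free[p], ready) + cost[t][p]

        best = min(range(n_proc), key=lambda p: (eft(p), p))
        end = eft(best)
        proc_free[best] = end
        finish[t] = {best: end}

    makespan = max(f for copies in finish.values() for f in copies.values())
    return makespan, {t: sorted(copies) for t, copies in finish.items()}


# Hypothetical fork-join workflow on two heterogeneous processors (made-up costs).
TOY_DAG: Dag = {
    "split": [("filter_a", 4.0), ("filter_b", 4.0)],
    "filter_a": [("merge", 1.0)],
    "filter_b": [("merge", 1.0)],
}
TOY_COST: Costs = {
    "split": [2.0, 4.0],
    "filter_a": [3.0, 6.0],
    "filter_b": [5.0, 3.0],
    "merge": [2.0, 2.0],
}

if __name__ == "__main__":
    print(list_schedule(TOY_DAG, TOY_COST))                        # without duplication
    print(list_schedule(TOY_DAG, TOY_COST, duplicate_entry=True))  # with duplication
```

On this toy DAG, duplicating the entry task lets `filter_b` start on processor 1 at time 4 instead of waiting until time 6 for `split`'s output to arrive, and the makespan drops from 11 to 9 time units. Duplication also adds work to every processor, so in larger workflows the benefit depends on the balance between transfer and computation costs.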
ISSN: 1516-8913; 1678-4324
Keywords: Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization
Date: 2023-01-01
Version: Published version
DOI: 10.1590/1678-4324-2023210795
Rights: Open access
Format: text/html
Publisher: Instituto de Tecnologia do Paraná - Tecpar
Source: Brazilian Archives of Biology and Technology, v. 66, 2023