Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment

Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda

Bibliographic details
Main author:	Ahmad, Wakar
Publication date:	2023
Other authors:	Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda
Document type:	Article
Language:	eng
Source title:	Brazilian Archives of Biology and Technology
Full text:	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604
Abstract:	DNA methylation and histones are the main constituents overseeing the stable maintenance of cellular phenotypes. Abnormalities in these components can lead to cancer development and are therefore potentially diagnostic. Epigenomics is the study of epigenetic modifications, which are involved in gene expression control, and aims at a better understanding of human biology. Epigenomics applications are complex Big Data workflow applications that represent the data processing pipeline automating the many computations involved in genome sequencing. High-performance computing infrastructure provides heterogeneous computing resources for deploying such applications. Scheduling workflow applications on heterogeneous computing resources is considered an NP-complete problem and therefore requires an efficient scheduling approach. In this work, a list-based scheduling algorithm is proposed that minimizes the running time (makespan) of the Epigenomics application. To determine whether clustering and entry task duplication improve the performance of the proposed algorithm, four versions were evaluated experimentally: list-based scheduling with clustering and duplication (LS-C-D), with clustering and without duplication (LS-C-WD), without clustering and with duplication (LS-WC-D), and without clustering and without duplication (LS-WC-WD). The experimental results show that LS-WC-D is the best choice for scheduling Epigenomics applications. A comparison of LS-WC-D with state-of-the-art algorithms further supports its significance.
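
The list-based scheduling idea named in the abstract can be illustrated with a minimal sketch. This is a generic HEFT-style list scheduler, not the authors' LS-WC-D algorithm: it ranks tasks by upward rank and places each one on the processor that finishes it earliest, and it omits both clustering and entry task duplication. The function name `list_schedule`, the task names, and all run times and transfer times below are hypothetical values chosen for illustration only.

```python
from functools import lru_cache
from statistics import mean


def list_schedule(cost, comm):
    """Schedule a task graph on heterogeneous processors (illustrative sketch).

    cost: {task: [run time on processor 0, processor 1, ...]}
    comm: {(parent, child): transfer time when the two run on different processors}
    Returns (makespan, {task: (processor, start, finish)}).
    """
    succs = {t: [] for t in cost}
    preds = {t: [] for t in cost}
    for (u, v) in comm:
        succs[u].append(v)
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: average run time plus the longest remaining path to an exit task.
        return mean(cost[t]) + max((comm[(t, s)] + rank(s) for s in succs[t]), default=0.0)

    n_procs = len(next(iter(cost.values())))
    free_at = [0.0] * n_procs  # time at which each processor becomes idle
    placed = {}
    # With positive run times, decreasing upward rank visits parents before children.
    for t in sorted(cost, key=rank, reverse=True):
        best = None
        for p in range(n_procs):
            # Data from a parent on another processor arrives after the transfer time.
            ready = max(
                (placed[u][2] + (0.0 if placed[u][0] == p else comm[(u, t)]) for u in preds[t]),
                default=0.0,
            )
            start = max(ready, free_at[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placed[t] = best
        free_at[best[0]] = best[2]
    return max(f for _, _, f in placed.values()), placed


# Hypothetical fork-join workflow on two processors (values are made up).
cost = {"split": [2, 3], "map_a": [4, 6], "map_b": [4, 6], "merge": [3, 2]}
comm = {("split", "map_a"): 1, ("split", "map_b"): 1,
        ("map_a", "merge"): 1, ("map_b", "merge"): 1}
makespan, placed = list_schedule(cost, comm)
print(makespan, placed)
```

The sketch appends each task after the last one already on a processor rather than searching for idle gaps, which keeps it short at the cost of schedule quality.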

Item metadata

id	TECPAR-1_652437a18e839000fa0b86ff84c0d9e0
oai_identifier_str	oai:scielo:S1516-89132023000100604
network_acronym_str	TECPAR-1
network_name_str	Brazilian Archives of Biology and Technology
repository_id_str
spelling	2022-10-27T00:00:00Z; Journal; https://www.scielo.br/j/babt/; https://old.scielo.br/oai/scielo-oai.php; babt@tecpar.br; ISSN 1678-4324; ISSN 1516-8913; opendoar:2022-10-27T00:00
dc.title.none.fl_str_mv	Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment
author	Ahmad, Wakar
author_facet	Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda
author_role	author
author2	Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda
author2_role	author; author; author
dc.contributor.author.fl_str_mv	Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda
dc.subject.por.fl_str_mv	Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization
topic	Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization
publishDate	2023
dc.date.none.fl_str_mv	2023-01-01
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604
url	http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	10.1590/1678-4324-2023210795
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	text/html
dc.publisher.none.fl_str_mv	Instituto de Tecnologia do Paraná - Tecpar
publisher.none.fl_str_mv	Instituto de Tecnologia do Paraná - Tecpar
dc.source.none.fl_str_mv	Brazilian Archives of Biology and Technology v.66 2023; reponame:Brazilian Archives of Biology and Technology; instname:Instituto de Tecnologia do Paraná (Tecpar); instacron:TECPAR
instname_str	Instituto de Tecnologia do Paraná (Tecpar)
instacron_str	TECPAR
institution	TECPAR
reponame_str	Brazilian Archives of Biology and Technology
collection	Brazilian Archives of Biology and Technology
repository.name.fl_str_mv	Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar)
repository.mail.fl_str_mv	babt@tecpar.br
_version_	1750318281752838144
