Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment
Main author: | Ahmad, Wakar |
---|---|
Publication date: | 2023 |
Other authors: | Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda |
Document type: | Article |
Language: | eng |
Source title: | Brazilian Archives of Biology and Technology |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604 |
Abstract: | DNA methylation and histone modification are the main mechanisms that oversee the stable maintenance of cellular phenotypes. Abnormalities in these mechanisms can lead to cancer development and are therefore potentially diagnostic. Epigenomics is the study of epigenetic modifications, which control gene expression, and aims at a better understanding of human biology. Epigenomics applications are complex Big Data workflow applications that represent the data processing pipeline automating the many computations involved in genome sequencing. High-performance computing infrastructure provides heterogeneous computing resources for deploying such complex applications. Scheduling workflow applications on heterogeneous computing resources is considered an NP-complete problem and therefore requires an efficient scheduling approach. This work proposes a list-based scheduling algorithm that minimizes the running time (makespan) of the Epigenomics application. To determine whether clustering and entry task duplication improve the performance of the proposed algorithm, four versions of it were evaluated experimentally: list-based scheduling with clustering and duplication (LS-C-D), list-based scheduling with clustering and without duplication (LS-C-WD), list-based scheduling without clustering and with duplication (LS-WC-D), and list-based scheduling without clustering and without duplication (LS-WC-WD). The experimental results show that LS-WC-D is the best choice for scheduling Epigenomics applications. A comparison of LS-WC-D with state-of-the-art algorithms further supports its significance. |
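The record does not give the details of the proposed algorithm, so the following is only a minimal sketch of generic list-based scheduling on heterogeneous processors, with an optional and deliberately simplified form of entry task duplication. It is not the authors' LS-WC-D method. The upward-rank priority is borrowed from HEFT-style schedulers, and the function name `list_schedule`, its parameters, and the four-task example graph are all assumptions made for illustration.

```python
from functools import lru_cache

def list_schedule(succ, cost, comm, duplicate_entry=False):
    """Return (makespan, placement) for a task graph on heterogeneous processors.

    succ: task -> list of successor tasks
    cost: task -> list of run times, one per processor (all > 0)
    comm: (u, v) -> transfer time if u and v run on different processors
    """
    tasks = list(cost)
    n_proc = len(cost[tasks[0]])
    pred = {t: [] for t in tasks}
    for u, children in succ.items():
        for v in children:
            pred[v].append(u)
    entry = {t for t in tasks if not pred[t]}

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: average run time plus the longest path to an exit task.
        tail = max((comm[(t, s)] + rank(s) for s in succ.get(t, [])), default=0.0)
        return sum(cost[t]) / n_proc + tail

    ready = [0.0] * n_proc   # time at which each processor becomes free
    finish = {}              # task -> {processor: finish time of that copy}
    placement, makespan = {}, 0.0
    for t in sorted(tasks, key=rank, reverse=True):
        best = None
        for p in range(n_proc):
            free, data_ready, copies = ready[p], 0.0, []
            for u in pred[t]:
                arrival = min(f if q == p else f + comm[(u, t)]
                              for q, f in finish[u].items())
                # Assumed simplification: copy an entry task only onto a
                # processor that is still idle, and only if that is faster.
                if (duplicate_entry and u in entry and p not in finish[u]
                        and free == 0.0 and cost[u][p] < arrival):
                    copies.append(u)
                    arrival = free = cost[u][p]
                data_ready = max(data_ready, arrival)
            eft = max(free, data_ready) + cost[t][p]  # earliest finish time on p
            if best is None or eft < best[0]:
                best = (eft, p, copies)
        eft, p, copies = best
        for u in copies:
            finish[u][p] = cost[u][p]
        finish.setdefault(t, {})[p] = eft
        ready[p] = eft
        placement[t] = p
        makespan = max(makespan, eft)
    return makespan, placement

# Hypothetical fork-join graph: A feeds B and C, which both feed D.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": [2, 2], "B": [3, 3], "C": [3, 3], "D": [1, 1]}
comm = {("A", "B"): 4, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1}

print(list_schedule(succ, cost, comm))
print(list_schedule(succ, cost, comm, duplicate_entry=True))
```

In this toy graph the transfer out of the entry task is expensive, so re-running the entry task on the second processor lets B and C execute in parallel. That is the intuition behind testing duplication as a separate variant, although the sketch says nothing about how the published variants perform.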
id |
TECPAR-1_652437a18e839000fa0b86ff84c0d9e0 |
---|---|
oai_identifier_str |
oai:scielo:S1516-89132023000100604 |
network_acronym_str |
TECPAR-1 |
network_name_str |
Brazilian Archives of Biology and Technology |
repository_id_str |
|
spelling |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment; Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization; Instituto de Tecnologia do Paraná - Tecpar; 2023-01-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604; Brazilian Archives of Biology and Technology v.66 2023; reponame:Brazilian Archives of Biology and Technology; instname:Instituto de Tecnologia do Paraná (Tecpar); instacron:TECPAR; 10.1590/1678-4324-2023210795; info:eu-repo/semantics/openAccess; Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda; eng; 2022-10-27T00:00:00Z; oai:scielo:S1516-89132023000100604; Journal; https://www.scielo.br/j/babt/; https://old.scielo.br/oai/scielo-oai.php; babt@tecpar.br; 1678-4324; 1516-8913; opendoar:2022-10-27T00:00; Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar); false |
dc.title.none.fl_str_mv |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
title |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
spellingShingle |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment Ahmad,Wakar Epigenomics Big data Workflow scheduling Heterogeneous computing Makespan minimization. |
title_short |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
title_full |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
title_fullStr |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
title_full_unstemmed |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
title_sort |
Epigenomics Scientific Big Data Workflow Scheduling for Cancer Diagnosis in Health Care Using Heterogeneous Computing Environment |
author |
Ahmad, Wakar |
author_facet |
Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda |
author_role |
author |
author2 |
Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda |
author2_role |
author; author; author |
dc.contributor.author.fl_str_mv |
Ahmad, Wakar; Alam, Bashir; Sharma, Swati; Kushwaha, Arvinda |
dc.subject.por.fl_str_mv |
Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization |
topic |
Epigenomics; Big data; Workflow scheduling; Heterogeneous computing; Makespan minimization |
description |
DNA methylation and histone modification are the main mechanisms that oversee the stable maintenance of cellular phenotypes. Abnormalities in these mechanisms can lead to cancer development and are therefore potentially diagnostic. Epigenomics is the study of epigenetic modifications, which control gene expression, and aims at a better understanding of human biology. Epigenomics applications are complex Big Data workflow applications that represent the data processing pipeline automating the many computations involved in genome sequencing. High-performance computing infrastructure provides heterogeneous computing resources for deploying such complex applications. Scheduling workflow applications on heterogeneous computing resources is considered an NP-complete problem and therefore requires an efficient scheduling approach. This work proposes a list-based scheduling algorithm that minimizes the running time (makespan) of the Epigenomics application. To determine whether clustering and entry task duplication improve the performance of the proposed algorithm, four versions of it were evaluated experimentally: list-based scheduling with clustering and duplication (LS-C-D), list-based scheduling with clustering and without duplication (LS-C-WD), list-based scheduling without clustering and with duplication (LS-WC-D), and list-based scheduling without clustering and without duplication (LS-WC-WD). The experimental results show that LS-WC-D is the best choice for scheduling Epigenomics applications. A comparison of LS-WC-D with state-of-the-art algorithms further supports its significance. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-01-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-89132023000100604 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1590/1678-4324-2023210795 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Instituto de Tecnologia do Paraná - Tecpar |
publisher.none.fl_str_mv |
Instituto de Tecnologia do Paraná - Tecpar |
dc.source.none.fl_str_mv |
Brazilian Archives of Biology and Technology v.66 2023 reponame:Brazilian Archives of Biology and Technology instname:Instituto de Tecnologia do Paraná (Tecpar) instacron:TECPAR |
instname_str |
Instituto de Tecnologia do Paraná (Tecpar) |
instacron_str |
TECPAR |
institution |
TECPAR |
reponame_str |
Brazilian Archives of Biology and Technology |
collection |
Brazilian Archives of Biology and Technology |
repository.name.fl_str_mv |
Brazilian Archives of Biology and Technology - Instituto de Tecnologia do Paraná (Tecpar) |
repository.mail.fl_str_mv |
babt@tecpar.br |
_version_ |
1750318281752838144 |