Accelerating the alignment phase of Minimap2 genome assembly algorithm Using GACT-X in a commercial Cloud FPGA machine.
Main author: | Teng, Carolina |
---|---|
Publication date: | 2022 |
Document type: | Master's dissertation |
Language: | eng |
Source: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/3/3140/tde-05092022-084236/ |
Abstract: | Genetic sequencing can provide crucial information for medicine and for biology studies. Sequencing technologies are advancing rapidly, and current third-generation genome sequencers offer significant improvements over the second generation. In parallel, sequencing throughput has been increasing at an exponential rate. Coupled with falling prices, this has led to a sharp increase in the amount of genomic data to be processed. Transistor technology is reaching its fundamental limits and Moore's Law is becoming obsolete, so other alternatives are required to process this volume of data efficiently. Long reads from third-generation sequencers are an emerging type of genetic data, with average lengths of thousands of nucleotides each. The state-of-the-art algorithm Minimap2 can assemble these reads into the genome that was sampled, but the process is computationally intensive: for a human-sized genome with sufficient coverage, running times can reach dozens of CPU hours. Hardware acceleration has been proposed to make Minimap2 more efficient. So far, however, only one of its main bottlenecks, the chaining step, has been successfully accelerated on an FPGA. No efficient solution has been proposed for the alignment step, which is implemented as the ksw function. GACT-X is a Cloud FPGA design that performs banded Smith-Waterman-Gotoh (SWG) alignment with fixed memory, making it suitable for inputs of any size. GACT-X with tiles of size 4,000 can be 2x faster than ksw when aligning long sequences. Replacing the alignment function ksw in Minimap2 with GACT-X on a cloud hybrid system can provide up to a 1.41x speedup of the entire execution over its software counterpart, with comparable accuracy for data that are highly similar to the reference genome. 
This dissertation presents all the relevant background information, the development stages and methods, the results achieved on three different datasets, and the proposed future work on this acceleration project. |
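To make "banded SWG alignment" concrete, here is a minimal Python sketch of a banded Smith-Waterman-Gotoh (local, affine-gap) score computation. It is an illustration only and is not GACT-X, ksw, or Minimap2 code. The function name `banded_swg_score`, the band definition (cells with |i - j| <= band), and the scoring values are hypothetical choices for this example.

The sketch differs from a real implementation in three ways. It returns only the best score, with no traceback. It allocates full matrices for readability, whereas a hardware design would store only the band. It does not split the input into the fixed-size tiles that GACT-X uses to keep memory bounded.

```python
NEG_INF = float("-inf")


def banded_swg_score(query: str, target: str, band: int,
                     match: int = 2, mismatch: int = -4,
                     gap_open: int = 4, gap_extend: int = 2) -> int:
    """Best local (Smith-Waterman) score with affine gaps (Gotoh),
    computed only for cells (i, j) with |i - j| <= band.

    A gap of length k costs gap_open + k * gap_extend.
    Hypothetical illustration; not the GACT-X or ksw implementation.
    """
    n, m = len(query), len(target)
    # H: best local score of an alignment ending at (i, j).
    # E: best score ending with a horizontal move (gap in the query).
    # F: best score ending with a vertical move (gap in the target).
    # Out-of-band cells keep H = 0 and E = F = -inf, so they never extend a path.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        # Restrict column j to the diagonal band around i.
        lo = max(1, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open - gap_extend,
                          E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_extend,
                          F[i - 1][j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # The 0 term makes this a local alignment (restart anywhere).
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # "AC" matches the suffix of "TAC" one diagonal off the main one,
    # so a band of 0 misses it and a band of 1 finds it.
    print(banded_swg_score("AC", "TAC", band=0))  # 0
    print(banded_swg_score("AC", "TAC", band=1))  # 4
```

Restricting the computation to a band reduces the work from O(n * m) cells to roughly O(n * band). The cost is that alignments drifting further than `band` from the main diagonal are missed, as the usage example shows.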
oai_identifier_str |
oai:teses.usp.br:tde-05092022-084236 |
repository_id_str |
2721 |
title |
Accelerating the alignment phase of Minimap2 genome assembly algorithm Using GACT-X in a commercial Cloud FPGA machine. |
author |
Teng, Carolina |
dc.contributor.none.fl_str_mv |
Fonseca, Fernando Josepetti |
dc.subject.por.fl_str_mv |
Acceleration; Algorithms; Bioinformatics; FPGA circuits; Cloud computing; Co-processors; Field programmable gate arrays; Genomics; Minimap2; Smith-Waterman-Gotoh |
dc.date.none.fl_str_mv |
2022-07-27 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/3/3140/tde-05092022-084236/ |
dc.language.iso.fl_str_mv |
eng |
dc.rights.driver.fl_str_mv |
Content released for public access. info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br, atendimento@aguia.usp.br |