Using precision reduction to efficiently improve mixed-precision GPUs reliability
Main author: | Acosta, Gerônimo Veit |
---|---|
Publication date: | 2021 |
Document type: | Bachelor's thesis (Trabalho de conclusão de curso) |
Language: | por |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/224241 |
Abstract: | Duplication With Comparison (DWC) is a traditional and widely accepted method for improving system reliability. DWC duplicates critical regions at the software or hardware level, creating redundant operations in order to decrease the probability of an unwanted event. However, this technique introduces expensive overhead in power consumption, processing time, and resource allocation, because the critical operations are computed at least twice. Reduced Precision Duplication With Comparison (RP-DWC) is an effective software-level solution that improves the performance of conventional DWC. RP-DWC aims to mitigate these overheads by enabling parallel processing on underused Floating Point Units (FPUs) in mixed-precision Graphics Processing Units (GPUs). By using precision reduction to efficiently improve reliability in mixed-precision GPUs, RP-DWC extends the DWC technique and introduces proper ways to handle redundancy across operations of different precision. Improving GPU reliability is an extremely valuable challenge in the fault tolerance field, since GPUs are adopted in both High-Performance Computing (HPC) and automotive real-time applications. When GPUs operate in a natural environment, such as the surface of the Earth at sea level, they are exposed to the Earth’s surface radiation. This exposure can be critical, because radiation particles may hit the GPU’s internal circuits, corrupt sensitive data, and consequently generate undesired outputs. Introducing duplication with reduced precision in a trustworthy manner, so as to maintain reliability in safety-critical systems, is an arduous task that we propose to investigate further in this work. |
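
The idea summarized in the abstract (run the critical operation once at full precision and once at reduced precision, then compare) can be illustrated with a small CPU-side sketch. This is not the thesis's implementation: the function names, the multiply-add used as the "critical operation", the emulation of binary32 and binary16 through Python's `struct` module, and the relative tolerance `rel_tol` are all assumptions chosen for illustration.

```python
import struct


def to_single(x: float) -> float:
    """Round x to IEEE 754 binary32 (emulates the full-precision unit)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_half(x: float) -> float:
    """Round x to IEEE 754 binary16 (emulates the reduced-precision unit)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mul_add_single(a: float, b: float, c: float) -> float:
    """Hypothetical critical operation a * b + c, rounded to binary32."""
    return to_single(to_single(a) * to_single(b) + to_single(c))


def mul_add_half(a: float, b: float, c: float) -> float:
    """Redundant copy of the same operation, rounded to binary16."""
    return to_half(to_half(a) * to_half(b) + to_half(c))


def agrees(full: float, reduced: float, rel_tol: float = 1e-2) -> bool:
    """Compare the two copies. Exact equality cannot be required because the
    copies round differently, so a tolerance (an assumed value) is used."""
    return abs(full - reduced) <= rel_tol * max(abs(full), abs(reduced), 1.0)


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the binary32 encoding of x to mimic a radiation upset."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))[0]


if __name__ == "__main__":
    full = mul_add_single(1.5, 2.25, 0.125)
    reduced = mul_add_half(1.5, 2.25, 0.125)
    print("fault-free copies agree:", agrees(full, reduced))
    print("exponent-bit fault detected:", not agrees(flip_bit(full, 30), reduced))
    print("low mantissa-bit fault detected:", not agrees(flip_bit(full, 0), reduced))
```

In this sketch a flip in a high-order bit of the full-precision result falls outside the tolerance and is flagged, while a flip in the lowest mantissa bit stays within it and passes unnoticed. That is the trade-off a reduced-precision comparison has to manage: the tolerance must be wide enough to absorb legitimate rounding differences, which also hides small corruptions.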
id |
UFRGS-2_8366c73b5878d53181ad29e4b2ddce08 |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/224241 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
spelling |
Universidade Federal do Rio Grande do Sul; Instituto de Informática; Porto Alegre, BR-RS; 2021; Computer Science: Emphasis in Computer Engineering: Bachelor's degree (undergraduate); TEXT: 001128602.pdf.txt (Extracted Text, text/plain, 79487); ORIGINAL: 001128602.pdf (Full text, English, application/pdf, 1310454); 2021-08-18 04:33:39.423768; Repositório de Publicações (PUB); https://lume.ufrgs.br/oai/request; opendoar:2021-08-18T07:33:39 |
dc.title.pt_BR.fl_str_mv |
Using precision reduction to efficiently improve mixed-precision GPUs reliability |
author |
Acosta, Gerônimo Veit |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Rech, Paolo |
dc.contributor.advisor-co1.fl_str_mv |
Santos, Fernando Fernandes dos |
dc.subject.por.fl_str_mv |
Tolerancia : Falhas |
dc.subject.eng.fl_str_mv |
Reliability Radiation Duplication DWC RP-DWC GPU |
dc.date.accessioned.fl_str_mv |
2021-07-21T04:23:43Z |
dc.date.issued.fl_str_mv |
2021 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/224241 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001128602 |
dc.language.iso.fl_str_mv |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/224241/2/001128602.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/224241/1/001128602.pdf |
bitstream.checksum.fl_str_mv |
c77f24fe3857640be0dcb4aa89dc0906 dbcedef7bb794ab3e2d47471fcadcf30 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
_version_ |
1801224609491582976 |