A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures

Pereira, Mônica Magalhães

Bibliographic details
Main author:	Pereira, Mônica Magalhães
Publication date:	2012
Document type:	Thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da UFRGS
Full text:	http://hdl.handle.net/10183/49068
Abstract:	As computer systems are built with aggressively scaled and unreliable technologies, some implementations rely on function specialization with reconfigurable computing to increase performance by exploiting parallelism, with possible energy gains. However, the use of reconfigurable devices in general-purpose computing also brings extra reliability challenges at the system level. Solutions to these challenges generally add excessive area, performance, and power overheads to the overall system. These overheads could be reduced if a more extensive analysis were performed to find the fault tolerance strategy that best balances the tradeoff between reliability and those costs. In this context, this work presents a comprehensive analysis of architectural design that includes reliability modeling and takes into consideration aspects such as area, performance, and power. The analysis aims to assist the design of reliability-aware reconfigurable architectures by indicating what kind of redundancy should be used to increase reliability. The analysis shows that communication among functional units is critical to the overall reliability of reconfigurable architectures and is therefore where most of the reliability investment should be made. Moreover, the analysis demonstrates that there is a threshold in the amount of redundancy that can be added to increase reliability. This limit arises because adding redundancy increases area overhead, and this overhead affects reliability until it outweighs the reliability gains. Therefore, even disregarding area cost, the gains in reliability will cease or even decrease. To provide a more extended evaluation, a fault tolerance approach was proposed to cope with permanent faults.
The LOwER-FaT strategy is embedded in a run-time reconfiguration mechanism and automatically selects fault-free resources without adding time overhead to configuration generation. It takes advantage of the on-line, transparent configuration generation mechanism to avoid faulty functional units and interconnects. Moreover, the strategy does not require spare resources: all resources are used to accelerate execution, and only when a fault occurs is a faulty resource replaced by a working one, with a performance penalty caused by the reduced number of available resources. Even so, experimental results showed a mean degradation of 14% in overall performance under a 20% fault rate. Moreover, reliability results indicated gains of around six orders of magnitude when the fault tolerance strategy was in place.

Item metadata

id	URGS_72c75919081b37cc6236a7efef940008
oai_identifier_str	oai:www.lume.ufrgs.br:10183/49068
network_acronym_str	URGS
network_name_str	Biblioteca Digital de Teses e Dissertações da UFRGS
repository_id_str	1853
dc.title.pt_BR.fl_str_mv	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures
title	A reliability analysis approach to assist the design of aggressively scaled reconfigurable architectures
author	Pereira, Mônica Magalhães
author_facet	Pereira, Mônica Magalhães
author_role	author
dc.contributor.author.fl_str_mv	Pereira, Mônica Magalhães
dc.contributor.advisor1.fl_str_mv	Carro, Luigi
contributor_str_mv	Carro, Luigi
dc.subject.por.fl_str_mv	Tolerancia : Falhas Microeletrônica Cmos
topic	Tolerancia : Falhas Microeletrônica Cmos Reconfigurable architectures Fault tolerance Reliability analysis Scaling
dc.subject.eng.fl_str_mv	Reconfigurable architectures Fault tolerance Reliability analysis Scaling
publishDate	2012
dc.date.accessioned.fl_str_mv	2012-05-22T01:35:11Z
dc.date.issued.fl_str_mv	2012
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/49068
dc.identifier.nrb.pt_BR.fl_str_mv	000835126
url	http://hdl.handle.net/10183/49068
identifier_str_mv	000835126
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
instname_str	Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str	UFRGS
institution	UFRGS
reponame_str	Biblioteca Digital de Teses e Dissertações da UFRGS
collection	Biblioteca Digital de Teses e Dissertações da UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/49068/1/000835126.pdf http://www.lume.ufrgs.br/bitstream/10183/49068/2/000835126.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/49068/3/000835126.pdf.jpg
bitstream.checksum.fl_str_mv	b4d5912c972b2a081ccab0f8333cc304 9ce0f8da58829ecd4f2f121e5adc060c 7cc58c643514be78ff8e4cd54b5022ff
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv	lume@ufrgs.br||lume@ufrgs.br
_version_	1810085226706108416
