A Survey of fault mitigation techniques for multi-core architectures

by Shashikiran Venkatesha, et al.

Fault tolerance in multi-core architectures has attracted the attention of the research community for the past 20 years. Rapid improvements in CMOS technology have produced exponential growth in transistor density, which in turn has made it increasingly challenging to design resilient multi-core architectures at the same pace. This article presents a survey of fault-tolerance methods for multi-core architectures, including fault detection, recovery, re-configurability and repair techniques. Salvaging at the micro-architectural and architectural levels is also discussed. The gamut of fault-tolerant approaches discussed in this article yields tangible improvements in the reliability of multi-core architectures. Every concept in the seminal articles is examined with respect to relevant metrics such as performance cost, area overhead, fault coverage, level of protection, detection latency and Mean Time To Failure. The existing literature is critically examined. New research directions, in the form of new fault-tolerant design alternatives for both homogeneous and heterogeneous multi-core architectures, are presented. A brief analytical approach to a fault-tolerance model for modern Intel- and AMD-based homogeneous multi-core architectures is also suggested, to help readers understand these architectures with respect to performance degradation, memory access time and execution time.




