Notes from the Wired

A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks

May 12, 2026 | 2,876 words | 14 min read

Paper Title: A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks

Link to Paper: https://arxiv.org/abs/2305.05750

Date: 9 May 2023

Paper Type: Neural Network Reliability, Fault Simulation, Fault Injection, Literature Review

Short Abstract: This paper reviews methods for assessing the reliability of Deep Neural Networks (DNNs), especially in safety-critical applications where hardware errors can have serious consequences. It presents a systematic literature review categorizing reliability assessment approaches into Fault Injection (FI), Analytical, and Hybrid methods. The study explains the strengths, weaknesses, platforms, and evaluation metrics of these methods, highlighting that while FI is the most common approach, Analytical and Hybrid methods are more lightweight and still accurate, making them promising directions for future DNN reliability research.

1. Introduction

This paper examines the reliability of Deep Neural Networks (DNNs), particularly in safety-critical applications such as autonomous vehicles, where hardware faults can lead to catastrophic failures like incorrect traffic light detection. As DNNs become larger and are deployed on specialized hardware accelerators (GPUs, ASICs, FPGAs, and multi-core processors), ensuring their reliability becomes increasingly important. The paper highlights that different hardware platforms face different fault types, and even small hardware-induced errors can significantly reduce DNN accuracy.

To address this issue, many researchers have proposed methods for assessing and improving DNN reliability. However, because DNNs are used across many applications and platforms, existing methods are fragmented and difficult to generalize. Previous survey papers mainly focused on reliability enhancement or fault injection techniques, but none provided a complete review of reliability assessment methods themselves.

This work presents the first comprehensive Systematic Literature Review (SLR) dedicated to DNN reliability assessment methods, covering studies published between 2017 and 2022.

2. Preliminaries

4. Study Overview

4.1 Taxonomy

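Fault Injection (FI) methods evaluate DNN reliability by deliberately introducing faults and observing their effects on the outputs. They are divided into three approaches: fault simulation (software models of the DNN or its hardware, ranging from hardware-independent frameworks to RTL models), fault emulation (injecting faults on real platforms such as FPGAs, GPUs, and processors), and irradiation (exposing the hardware to radiation).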

Analytical methods estimate reliability without injecting faults. They analyze the internal structure of a DNN, such as its neurons and layer connections, to model how faults affect outputs and to identify critical components, at a lower computational cost than FI.

Hybrid methods combine analytical modeling with fault injection to reduce the complexity of extensive FI experiments while maintaining higher realism than purely analytical approaches, enabling more efficient and scalable reliability evaluation.
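The hybrid idea can be illustrated with a small sketch: an analytical score first ranks the weights of a single linear layer, and exhaustive bit-flip FI is then run only on the top-ranked fraction, so far fewer FI runs are needed than for a full campaign. The scoring rule (|w| times the mean |x|), the `keep_fraction` parameter, the single-layer model, and all function names are hypothetical choices for illustration, not the procedure of any study in the review.

```python
import math
import random
import struct


def to_f32(value: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of a float32 value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))[0]


def logits(w, x):
    """Single linear layer: one logit per row of w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def hybrid_assessment(w, inputs, keep_fraction=0.1):
    """Analytical pre-ranking of weights, then exhaustive bit-flip FI on the top ones."""
    n_out, n_in = len(w), len(w[0])

    # Analytical step (hypothetical score): |w_ij| * mean |x_j|, the average
    # magnitude of that weight's contribution to logit i. No faults injected here.
    mean_abs_x = [sum(abs(x[j]) for x in inputs) / len(inputs) for j in range(n_in)]
    ranked = sorted(
        ((abs(w[i][j]) * mean_abs_x[j], i, j) for i in range(n_out) for j in range(n_in)),
        reverse=True,
    )
    keep = max(1, int(keep_fraction * len(ranked)))
    selected = [(i, j) for _, i, j in ranked[:keep]]

    # FI step: flip each of the 32 bits of every selected weight, for every input.
    golden = [argmax(logits(w, x)) for x in inputs]
    sdc = runs = 0
    for i, j in selected:
        original = w[i][j]
        for bit in range(32):
            w[i][j] = flip_bit(original, bit)
            for x, ref in zip(inputs, golden):
                out = logits(w, x)
                runs += 1
                if not all(map(math.isfinite, out)) or argmax(out) != ref:
                    sdc += 1
        w[i][j] = original  # restore the fault-free weight

    exhaustive = n_out * n_in * 32 * len(inputs)  # runs a full FI campaign would need
    return sdc / runs, runs, exhaustive


rng = random.Random(0)
W = [[to_f32(rng.uniform(-1, 1)) for _ in range(10)] for _ in range(3)]
X = [[rng.uniform(0, 1) for _ in range(10)] for _ in range(5)]
rate, runs, exhaustive = hybrid_assessment(W, X, keep_fraction=0.1)
print(f"SDC rate on selected weights: {rate:.3f} ({runs} of {exhaustive} runs)")
```

With 30 weights and `keep_fraction=0.1`, only 3 weights are injected, so the sketch performs 480 runs instead of 4,800. The price is that faults in the unselected weights are never observed, which is why the quality of the analytical score matters.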

The paper analyzes 139 studies on DNN reliability published between 2017 and 2022 to identify research trends. It finds that the topic started in 2017 and has grown steadily into an active research area. Most studies rely on Fault Injection (FI) methods for reliability assessment; only about 10% (14 papers) use analytical methods (11 papers) or hybrid analytical/FI methods (3 papers).
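A quick arithmetic check of the "about 10%" figure, assuming (for this illustration only) that every study falls into exactly one of the three categories:

```python
TOTAL_STUDIES = 139
ANALYTICAL = 11
HYBRID = 3

non_fi = ANALYTICAL + HYBRID          # studies using analytical or hybrid methods
fi = TOTAL_STUDIES - non_fi           # assumption: every remaining study is FI-based

print(f"analytical + hybrid: {non_fi} studies = {non_fi / TOTAL_STUDIES:.1%}")
print(f"fault injection:     {fi} studies = {fi / TOTAL_STUDIES:.1%}")
```

This gives 14/139, roughly 10.1%, consistent with the reported share; under the one-category assumption the remaining 125 studies (about 89.9%) would be FI-based.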

Among FI-based studies, most use software fault simulation, particularly on hardware-independent platforms, while among fault emulation studies GPU-based platforms are the most common. Overall, the statistics show a strong dominance of FI methods and limited use of analytical and hybrid approaches, which points to potential areas for future research.
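A hardware-independent software FI campaign, the most common setup among the surveyed studies, can be sketched in plain Python: flip one random bit of a float32 weight, rerun inference, and compare the result with the fault-free ("golden") prediction. The tiny fully connected network, the random single-bit-flip fault model, the function names, and the rule that a trial counts as a silent data corruption (SDC) when the top-1 class changes or the logits become non-finite are all assumptions made for illustration. Weights are stored as float32 values, but the arithmetic runs in Python's float64, which is a simplification.

```python
import math
import random
import struct


def to_f32(value):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def flip_bit(value, bit):
    """Flip one bit (0 = mantissa LSB, 31 = sign) of a float32 value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))[0]


def forward(x, layers):
    """Fully connected network; ReLU on hidden layers, raw logits at the output."""
    for idx, w in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        if idx < len(layers) - 1:
            x = [0.0 if v < 0.0 else v for v in x]  # ReLU that lets NaN propagate
    return x


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def fi_campaign(layers, inputs, trials, seed=0):
    """Random single-bit-flip campaign on weights; returns the SDC rate."""
    rng = random.Random(seed)
    golden = [argmax(forward(x, layers)) for x in inputs]  # fault-free reference
    sdc = 0
    for _ in range(trials):
        w = layers[rng.randrange(len(layers))]
        r, c = rng.randrange(len(w)), rng.randrange(len(w[0]))
        bit = rng.randrange(32)
        k = rng.randrange(len(inputs))
        original = w[r][c]
        w[r][c] = flip_bit(original, bit)   # inject one transient fault
        out = forward(inputs[k], layers)
        w[r][c] = original                  # restore before the next trial
        # Counted as an SDC: top-1 class changes or the logits become non-finite.
        if not all(map(math.isfinite, out)) or argmax(out) != golden[k]:
            sdc += 1
    return sdc / trials


rng = random.Random(42)
sizes = [8, 16, 4]
layers = [[[to_f32(rng.uniform(-1, 1)) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(sizes, sizes[1:])]
inputs = [[rng.uniform(0, 1) for _ in range(8)] for _ in range(20)]
print(f"SDC rate: {fi_campaign(layers, inputs, trials=2000):.3f}")
```

Restoring the weight after every trial keeps faults independent, matching a single-transient-fault model; permanent-fault studies would instead keep the corrupted value for the whole evaluation.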

5. Characterization

5.1 Fault Injection Method

5.1.1 Fault Simulation

5.1.1.1 Hardware-Independent Platform

General:

Fault modeling:

Injection strategy:

Evaluation methods:

SDC rate-based evaluation:

Software FI tools:

Overall insight:

5.1.1.2 Hardware-Aware Platform

General:

Fault injection approach:

Permanent fault studies:

Evaluation methods:

Advanced SDC classification (example study):

5.1.1.3 RTL Model Platform

General:

Fault injection at RTL level:

Evaluation methods:

Overall insight:

5.1.2 Fault Emulation

5.1.2.1 FPGA Platform

General:

Three main FI setups:

Fault modeling:

Frameworks and tools:

Evaluation methods:

Advanced reliability metrics:

Overall insight:

5.1.2.2 GPU Platform

General:

FI frameworks used:

Fault modeling:

Evaluation methods:

Refined SDC classifications:

Vulnerability analysis metrics:

Overall insight:

5.1.2.3 Processor Platform (CPUs and Edge Devices)

General:

Fault injection frameworks:

Evaluation methods:

Reliability metrics:

Overall insight:

5.1.3 Irradiation

See paper for details.

5.2 Analytical Methods

Analytical methods for DNN reliability assessment model reliability mathematically instead of directly injecting faults during evaluation. They analyze the structure and behavior of DNNs, particularly neurons, layers, and weights, to estimate how faults would affect outputs and to determine the vulnerability of different components. The main idea is to link the importance or influence of a component on the output with its susceptibility to faults, thereby identifying critical neurons and layers. Although these methods may still use fault injection for validation, they do not rely on it for modeling.
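One way to make the link between importance and vulnerability concrete is a first-order score |a_j · ∂logit_c/∂a_j| per hidden neuron, checked against an actual stuck-at-0 injection, the "FI for validation" step mentioned above. The two-layer network, the choice of a stuck-at-0 fault, and the function names are assumptions for illustration. Because the output layer here is linear, the score matches the injected effect exactly; in deeper networks it is only an approximation.

```python
import random


def forward(x, w1, w2):
    """Two-layer ReLU network: returns hidden activations and output logits."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return hidden, logits


def neuron_criticality(x, w1, w2):
    """First-order score |a_j * d(logit_c)/d(a_j)| for each hidden neuron j."""
    hidden, logits = forward(x, w1, w2)
    c = max(range(len(logits)), key=logits.__getitem__)  # predicted class
    # The output layer is linear, so d(logit_c)/d(a_j) is simply w2[c][j].
    return [abs(h * w2[c][j]) for j, h in enumerate(hidden)], c


def stuck_at_zero_effect(x, w1, w2, j, c):
    """Validation by injection: force hidden neuron j to 0, measure |change| in logit c."""
    hidden, logits = forward(x, w1, w2)
    faulty = hidden[:]
    faulty[j] = 0.0
    faulty_logit = sum(w * h for w, h in zip(w2[c], faulty))
    return abs(logits[c] - faulty_logit)


rng = random.Random(7)
w1 = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(8)]
w2 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
x = [rng.uniform(0, 1) for _ in range(6)]
scores, c = neuron_criticality(x, w1, w2)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print("most critical hidden neurons:", ranking[:3])
```

The score needs one forward pass and one gradient per input, whereas the validation loop needs one faulty inference per neuron, which is the cost gap analytical methods exploit.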

Four main approaches are used. Layer-wise Relevance Propagation (LRP)-based analysis assigns contribution scores to neurons and ranks them by vulnerability. Gradient-based analysis uses the gradients of outputs with respect to weights or activations to measure sensitivity and identify critical components. Estimation-based analysis uses statistical measures such as activation ranges or norms to approximate vulnerability quickly, with lower accuracy but higher efficiency. Machine learning-based analysis applies methods such as Open-Set Recognition to detect abnormal outputs and identifies critical faults by thresholding the output logits.
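The LRP-based approach can be sketched with the standard LRP-ε rule on a bias-free two-layer ReLU network: all relevance starts on the predicted logit and is redistributed backwards in proportion to each neuron's contribution, so hidden neurons with large |relevance| would be ranked as the most critical. The network, the value of `eps`, and the function name are illustrative assumptions; the reviewed studies apply LRP to full DNNs with their own ranking criteria.

```python
def lrp_epsilon(x, w1, w2, eps=1e-6):
    """LRP-epsilon relevance for a two-layer ReLU network without biases."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) for row in w1]   # hidden pre-activations
    a1 = [max(0.0, z) for z in z1]                               # hidden activations
    z2 = [sum(w * a for w, a in zip(row, a1)) for row in w2]     # output logits
    c = max(range(len(z2)), key=z2.__getitem__)                  # explained class

    def stab(z):
        # Epsilon stabiliser keeps the denominator away from zero.
        return z + (eps if z >= 0 else -eps)

    # All relevance starts on the predicted logit.
    r2 = [z if k == c else 0.0 for k, z in enumerate(z2)]
    # Output -> hidden: R1[j] = a1[j] * sum_k w2[k][j] * R2[k] / stab(z2[k])
    r1 = [a1[j] * sum(w2[k][j] * r2[k] / stab(z2[k]) for k in range(len(z2)))
          for j in range(len(a1))]
    # Hidden -> input: R0[i] = x[i] * sum_j w1[j][i] * R1[j] / stab(z1[j])
    r0 = [x[i] * sum(w1[j][i] * r1[j] / stab(z1[j]) for j in range(len(z1)))
          for i in range(len(x))]
    return r1, r0, c


x = [1.0, 2.0]
w1 = [[1.0, 0.5], [-1.0, 1.0], [0.5, -2.0]]
w2 = [[1.0, 2.0, 1.0], [0.5, -1.0, 3.0]]
r_hidden, r_input, c = lrp_epsilon(x, w1, w2)
ranking = sorted(range(len(r_hidden)), key=lambda j: abs(r_hidden[j]), reverse=True)
print("hidden relevance:", r_hidden, "ranking:", ranking)
```

For this toy input the relevance splits evenly between hidden neurons 0 and 1, the inactive neuron 2 receives none, and the total relevance stays close to the predicted logit (about 4.0) at every layer, which is the conservation property LRP relies on.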

Across all approaches, the vulnerability of neurons, feature maps, and weights is the key concept for assessing reliability. These methods are generally lightweight, accelerator-agnostic, and faster than fault injection. Estimation-based methods trade accuracy for speed, whereas LRP- and gradient-based methods yield results closer to fault injection outcomes.
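The speed side of that trade-off is easy to see in an estimation-based sketch: one fault-free pass per calibration input yields per-layer activation ranges and weight norms, with no fault injection at all. Ranking layers by activation-range width is a hypothetical proxy chosen for illustration (a weight norm could be used instead), and the function names and toy layers are assumptions. The result is a coarse per-layer ordering with no per-fault accuracy, which is exactly the lower accuracy the text describes.

```python
import math


def activation_ranges(layers, inputs):
    """Per-layer (min, max) output value seen over a calibration set (no faults injected)."""
    ranges = [[math.inf, -math.inf] for _ in layers]
    for x in inputs:
        a = x
        for idx, w in enumerate(layers):
            a = [sum(wij * aj for wij, aj in zip(row, a)) for row in w]
            if idx < len(layers) - 1:
                a = [max(0.0, v) for v in a]  # ReLU on hidden layers
            ranges[idx][0] = min(ranges[idx][0], min(a))
            ranges[idx][1] = max(ranges[idx][1], max(a))
    return [tuple(r) for r in ranges]


def weight_norm(w):
    """Frobenius (L2) norm of a weight matrix."""
    return math.sqrt(sum(v * v for row in w for v in row))


def estimate_layer_vulnerability(layers, inputs):
    """Rank layers by activation-range width (hypothetical proxy); report weight norms too."""
    ranges = activation_ranges(layers, inputs)
    widths = [hi - lo for lo, hi in ranges]
    norms = [weight_norm(w) for w in layers]
    order = sorted(range(len(layers)), key=widths.__getitem__, reverse=True)
    return order, widths, norms


layers = [
    [[1.0, 0.0], [0.0, 2.0]],   # hidden layer: 2 inputs -> 2 neurons
    [[1.0, 1.0]],               # output layer: 2 hidden -> 1 logit
]
calibration = [[1.0, 1.0], [0.0, 3.0]]
order, widths, norms = estimate_layer_vulnerability(layers, calibration)
print("layer order:", order, "range widths:", widths, "weight norms:", norms)
```

Here the hidden layer's outputs span 0 to 6 and the output layer's span 3 to 6, so the hidden layer is ranked first under this proxy.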

5.3 Hybrid Methods

See paper for details.

6. Discussion

Key open challenges identified in DNN reliability assessment:

Main future research directions:
