Notes from the Wired

MRFI: An Open Source Multi-Resolution Fault Injection Framework for Neural Network Processing

May 11, 2026 | 617 words | 3min read

Paper Title: MRFI: An Open Source Multi-Resolution Fault Injection Framework for Neural Network Processing

Link to Paper: https://arxiv.org/abs/2306.11758

Date: Dec. 12, 2023

Paper Type: Neural Network Reliability, Fault Simulation, Fault Injection

Short Abstract: MRFI is a highly configurable, multi-resolution fault injection tool for deep neural networks, designed to address the limitations of existing fault injection solutions.

1. Introduction

Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications (e.g., autonomous driving, avionics) and large-scale systems (e.g., LLMs), where hardware faults—such as process variations, defects, noise, or bit flips—can cause erroneous inferences with potentially severe consequences. Therefore, thorough reliability evaluation through fault injection before deployment is essential.

Existing fault injection tools (e.g., PyTorchFI, Ares, TensorFI) provide basic error injection and reliability analysis, but they typically offer only limited configurability and a single, fixed injection granularity.

MRFI (Multi-Resolution Fault Injection) is a highly configurable, open-source fault injection framework designed to overcome these issues.

Key Advantages:

  • Highly configurable fault injection at multiple resolutions
  • Open source and built on top of PyTorch
  • Common error models, selectors, and observers provided out of the box, with support for user customization

2. Methodology: MRFI Framework

2.1. MRFI Overview

MRFI consists of two main parts: Configuration and Execution Modules.

  1. Configuration Approaches: Let users specify fault injection settings at multiple resolutions (e.g., per model, per layer, or per neuron).
  2. Execution Modules: Provide common fault injection functions (with support for user customization).
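To give a flavor of multi-resolution configuration, here is a hypothetical sketch of such a config (all field names are invented for illustration; see the MRFI repository for the actual format). The idea: a single configuration selects the injection scope, the optional quantizer, the selector, the error model, and the observer metrics.

```python
# Hypothetical fault-injection config sketch (field names invented,
# not MRFI's real schema). One config covers all four execution modules.
fi_config = {
    "target_layers": ["conv1", "fc"],             # injection scope (layer resolution)
    "quantizer": {"bits": 8},                     # optional fixed-point simulation
    "selector": {"mode": "random", "rate": 1e-4}, # where errors land
    "error_model": {"type": "bit_flip"},          # how errors are applied
    "observer": {"metrics": ["rmse"]},            # what to measure
}
```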

2.2. Integration with PyTorch

MRFI is built on top of PyTorch and uses PyTorch hooks for error injection. This approach offers several advantages:

  • No modification of the model definition is required
  • It works with arbitrary PyTorch models
  • Hooks can be attached and removed dynamically, which makes golden (fault-free) reference runs straightforward
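As a minimal sketch of what hook-based injection looks like (illustrative only, not MRFI's actual API): a forward hook perturbs a layer's activations without touching the model code, and removing the hook restores clean inference.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def gaussian_noise_hook(std=0.5, p=0.5):
    """Forward hook: add Gaussian noise to a random fraction p of activations."""
    def hook(module, inputs, output):
        mask = torch.rand_like(output) < p       # random positions (selector role)
        noise = torch.randn_like(output) * std   # additive-noise error model
        return torch.where(mask, output + noise, output)
    return hook

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
handle = model[0].register_forward_hook(gaussian_noise_hook())

x = torch.randn(2, 8)
y_faulty = model(x)   # inference with injected errors
handle.remove()       # detach the hook ...
y_clean = model(x)    # ... and the same call gives the golden run
```

Note how the fault campaign needs no changes to the model definition itself; this is the property that makes the hook mechanism attractive for a tool like MRFI.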

2.3. Major MRFI Execution Modules

MRFI has four key components:

  1. Quantizer (Optional): Handles simulation of quantized (fixed-point/integer) inference. Converts floating-point tensors to integers and back, allowing realistic evaluation of quantized models.
  2. Selector: Determines where errors are injected. Supports:
    • Random positions (for soft errors with given error rates)
    • Fixed positions (for permanent faults)
    • Custom masks or user-defined selectors for selective fault tolerance.
  3. Error Model: Defines how errors are applied at selected positions. Includes:
    • Bit flips (random or fixed)
    • Additive noise (e.g., Gaussian)
    • Stuck-at faults, fixed-value faults, random values, etc.
    • Fully customizable.
  4. Observer: Monitors internal neuron values and error effects without necessarily injecting faults. Useful for:
    • Analyzing activation distributions (e.g., dynamic range for quantization)
    • Golden-run comparisons
    • Measuring error propagation (count of affected neurons, RMSE, etc.)
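The quantize → flip bit → dequantize → observe flow described above can be sketched roughly as follows (helper names are invented for illustration; MRFI's real modules are more general). A single bit flip at a fixed position plays the roles of Selector (fixed position) and Error Model (bit flip), and an RMSE against the golden values plays the Observer role.

```python
import torch

def quantize(x, scale=127.0):
    """Quantizer, forward: map floats to int8-range integers."""
    return torch.clamp((x * scale).round(), -128, 127).to(torch.int16)

def flip_bit(q, index, bit):
    """Selector (fixed position) + Error Model (bit flip) in one step."""
    q = q.clone()
    q.view(-1)[index] ^= (1 << bit)   # XOR flips exactly one bit
    return q

def dequantize(q, scale=127.0):
    """Quantizer, backward: map integers back to floats."""
    return q.to(torch.float32) / scale

x = torch.tensor([0.5, -0.25, 0.75, 0.0])
q_faulty = flip_bit(quantize(x), index=2, bit=6)  # flip bit 6 of element 2
x_faulty = dequantize(q_faulty)

# Observer-style metric: RMSE of the faulty values against the golden ones
rmse = torch.sqrt(((x_faulty - x) ** 2).mean())
```

A flip in a high-order bit (here bit 6, weight 64) moves the affected value far from its golden counterpart, which is exactly the kind of effect the Observer is there to quantify.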

3. Results

It works, shrimple as that! For the detailed evaluation, see the paper.

