Side-Channel and Fault-Injection Resistance in Commercial Silicon IP

zeroRISC Engineering · February 24, 2025 · 11 min read

Physical attacks against silicon cryptographic modules divide cleanly into two classes: passive side-channel attacks, which observe unintended information leakage from a device without disrupting its operation, and active fault injection attacks, which deliberately perturb device operation to induce exploitable behavior. Both classes are practical against production silicon — not theoretical — and the countermeasures for each require design decisions at RTL level that cannot be retrofitted in synthesis or post-processing.

This article surveys the threat taxonomy, the countermeasure techniques implemented in the zeroRISC root-of-trust IP, and the practical limitations of each approach. The framing is evaluation-grade: what an evaluating laboratory would examine, what a red team with physical access would attempt, and what the design response is for each scenario. Dominic Rizzo's work prior to founding zeroRISC — spanning hardware security research and contributions to OpenTitan's physical security architecture — informs the countermeasure philosophy described here.

Passive Side-Channel Attack Taxonomy

Passive side-channel attacks exploit physical observables — power consumption, electromagnetic emission, timing, photonic emission — that correlate with secret data processed by the device. The canonical attack classes are:

Simple Power Analysis (SPA). A single-trace analysis that identifies distinguishable power patterns corresponding to different operations or data values. SPA is most effective against asymmetric cryptographic operations (RSA, ECC) where modular exponentiation or scalar multiplication algorithms have conditionally executed branches that produce visible differences in the power trace. For AES and HMAC implementations, SPA is generally less effective, but conditional branches in key schedule operations or data-dependent S-box table lookups can still be exploited.

Differential Power Analysis (DPA). A statistical attack using many traces. The attacker collects power traces over many cryptographic operations with varying inputs, then applies statistical correlation between trace features and hypothesized intermediate computation values. DPA targets Hamming weight or Hamming distance leakage models — the well-established observation that CMOS circuit switching activity correlates with the number of bit transitions in processed data. A first-order DPA attack on an unprotected AES implementation typically requires between a few hundred and a few thousand traces to recover a 128-bit key, depending on signal-to-noise ratio.
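The correlation step can be sketched on simulated data. This is a toy model under stated assumptions, not an attack on any real device: it uses the 4-bit PRESENT S-box as a small stand-in for a cipher's non-linear layer, a pure Hamming-weight leakage model with Gaussian noise, and a hypothetical 4-bit key. It implements the correlation-based variant of DPA (often called CPA); all function names are illustrative.

```python
import random

# PRESENT 4-bit S-box, used only as a small stand-in for a non-linear layer.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    return bin(value).count("1")


def simulate_traces(key, n_traces, noise_sigma, rng):
    """Return (plaintexts, leakages): one noisy Hamming-weight sample per trace."""
    plaintexts = [rng.randrange(16) for _ in range(n_traces)]
    leakages = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise_sigma)
                for p in plaintexts]
    return plaintexts, leakages


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def recover_key(plaintexts, leakages):
    """Return the key guess whose predicted leakage correlates best with the traces."""
    scores = []
    for guess in range(16):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(pearson(predicted, leakages))
    return max(range(16), key=lambda g: scores[g])


rng = random.Random(1)
SECRET_KEY = 0xB  # hypothetical key nibble
pts, leaks = simulate_traces(SECRET_KEY, n_traces=2000, noise_sigma=0.5, rng=rng)
recovered = recover_key(pts, leaks)
print(f"recovered key nibble: {recovered:#x}")
```

Raising `noise_sigma` in this model increases the number of traces needed before the correct guess separates from the wrong ones, which is the signal-to-noise dependence described above.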

Higher-Order DPA. When first-order countermeasures (masking) are applied, the attacker moves to second-order or higher-order attacks that target statistical dependencies between multiple samples in the power trace. A d-th order masking scheme is theoretically resistant to d-th order DPA but vulnerable to a (d+1)-th order attack. In practice, second-order masking is the minimum bar for production cryptographic implementations targeting FIPS 140-3 Level 3 physical security requirements.

Electromagnetic Analysis (EMA). Conceptually similar to DPA, but using EM probes placed near the device rather than power supply measurements. EM attacks can be spatially localized — a probe positioned over the AES datapath can target that specific region with lower noise than a power-supply-based DPA attack. For chips with metal shielding on the top copper layers, EM attacks require either removal of the shielding or near-field probe positioning that avoids the shield.

Timing attacks. Exploitation of data-dependent execution timing in cryptographic operations. The canonical case is RSA private key recovery via timing variations in the modular exponentiation step. For implementations that use conditional branches or non-constant-time lookup table operations, timing attacks can be effective remotely, without physical access. All symmetric and hash operations in a production RoT must be constant-time across all input values.
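The difference between data-dependent and fixed-work execution can be illustrated with a byte comparison. This is a behavioral sketch only: the function names and test values are hypothetical, the returned step count stands in for execution time, and Python itself gives no real constant-time guarantee. In software the standard-library `hmac.compare_digest` is the usual choice; in RTL the analogue is a fixed-latency datapath.

```python
import hmac


def early_exit_equal(a: bytes, b: bytes):
    """Naive comparison. Returns (equal, bytes_examined); the count is the leak."""
    if len(a) != len(b):
        return False, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return False, i + 1  # stops at the first mismatch
    return True, len(a)


def fixed_work_equal(a: bytes, b: bytes):
    """Accumulate differences over every byte so the loop count is data-independent."""
    if len(a) != len(b):
        return False, 0
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0, len(a)


expected = bytes.fromhex("00112233445566778899aabbccddeeff")
wrong_first = bytes.fromhex("ff112233445566778899aabbccddeeff")
wrong_last = bytes.fromhex("00112233445566778899aabbccddee00")

for candidate in (wrong_first, wrong_last):
    print(early_exit_equal(expected, candidate),
          fixed_work_equal(expected, candidate),
          hmac.compare_digest(expected, candidate))
```

The early-exit version examines a different number of bytes for the two wrong candidates, which is exactly the kind of input-dependent behavior an attacker can measure; the fixed-work version examines all 16 bytes in both cases.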

Masking: The Primary Algorithmic Countermeasure

Masking is a countermeasure that splits each secret data value into d+1 shares, where d is the masking order. Computation operates on the shares rather than the original value; the shares are recombined only at the final output stage. A first-order masked implementation ensures that any single intermediate value is statistically independent of the unmasked secret. A second-order masked implementation ensures that any pair of intermediate values is statistically independent of the secret.
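A minimal sketch of Boolean (XOR) sharing, assuming 8-bit values and using Python's `secrets` module as a stand-in for a hardware random source. The helper names `share`, `recombine`, and `masked_xor` are illustrative and do not come from any particular IP.

```python
import secrets
from functools import reduce
from operator import xor


def share(value: int, order: int, width: int = 8):
    """Split `value` into order+1 Boolean shares whose XOR equals `value`."""
    masks = [secrets.randbits(width) for _ in range(order)]
    last = reduce(xor, masks, value)  # value XOR all random masks
    return masks + [last]


def recombine(shares):
    """XOR all shares together; only done at the final output stage."""
    return reduce(xor, shares, 0)


def masked_xor(shares_a, shares_b):
    """XOR is linear over GF(2), so it can be applied share by share."""
    return [a ^ b for a, b in zip(shares_a, shares_b)]


a_shares = share(0x3C, order=2)   # three shares for second-order masking
b_shares = share(0x55, order=2)
result = recombine(masked_xor(a_shares, b_shares))
print(hex(result), hex(0x3C ^ 0x55))
```

Linear operations such as XOR never need to combine shares of the same value, which is why they are the easy part of a masked design.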

Implementing masking in RTL requires careful attention to the recombination points. A masked AES S-box, for example, must implement the entire S-box operation in the masked domain — sharing must be maintained through the GF(2^8) inversion that constitutes the AES S-box's non-linear core. Naive masking implementations that share only the linear components of AES while computing the S-box inversion unmasked provide no DPA protection at the non-linear layer, which is precisely where DPA attacks focus.
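To make the non-linear case concrete, here is a sketch of a first-order masked AND, the GF(2) multiplication that field inversion is ultimately built from. It follows the style of the Ishai-Sahai-Wagner (ISW) multiplication gadget with two shares. It is a software model under simplifying assumptions: it does not capture glitches, which hardware-oriented schemes handle with extra register stages, and it is not presented as the scheme used in any specific S-box implementation. `share2` and `masked_and` are hypothetical names.

```python
import secrets


def share2(value: int, width: int = 8):
    """Two-share Boolean sharing: (mask, value XOR mask)."""
    mask = secrets.randbits(width)
    return mask, value ^ mask


def masked_and(a_shares, b_shares, width: int = 8):
    """First-order masked bitwise AND on two-share inputs; returns two output shares."""
    a0, a1 = a_shares
    b0, b1 = b_shares
    r = secrets.randbits(width)  # fresh randomness for every call
    c0 = (a0 & b0) ^ r
    # Evaluation order matters: r is mixed into the first cross term before the
    # second cross term is added, so the two cross terms are never XORed
    # together without the mask.
    c1 = (a1 & b1) ^ ((r ^ (a0 & b1)) ^ (a1 & b0))
    return c0, c1


a, b = 0xC5, 0x6E
c0, c1 = masked_and(share2(a), share2(b))
print(hex(c0 ^ c1), hex(a & b))
```

Recombining the output shares gives `a & b`, but the unmasked operands and the unmasked product are never computed as intermediates in this model.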

Masking has a gate count and power cost. A second-order masked AES implementation typically requires 3x-5x the gate count of an unmasked implementation. The area penalty is a design constraint, not a theoretical concern — for an IP block targeting area-sensitive consumer IoT SoC designs, the area budget for masking is a first-class design parameter.

The OpenTitan AES implementation uses a threshold implementation (TI) approach to first-order masking, with fresh randomness injected into the mask shares at each clock cycle using the CSRNG. Threshold implementation correctness requires enough input shares in each non-linear operation to satisfy the non-completeness property: every output share is computed from a strict subset of the input shares, so no individual share function carries information about the secret. TI correctness is verified through a combination of formal analysis and empirical TVLA (Test Vector Leakage Assessment) testing on silicon.

Active Fault Injection Attack Taxonomy

Fault injection attacks perturb device operation to bypass security checks, corrupt computations, or induce exploitable fault modes. The primary attack mechanisms are:

Voltage glitching. A brief transient voltage perturbation on the power supply, introduced through a crowbar circuit or FET-based glitch injector. A well-timed voltage glitch can cause a single instruction to be skipped (instruction skip fault) or a register write to be corrupted. For a processor executing a security check — for example, a comparison that gates further execution — a single instruction skip can bypass the comparison entirely. Voltage glitching is accessible with low-cost equipment; bench-level voltage glitch injectors capable of sub-nanosecond pulse widths can be built for under $500.

Clock glitching. A spurious clock edge introduced during normal clocking. A double-clock pulse can force the processor to begin a new cycle before the previous one has settled, so the program counter advances without the corresponding instruction executing correctly. Clock glitching requires access to the clock input signal, which on a BGA-packaged part requires decapping or careful PCB-level signal injection.

Electromagnetic Fault Injection (EMFI). A focused electromagnetic pulse delivered via a near-field probe positioned over the target die area. EMFI can induce faults in specific die regions without requiring electrical access to power supply or clock signals. A probe positioned over the processor datapath can induce bit flips in pipeline registers; a probe over the OTP read interface can corrupt the lifecycle state readback. EMFI setups are commercially available from specialized test equipment vendors.

Laser Fault Injection (LFI). A focused laser pulse directed at a specific transistor or metal interconnect on the die surface (or through the silicon substrate in backside-attack setups). LFI is the highest-precision fault injection technique, capable of targeting individual bits in registers or SRAM cells. LFI requires package decapping to expose the die surface, which is destructive but within the capability of well-equipped hardware security laboratories.

Fault Detection Countermeasures in RTL

The primary countermeasure against fault injection at RTL level is detection-and-response: instrument the design to detect anomalous operating conditions, and respond to detected anomalies with secure escalation before the fault can produce exploitable behavior.

Voltage and clock sensors. On-chip analog voltage comparators and frequency detectors provide first-line detection of voltage glitch and clock glitch attacks. These sensors are not high-precision measurement circuits; they are threshold detectors tuned to alert if VDD or clock frequency falls outside a defined operating range for more than a defined window. Sensor placement and threshold calibration require process-node-specific characterization to avoid false positives under normal operating condition variation.
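The threshold-plus-window behavior can be modeled in a few lines. This is a behavioral sketch, not a description of any real sensor: the class name, the 0.9 V to 1.1 V range, and the window length are all hypothetical, and a real sensor is an analog circuit rather than a sampled software loop.

```python
from dataclasses import dataclass


@dataclass
class WindowedThresholdDetector:
    low: float    # lower bound of the allowed operating range
    high: float   # upper bound of the allowed operating range
    window: int   # alert when out of range for MORE than this many samples
    _run: int = 0  # current count of consecutive out-of-range samples

    def sample(self, value: float) -> bool:
        """Feed one sample; return True when the alert condition is met."""
        if self.low <= value <= self.high:
            self._run = 0
        else:
            self._run += 1
        return self._run > self.window


detector = WindowedThresholdDetector(low=0.9, high=1.1, window=2)
vdd_samples = [1.0, 0.7, 0.7, 1.0, 0.7, 0.7, 0.7]
alerts = [detector.sample(v) for v in vdd_samples]
print(alerts)
```

In this model the first two-sample dip is tolerated and only the three-sample dip raises an alert. The window is the calibration trade-off mentioned above: a short window risks false positives from normal supply noise, while a long one lets short excursions pass unflagged.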

Redundant computation. Security-critical computations — particularly comparisons and conditional branches that gate key derivation or lifecycle state transitions — are implemented with spatial and temporal redundancy. The primary and redundant computation paths are implemented in physically separated logic cones to reduce the probability that a spatially localized fault (e.g., EMFI or LFI) affects both paths simultaneously. Agreement between the primary and redundant outputs is checked before proceeding; disagreement triggers an alert escalation.
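A software model of the agreement check, assuming a simple fault hook that corrupts the result of one path. The multi-bit true/false encodings, the `fault` callback, and all names are hypothetical; in RTL the two paths would be separate logic cones rather than two loop iterations.

```python
HARD_TRUE = 0x5A   # hypothetical multi-bit encodings, Hamming distance 8 apart
HARD_FALSE = 0xA5


class FaultDetected(Exception):
    """Raised where hardware would raise an alert."""


def compare_once(a: bytes, b: bytes) -> int:
    """One comparison path, returning an encoded boolean."""
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return HARD_TRUE if (len(a) == len(b) and diff == 0) else HARD_FALSE


def redundant_compare(a: bytes, b: bytes, fault=None) -> bool:
    """Run the comparison twice and require agreement on a valid encoding.

    `fault(path_index, value) -> value` models an injected fault that
    corrupts the result of one or both paths.
    """
    results = []
    for path in range(2):
        value = compare_once(a, b)
        if fault is not None:
            value = fault(path, value)
        results.append(value)
    primary, redundant = results
    if primary != redundant or primary not in (HARD_TRUE, HARD_FALSE):
        raise FaultDetected("redundancy mismatch or invalid encoding")
    return primary == HARD_TRUE


def force_true_on_primary(path, value):
    """Fault model: the attacker flips only the primary path to 'true'."""
    return HARD_TRUE if path == 0 else value


secret = b"expected-digest"
try:
    redundant_compare(secret, b"attacker-digest", fault=force_true_on_primary)
except FaultDetected as exc:
    print("alert:", exc)
```

A fault that corrupts both paths identically would pass this check, which is the reason for the physical separation of the two logic cones described above.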

Double-data-rate (DDR) sampling. For state machine transitions that must be atomic, sampling the state register on both rising and falling clock edges, and requiring the two samples to agree, provides detection of clock glitch attacks that inject spurious edges. A spurious clock edge during a transition shows up as a disagreement between the two samples, which triggers the alert handler.

Alert architecture and escalation. The OpenTitan alert handler provides a hardware escalation mechanism with configurable escalation paths. Minor alerts (sensor triggers, redundancy mismatches that may be transient) escalate to interrupt generation. Major alerts escalate to processor NMI, then to lifecycle state lockout, then to system-wide reset with zeroization of critical security parameters (CSPs). The escalation timer ensures that if the processor is faulted and cannot handle an alert, the hardware escalation continues autonomously.
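The timer-driven escalation for a major alert can be sketched as a small model. This is a deliberate simplification and does not reflect the alert handler's actual register interface or configuration options: the phase names, the single shared timeout, and the `escalate` function are assumptions made for illustration.

```python
PHASES = ["nmi", "lifecycle_lockout", "reset_and_zeroize"]


def escalate(phase_timeout_cycles: int, ack_cycle=None):
    """Return the (cycle, phase) actions taken for one major alert.

    ack_cycle is the cycle at which software acknowledges the alert, or None
    if the processor never responds (for example because it has been faulted).
    An acknowledgement that arrives no later than a phase's start time stops
    the escalation before that phase runs.
    """
    actions = []
    cycle = 0
    for phase in PHASES:
        if ack_cycle is not None and ack_cycle <= cycle:
            break
        actions.append((cycle, phase))
        cycle += phase_timeout_cycles  # timer runs without software involvement
    return actions


print(escalate(10, ack_cycle=5))   # software responds during the NMI phase
print(escalate(10))                # no response: escalation runs to completion
```

The point of the model is the second call: with no acknowledgement, every phase still fires on schedule, because progress depends only on the timer and not on the processor.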

What Countermeasures Do Not Cover

We are not claiming that the countermeasures described above render the device immune to physical attack by a sufficiently motivated adversary with laboratory-grade equipment and unlimited time. The security model for FIPS 140-3 Level 3 physical security is resistance to attacks that require specialized laboratory equipment but do not assume nation-state capabilities or multi-year directed attacks against a specific device.

LFI targeting the OTP read path, with decapped package and backside laser access, is outside the threat model assumed by the physical security countermeasures described here. A Common Criteria evaluation at EAL6+ would require resistance to that attack class; FIPS 140-3 Level 3 does not. OEMs building products for markets where that threat level is relevant should discuss physical security requirements with their security evaluation laboratory before assuming FIPS 140-3 Level 3 is a sufficient bar.

Additionally, the countermeasures above address attacks at the silicon level. They do not address attacks at the system level — for example, cold-boot attacks that extract key material from DRAM after power cycling, or bus-snooping attacks on external memory interfaces. The root-of-trust IP block secures key material within the cryptographic boundary; securing data that traverses external interfaces is the SoC architect's responsibility.

TVLA Testing: Empirical Leakage Assessment

Test Vector Leakage Assessment (TVLA) is the empirical methodology used to evaluate side-channel leakage in cryptographic implementations. The core technique is Welch's t-test applied to power traces collected under two conditions: traces collected with a fixed test vector, and traces collected with uniformly random test vectors. A |t| value above 4.5 in any trace sample point is conventionally interpreted as evidence of detectable leakage.
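The fixed-versus-random t-test can be sketched on simulated traces. The traces here are synthetic: five sample points of Gaussian noise, with a Hamming-weight-dependent offset added at one hypothetical leaky point. Function names, trace counts, and the leakage model are assumptions for illustration, and a real assessment runs on measured silicon traces with far more sample points.

```python
import random
from statistics import fmean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = fmean(group_a), fmean(group_b)
    var_a, var_b = variance(group_a), variance(group_b)
    return (mean_a - mean_b) / ((var_a / len(group_a) + var_b / len(group_b)) ** 0.5)


def tvla(fixed_traces, random_traces, threshold=4.5):
    """Return the sample indices where |t| exceeds the threshold."""
    leaky = []
    for i in range(len(fixed_traces[0])):
        t = welch_t([tr[i] for tr in fixed_traces],
                    [tr[i] for tr in random_traces])
        if abs(t) > threshold:
            leaky.append(i)
    return leaky


rng = random.Random(7)
N_TRACES, N_POINTS, LEAKY_POINT = 1000, 5, 2


def make_trace(intermediate):
    """Synthetic trace: unit Gaussian noise plus Hamming-weight leakage at one point."""
    trace = [rng.gauss(0.0, 1.0) for _ in range(N_POINTS)]
    trace[LEAKY_POINT] += bin(intermediate).count("1")
    return trace


fixed = [make_trace(0x01) for _ in range(N_TRACES)]
rand = [make_trace(rng.randrange(256)) for _ in range(N_TRACES)]
leaky_points = tvla(fixed, rand)
print("sample points with |t| > 4.5:", leaky_points)
```

The test flags a point only when the fixed and random groups differ in mean at that point, so it is a first-order detection test by construction.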

TVLA testing is required as part of the SP 800-140F conformance demonstration for FIPS 140-3 Level 3. The test must be run on actual silicon: simulation-based leakage analysis is not accepted as evidence of physical security compliance. An IP vendor supplying TVLA results should document the specific test conditions: the silicon sample set size, the oscilloscope and probe setup, the frequency bandwidth of the measurement chain, and the number of traces collected. A TVLA dataset collected at 100 MHz bandwidth on a sample of 5 devices is not directly comparable to one collected at 1 GHz bandwidth, because the bandwidth of the measurement chain determines which leakage mechanisms are visible.

The TVLA pass criterion is a necessary but not sufficient condition for side-channel resistance. A t-test that does not detect leakage at the fixed-random comparison does not rule out leakage under adversarially chosen input conditions or leakage in higher-order statistical relationships. Comprehensive side-channel security evaluation includes specific DPA attacks (not just TVLA) and explicit threat model documentation that characterizes which attack scenarios have been tested and at what trace count.
