Designing the Silicon Security Lifecycle: DEV, PROD, RMA, and EOL State Machine

zeroRISC Engineering · April 28, 2025 · 10 min read

The lifecycle state machine is the mechanism by which a silicon device knows what it is allowed to do and who is allowed to tell it to do things. Get it right, and the same hardware IP block serves pre-silicon verification engineers with full debug access, manufacturing test engineers with OTP programming capability, and end customers with none of the above. Get it wrong, and field devices can remain debuggable — or worse, remain stuck in a state where an attacker can abuse manufacturing-mode capabilities to extract root key material.

This article explains the OpenTitan lifecycle state machine design: the states it defines, how transitions between states are authenticated, how lifecycle state is persisted in OTP, and what the failure modes look like when lifecycle implementation is done incorrectly. The context is the lc_ctrl IP block as implemented in zeroRISC's root-of-trust IP core.

The Lifecycle State Enumeration

The OpenTitan specification defines a set of named lifecycle states. Each state corresponds to a distinct capability profile — what debug interfaces are active, what provisioning operations are permitted, and what cryptographic key material is accessible. The canonical state sequence from manufacture to end-of-life is:


  RAW
   |  (transition: initial test unlock token)
   v
  TEST_UNLOCKED_0 ... TEST_UNLOCKED_7
   |  (transition: test exit token)
   v
  PROD / PROD_END / DEV
   |  (PROD_END: no further transitions, final state for shipped devices)
   |  (PROD: standard production; RMA possible)
   |  (DEV: development units; elevated debug access)
   |
   v (from PROD only)
  RMA
   |
   v
  SCRAP (EOL, irreversible)

Figure 1: OpenTitan lifecycle state transition graph (simplified; the TEST_LOCKED states between test stations are omitted). Each arrow represents a transition gated by an authentication token, except the final move into SCRAP, which OpenTitan permits without a token. State is persisted in OTP; transitions are irreversible within a lifecycle branch. RAW is the factory-fresh state before any OTP programming.
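The graph in Figure 1 can be captured as an allow-list of legal edges, each tagged with the token it requires. The Python sketch below uses hypothetical names (`LcState`, `TRANSITIONS`, `request_transition`) and illustrative token labels. It encodes only the simplified graph, treats token checking as a caller-supplied predicate, and, as a labeled assumption, lets the final move into SCRAP proceed without a token.

```python
from enum import Enum


class LcState(Enum):
    RAW = "RAW"
    TEST_UNLOCKED_0 = "TEST_UNLOCKED_0"
    # TEST_UNLOCKED_1..7 and the TEST_LOCKED states are omitted, as in Figure 1.
    DEV = "DEV"
    PROD = "PROD"
    PROD_END = "PROD_END"
    RMA = "RMA"
    SCRAP = "SCRAP"


# Legal edges of the simplified graph, mapped to the token each one needs.
# Token labels are illustrative, not lc_ctrl register or OTP field names.
# Assumption: the move into SCRAP needs no token (None).
TRANSITIONS = {
    (LcState.RAW, LcState.TEST_UNLOCKED_0): "raw_unlock",
    (LcState.TEST_UNLOCKED_0, LcState.DEV): "test_exit",
    (LcState.TEST_UNLOCKED_0, LcState.PROD): "test_exit",
    (LcState.TEST_UNLOCKED_0, LcState.PROD_END): "test_exit",
    (LcState.PROD, LcState.RMA): "rma",
    (LcState.RMA, LcState.SCRAP): None,
}


class TransitionError(Exception):
    pass


def request_transition(current, target, presented_token, token_valid):
    """Return the new state, or raise if the edge or the token is rejected.

    token_valid is a predicate (token_name, presented_token) -> bool that
    stands in for the hardware token comparison.
    """
    if (current, target) not in TRANSITIONS:
        raise TransitionError(f"{current.name} -> {target.name} is not a legal edge")
    token_name = TRANSITIONS[(current, target)]
    if token_name is not None and not token_valid(token_name, presented_token):
        raise TransitionError(f"bad {token_name} token for {current.name} -> {target.name}")
    return target
```

Expressing the graph as an allow-list rather than a set of forbidden moves means any edge that is not listed, such as PROD_END to RMA, is rejected by default.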

The RAW state is the factory-fresh condition: no OTP has been programmed, and the device has minimal security context. A device in RAW state powers up, but the lifecycle controller exposes no security services and no key manager outputs; no key material has been provisioned yet, so there is nothing to protect. RAW is the pre-provisioning state, and no field device should ever be in it.

TEST_UNLOCKED states (there are multiple, indexed 0 through 7) enable manufacturing test operations: JTAG access is open, scan chains are accessible, and OTP programming (including UDS provisioning) is permitted. The multiple TEST_UNLOCKED states correspond to intermediate test steps in a manufacturing flow that may require transitioning between locked and unlocked conditions across multiple test stations. Each TEST_UNLOCKED → TEST_LOCKED transition programs a specific OTP region; returning to an unlocked state requires the correct unlock token.

PROD is the intended state for shipped devices. In PROD state, debug interfaces are locked, scan chain access is disabled, and only lifecycle-authenticated RMA requests can transition the device further. The cryptographic services are fully operational, and the key manager can execute the full CDI derivation ladder.

PROD_END is a one-way production state for products that should never be returned to RMA. Once a device transitions to PROD_END, RMA is no longer possible; on OpenTitan, the only transition that remains available is the end-of-life move to SCRAP. This state is appropriate for disposable or high-security devices where the manufacturer does not want to accept returned units that could have their key material interrogated in a repair facility.

DEV is a production-time variant that retains some debug access — specifically, JTAG can be unlocked with a per-device DEV unlock token. DEV state is appropriate for engineering samples and developer units shipped to customers who need debug access. A critical operational requirement: devices manufactured for consumer or field deployment must be programmed to PROD, not DEV. Shipping DEV-state devices to non-developer customers is one of the most common lifecycle security failures in silicon production programs.

RMA (Return Merchandise Authorization) state allows a returned device to be re-provisioned or analyzed in a controlled environment. Transitioning to RMA erases the owner's key material from the device's rewritable storage and restores a subset of debug access for fault analysis. The erasure cannot extend to OTP itself, because OTP cells cannot be returned to their unprogrammed state; on OpenTitan, the RMA transition instead triggers a wipe of flash. The RMA transition is authenticated by a device-specific RMA token that is only available to the original device manufacturer. Because each token is valid for a single device, an adversary who steals one RMA token cannot use it to unlock arbitrary field devices.

OTP Persistence: Why It Cannot Be Undone

Lifecycle state is persisted in OTP (one-time-programmable) memory, not in rewritable flash or SRAM. This is the architectural decision that makes lifecycle state transitions irreversible: OTP fuse cells physically change state when programmed and cannot be returned to their unprogrammed state by any software or electrical means (without physical destruction of the fuse material, which is a separate threat addressed by tamper detection).
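A toy model makes the irreversibility concrete: if programming can only set bits, any write that would clear a bit must fail. `OtpWord` is a hypothetical class; the 16-bit width and an unprogrammed value of 0 are assumptions, and the ECC and partition locking of a real OTP controller are omitted.

```python
class OtpWord:
    """Toy model of one OTP word: programming can set bits but never clear them."""

    def __init__(self, width: int = 16):
        self.width = width
        self.value = 0  # assumption: unprogrammed cells read as 0

    def program(self, new_value: int) -> None:
        if new_value < 0 or new_value >> self.width:
            raise ValueError("value does not fit in the OTP word")
        if self.value & ~new_value:
            # Some bit that is already programmed (1) would have to return to 0.
            raise PermissionError("OTP bits cannot be returned to the unprogrammed state")
        self.value = new_value
```

One consequence for a state machine stored this way: each later state's encoding must be a bitwise superset of the current one, so moving forward is always a legal write and moving back never is.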

In the OpenTitan OTP controller, the lifecycle state partition uses an encoding that is checked for integrity on each boot. The encoding is protected by an error-correcting code over each OTP word, so single-bit errors in the OTP read, which can occur due to aging, radiation effects, or deliberate fault injection, are detected and corrected. A state that cannot be cleanly decoded due to uncorrectable OTP errors causes the lifecycle controller to escalate to an alert, preventing the device from booting into an undefined security state.
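To illustrate the correct-or-escalate behaviour, the sketch below uses a textbook extended Hamming (8,4) SECDED code (single-error-correct, double-error-detect) over a 4-bit value. Both the code and the values in `STATE_CODES` are hypothetical stand-ins chosen for brevity; OpenTitan's OTP words and lifecycle encoding are wider and different.

```python
class UncorrectableError(Exception):
    """Detectable but uncorrectable (double-bit) error."""


class LcEscalation(Exception):
    """Stands in for the lifecycle controller raising an alert."""


DATA_POSITIONS = (3, 5, 6, 7)  # data bits sit at non-power-of-two positions


def secded_encode(nibble: int) -> list:
    """4 data bits -> 8-bit codeword; index 0 holds overall parity."""
    bits = [0] * 8
    for k, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> k) & 1
    for p in (1, 2, 4):
        # Parity bit p makes the XOR over all positions whose index has bit p set equal 0.
        bits[p] = 0
        for i in range(1, 8):
            if i != p and i & p:
                bits[p] ^= bits[i]
    bits[0] = sum(bits[1:]) % 2  # overall parity across the whole codeword
    return bits


def secded_decode(codeword: list) -> int:
    bits = list(codeword)
    syndrome = 0
    for i in range(1, 8):
        if bits[i]:
            syndrome ^= i
    overall = sum(bits) % 2
    if syndrome and not overall:
        raise UncorrectableError("double-bit error detected")
    if overall:
        # Single-bit error at position `syndrome`; 0 means the parity bit itself.
        bits[syndrome] ^= 1
    return sum(bits[pos] << k for k, pos in enumerate(DATA_POSITIONS))


# Hypothetical 4-bit state codes, for illustration only.
STATE_CODES = {0b0000: "RAW", 0b0011: "TEST_UNLOCKED_0", 0b0111: "PROD", 0b1111: "SCRAP"}


def decode_lc_state(codeword: list) -> str:
    try:
        value = secded_decode(codeword)
    except UncorrectableError as exc:
        raise LcEscalation("uncorrectable OTP error") from exc
    if value not in STATE_CODES:
        # A cleanly decoded word that is not a defined state is also an escalation.
        raise LcEscalation(f"invalid lifecycle encoding {value:#06b}")
    return STATE_CODES[value]
```

Note the two distinct escalation paths: an uncorrectable read, and a correctly decoded value that simply is not one of the defined states.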

The OTP macro is foundry-supplied. The choice of OTP technology affects the threat model in several ways. Anti-fuse OTP (e.g., TSMC MTP/OTP, GF One-Time-Programmable) provides physical one-time-write semantics and is difficult to interrogate non-destructively. Fuse-based OTP in some process nodes can be optically imaged to read fuse states if the package is decapped. OEMs should evaluate the optical visibility of their OTP technology as part of the physical security analysis for their target process node.

Token Authentication: Constant-Time Token Comparison

Lifecycle state transitions are authenticated by tokens. Each token-gated transition type has a corresponding expected token value that is programmed into the OTP at provisioning. The RAW unlock token is the unavoidable exception: OTP is still blank in RAW state, so its expected value cannot come from OTP, and in OpenTitan it is a constant fixed in the design. To transition the lifecycle state, a transition command must include the correct token for that transition type. The lc_ctrl hardware verifies the provided token against the stored expected value using a constant-time comparison. This matters because a comparison that stops at the first mismatching byte takes longer the more of a guess is correct, which would let an attacker recover the token piece by piece instead of searching the full token space.
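A minimal sketch of the check, assuming (as in OpenTitan) that OTP stores a hash of each token rather than the token itself. SHA3-256 is a stand-in for whatever hash the hardware uses, and `hash_token` and `verify_token` are hypothetical names; Python's standard `hmac.compare_digest` supplies the constant-time comparison.

```python
import hashlib
import hmac


def hash_token(token: bytes) -> bytes:
    # Stand-in hash for illustration; not the hash used by any specific lc_ctrl.
    return hashlib.sha3_256(token).digest()


def verify_token(presented: bytes, stored_hash: bytes) -> bool:
    # compare_digest avoids content-based short-circuiting, so its timing does
    # not reveal how many leading bytes of the guess were correct.
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

Storing only the hash also means that reading out the OTP contents does not directly yield a token that can be presented to the controller.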

For higher-security transitions, specifically the PROD → RMA transition, the token is device-specific: it is derived from the device UDS and a manufacturer-held root key, so that the correct RMA token for device A cannot be used to initiate RMA on device B. Per-device derivation means that a leaked token exposes only the device it was derived for; to construct valid tokens for other field devices, an attacker would need to compromise both the manufacturer root key and each target device's UDS.
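A sketch of per-device derivation, assuming HMAC-SHA256 as the key-derivation function, a 128-bit token, and a hypothetical `b"rma-token"` domain-separation label. The article specifies only the inputs (a manufacturer root key and a per-device value such as the UDS); `derive_rma_token` is an illustrative name.

```python
import hashlib
import hmac


def derive_rma_token(manufacturer_root_key: bytes, device_diversifier: bytes) -> bytes:
    """Hypothetical KDF: HMAC-SHA256(root key, label || per-device input), truncated to 128 bits.

    device_diversifier is the per-device input; the article names the UDS.
    """
    mac = hmac.new(manufacturer_root_key, b"rma-token" + device_diversifier, hashlib.sha256)
    return mac.digest()[:16]
```

Because HMAC behaves as a pseudorandom function under the root key, a token leaked for one device gives an attacker without the root key nothing useful toward the token for any other device.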

The TEST_UNLOCKED tokens, used during manufacturing, are typically shared across a production lot — not per-device — because manufacturing test operations precede UDS provisioning, so there is no per-device identity yet available for token derivation. This is a calculated trade-off: TEST_UNLOCKED tokens should be treated as provisioning secrets, rotated between production lots, and protected in the same manner as other manufacturing key material.

Common Lifecycle Implementation Errors

Several lifecycle implementation failures appear consistently enough across silicon programs to be worth cataloguing explicitly.

Shipping DEV-state devices. Manufacturing flows that do not enforce a PROD state check at the final test station can ship units in DEV state. DEV-state devices in the field have partially open debug access; an attacker who acquires such a device and knows the DEV unlock token can enable JTAG access and interrogate the device's memory map. The countermeasure is a gate in the manufacturing flow that fails any unit whose lifecycle state readback is not PROD (or PROD_END) at final test.
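The final-test gate reduces to a fail-closed membership check on the lifecycle readback. A sketch with a hypothetical `failing_units` helper that takes a mapping from unit ID to readback string; how the readback is obtained from the tester is out of scope.

```python
SHIPPABLE_STATES = frozenset({"PROD", "PROD_END"})


def failing_units(readbacks: dict) -> list:
    """Return unit IDs whose lifecycle readback is not an allowed shipping state.

    Fails closed: anything other than exactly PROD or PROD_END (DEV,
    TEST_UNLOCKED_n, RAW, or an unreadable value) counts as a failure.
    """
    return sorted(uid for uid, state in readbacks.items() if state not in SHIPPABLE_STATES)
```

Checking against an allow-list of shippable states, rather than a deny-list of known-bad ones, also catches readbacks nobody anticipated.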

Incorrect escalation tie-offs. The lifecycle controller drives alert escalation outputs that the SoC must connect to meaningful actions: at minimum, a system reset, and preferably zeroization of critical security parameters (CSPs). In integration designs where the alert escalation outputs are left unconnected or tied to logic that triggers no action, a glitch or other fault-injection attack that causes the lc_ctrl to enter an alert state produces no observable effect: the device continues operating as if the alert never occurred. The SoC integration checklist must verify that every lifecycle alert escalation path terminates in a functional response.
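The checklist item can be automated as a connectivity check. A sketch, assuming the integration flow can export a mapping from each lifecycle escalation output (hypothetical names such as `esc_0`) to the action it drives, with `None` for an unconnected or tied-off output; the response names are also illustrative.

```python
from typing import Optional

# Responses the checklist accepts as functional (illustrative names).
FUNCTIONAL_RESPONSES = {"system_reset", "csp_zeroize"}


def check_escalation_wiring(wiring: dict) -> list:
    """Return a list of problems; an empty list means every path ends in a response.

    wiring maps each escalation output name to the action it drives (or None).
    """
    problems = [
        f"{output}: no functional response"
        for output, response in sorted(wiring.items(), key=lambda kv: kv[0])
        if response not in FUNCTIONAL_RESPONSES
    ]
    # The article's minimum: at least one path must reset the system.
    if "system_reset" not in wiring.values():
        problems.append("no escalation path triggers a system reset")
    return problems


def is_clean(wiring: dict) -> bool:
    return not check_escalation_wiring(wiring)


_example: Optional[str] = None  # an unconnected output is represented as None
```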

Lifecycle state read before OTP initialization. If the SoC reset sequence releases the lifecycle controller from reset before the OTP controller has completed its power-on initialization read, the lc_ctrl may read an indeterminate lifecycle state. In some OTP macro implementations, uninitialized fuse cells read as logical 0, which may correspond to a defined lifecycle state (such as RAW) rather than an error. This can cause a PROD-state device to appear to be in RAW state on first power-on, with unpredictable consequences for the boot sequence. The reset sequencing constraints for OTP initialization timing must be verified at worst-case process, voltage, and temperature (PVT) corners, not only in typical-case simulation.
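A behavioural model shows why an all-zero read is dangerous rather than merely wrong. It assumes hypothetical 16-bit encodings in which the all-zero word is the valid RAW state, and that an uninitialized OTP read returns zero; `naive_read` and `gated_read` are illustrative names, and no software model substitutes for the PVT-corner timing verification the paragraph calls for.

```python
# Hypothetical encodings: the hazard needs only that the uninitialized read
# value (assumed all zeros) happens to be a valid state.
ENCODINGS = {0x0000: "RAW", 0x5A5A: "PROD"}
UNINITIALIZED_READ = 0x0000


def naive_read(otp_init_done: bool, otp_word: int) -> str:
    # lc_ctrl released from reset too early: before OTP initialization
    # completes, it sees the uninitialized value instead of the programmed word.
    word = otp_word if otp_init_done else UNINITIALIZED_READ
    return ENCODINGS.get(word, "INVALID")


def gated_read(otp_init_done: bool, otp_word: int) -> str:
    # Corrected sequencing: no decode is attempted until OTP init is done.
    if not otp_init_done:
        raise RuntimeError("lc_ctrl must be held in reset until OTP initialization completes")
    return ENCODINGS.get(otp_word, "INVALID")
```

The naive path does not fail loudly: a PROD device read too early decodes cleanly as RAW, which is exactly why this bug survives functional testing.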

Weak RMA token protection. RMA tokens derived from a shared manufacturer root key with insufficient per-device diversification can be partially predicted if the diversification input is low-entropy or predictable (for example, a sequential device serial number). Token derivation should use the device UDS as the diversification input; if UDS is not available at the point in the manufacturing flow when RMA tokens must be generated, the token generation must be deferred or a different per-device diversification value with adequate entropy must be used.

A Scenario: The Debug-Port Vulnerability That Shipped

Consider a silicon device program (not attributed to any specific company) where the production manufacturing test flow was designed during development bring-up and never updated before mass production. The final test station checked functional operation (power, clocks, basic bus transactions) but did not include a lifecycle state assertion. The OTP programming station wrote the PROD lifecycle state correctly for every unit that passed through it. However, the flow was later modified at one of the intermediate test stations to add a diagnostic step that required the unit to be in TEST_UNLOCKED_0 state, and this modification was applied to the production flow without updating the state transition documentation. Units manufactured after that flow modification were never transitioned to PROD and shipped in TEST_UNLOCKED_0 state with the JTAG debug port fully open.

The failure mode was not detectable through standard functional testing — the devices operated correctly in all functional respects. It was discovered through a security audit that interrogated the lifecycle state register via JTAG and found an unexpected TEST_UNLOCKED response. The remediation required a manufacturing rework flow to reprogram the lifecycle state on all affected units, at significant cost and schedule impact. The root cause was a change management failure: the security-critical lifecycle state assertion was not included in the manufacturing test plan as a mandatory gate, so process modifications that affected lifecycle state were not subject to security review.

Lifecycle State in the Context of NIST SP 800-193

NIST SP 800-193 (Platform Firmware Resiliency Guidelines) defines the Protect / Detect / Recover framework for platform firmware security. The lifecycle state machine contributes to all three pillars. Protection: PROD state locks debug access and prevents modification of root key material. Detection: alert escalation from lifecycle controller integrity checks surfaces anomalous conditions (OTP read errors, transition authentication failures). Recovery: the RMA state provides a controlled path for key material erasure and re-provisioning without requiring physical die destruction.

We are not claiming that a correct lifecycle state machine implementation alone satisfies SP 800-193 requirements. The firmware resiliency framework encompasses the full firmware stack from root-of-trust through UEFI/BIOS to operating system, and the lifecycle state machine secures only the hardware root-of-trust layer. Firmware updates must still be authenticated using keys that chain to the root-of-trust, and the recovery path must include mechanisms to restore authenticated firmware from a known-good backup — neither of which is implemented by the lifecycle state machine itself. The lifecycle machine is a necessary component of a firmware resiliency architecture, not a sufficient one.

The silicon lifecycle state machine is one of those design elements where correctness is binary: either the state machine behaves as specified under all inputs, including adversarial ones, or it does not. Simulation coverage alone is insufficient for high-confidence verification. Formal property verification of state machine reachability is the recommended verification methodology for production designs; the key properties are that no transition sequence can move a PROD device into a debug-enabled state except through the token-authenticated RMA transition, and that no post-PROD state can ever return to PROD.
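The two reachability properties can be illustrated with an exhaustive breadth-first search over the abstract graph. This is only a sketch on the simplified Figure 1 graph with hypothetical names (`EDGES`, `DEBUG_ENABLED`, `check_properties`); it checks a model, not the RTL, and formal tools apply the same kind of property to the hardware description itself.

```python
from collections import deque

# Simplified graph from Figure 1 (TEST_LOCKED states and other
# TEST_UNLOCKED indices omitted).
EDGES = {
    "RAW": {"TEST_UNLOCKED_0"},
    "TEST_UNLOCKED_0": {"DEV", "PROD", "PROD_END"},
    "DEV": set(),
    "PROD": {"RMA"},
    "PROD_END": set(),
    "RMA": {"SCRAP"},
    "SCRAP": set(),
}
# States the article describes as having some debug access open.
DEBUG_ENABLED = {"TEST_UNLOCKED_0", "DEV", "RMA"}


def reachable(start, edges, excluded_edge=None):
    """Every state reachable from start, optionally ignoring one edge."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in edges[state]:
            if (state, nxt) == excluded_edge or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def check_properties(edges):
    violations = []
    # P1: with the authenticated RMA edge removed, PROD reaches no debug-enabled state.
    leaked = reachable("PROD", edges, excluded_edge=("PROD", "RMA")) & DEBUG_ENABLED
    if leaked:
        violations.append(f"PROD reaches debug-enabled states {sorted(leaked)} without RMA")
    # P2: no post-PROD state can return to PROD.
    for post in ("RMA", "SCRAP"):
        if "PROD" in reachable(post, edges):
            violations.append(f"{post} can return to PROD")
    return violations
```

Because the search is exhaustive over the model, a single added edge such as RMA to PROD is reported immediately, which is the kind of regression that simulation tests written against the intended flow tend to miss.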
