Hard-fail vs Soft-fail — Revocation Checking That Won’t Betray You

In the high-stakes game of digital identity, the decision between operational continuity and ironclad security hinges on how your system handles a “silence” from the revocation server.

Hard-fail

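The “Guilty Until Proven Innocent” approach. If the revocation server is unreachable, the client cannot confirm the certificate’s status, so it treats the certificate as untrusted and denies access. An attacker cannot slip a revoked certificate through by blocking the responder, but availability is fragile: every outage at the responder becomes an outage for you.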

Soft-fail

The “Optimistic Connection” approach. If the server doesn’t respond within a timeout, the certificate is assumed valid. Availability is prioritized, creating a window for potential exploitation if a certificate was indeed revoked.
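The two policies differ only in how they treat a missing answer. A minimal sketch of that decision, assuming a hypothetical three-state result from the revocation lookup (the names Status, Policy and allow_connection are illustrative, not taken from any library):

```python
from enum import Enum


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # responder timed out or was unreachable


class Policy(Enum):
    HARD_FAIL = "hard-fail"
    SOFT_FAIL = "soft-fail"


def allow_connection(status: Status, policy: Policy) -> bool:
    """Return True if the connection may proceed."""
    if status is Status.GOOD:
        return True
    if status is Status.REVOKED:
        # A definite "revoked" answer is rejected under both policies.
        return False
    # No answer at all: this is the only case where the policies diverge.
    return policy is Policy.SOFT_FAIL
```

Note that both policies behave identically whenever the responder answers; the choice only matters during the “silence”.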

Strategic Use Cases

Where the trade-offs determine mission success.

01. Critical IoT & Smart Infrastructure 

In industrial settings, a “Hard-fail” policy on a network of 10,000 sensors could lead to a complete factory shutdown if a single OCSP responder goes offline. Here, Soft-fail paired with aggressive monitoring is often the pragmatic choice to maintain operational uptime.

02. 5G Core & Telco Grids 

High-bandwidth, low-latency 5G slices require instant verification. Hard-fail is often mandated here for inter-carrier trust, but it requires distributed, high-availability revocation caches to prevent massive service outages.
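A revocation cache of this kind can be sketched as follows. This is an illustration under stated assumptions, not a production design: RevocationCache, the fetch callback and the TTL value are all hypothetical, and a real deployment would bound the TTL by the validity window of the signed response.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RevocationCache:
    """Serves recent revocation answers locally so a brief responder
    outage does not immediately turn into rejected connections."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[bool]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch returns True (revoked), False (good), or None (unreachable).
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_revoked(self, serial: str) -> Optional[bool]:
        now = self._clock()
        cached = self._entries.get(serial)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh: no network round trip
        fresh = self._fetch(serial)
        if fresh is None:
            # Nothing fresh and the responder is silent.
            # A Hard-fail caller must treat None as "reject".
            return None
        self._entries[serial] = (fresh, now)
        return fresh
```

The TTL is the trade-off knob: a longer TTL rides out longer outages but delays the moment a revocation takes effect.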

Key Insight

“In 5G, the revocation check is as much about protecting the network from compromised nodes as it is about subscriber privacy.”

03. Autonomous Agentic AI 

When AI agents act as proxies for human users, identity must be absolute. We recommend Hard-fail for agent-to-agent interactions. An agent without a verifiable identity should be considered a rogue actor until proven otherwise.

The Imperative for Immediate Action

Harvest Now, Decrypt Later

Adversaries are currently intercepting and storing encrypted data with the intent of decrypting it once large-scale quantum computers become viable. For IoT and long-lifecycle industrial assets, the data being transmitted today must be protected against future quantum decryption capabilities.

Transition at Scale

Migrating millions of M2M identities isn’t an overnight task. It requires a robust, agile infrastructure that can handle hybrid states. The time to build the “crypto-agile bridge” is years before the RSA/ECC break-point, ensuring seamless rotation across entire fleets.

Post-Quantum Cryptography (PQC)

Cryptographic algorithms designed to remain secure against cryptanalytic attacks by a quantum computer. TrustFactory leverages NIST-standardized algorithms such as ML-KEM (FIPS 203) and ML-DSA (FIPS 204) so that identities remain trustworthy in the post-quantum era.

Crypto Agility

The ability of a system to rapidly switch between cryptographic primitives (algorithms, key lengths) without significant infrastructure overhaul. It’s about building for change, not just for one standard.
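One common way to get this property is to tag every artifact with an algorithm identifier and resolve it through a registry, so adding or retiring an algorithm is a configuration change rather than a code rewrite. The sketch below shows only the pattern. As a loudly stated assumption, it uses HMAC variants from the Python standard library as stand-ins; a real deployment would register signature schemes such as ECDSA or ML-DSA from a vetted library, and the names REGISTRY, sign and verify are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict

Signer = Callable[[bytes, bytes], bytes]

# Stand-in primitives (assumption): swap in real signature schemes here.
REGISTRY: Dict[str, Signer] = {
    "hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "hmac-sha3-256": lambda key, msg: hmac.new(key, msg, hashlib.sha3_256).digest(),
}


def sign(alg_id: str, key: bytes, msg: bytes) -> bytes:
    """Produce a token that carries its own algorithm identifier."""
    tag = REGISTRY[alg_id](key, msg)
    return alg_id.encode("ascii") + b":" + tag


def verify(key: bytes, msg: bytes, token: bytes) -> bool:
    """Verify a token using whichever registered algorithm it names."""
    alg_id, _, tag = token.partition(b":")
    signer = REGISTRY.get(alg_id.decode("ascii", errors="replace"))
    if signer is None:
        return False  # unknown or retired algorithm
    return hmac.compare_digest(signer(key, msg), tag)
```

Because the verifier dispatches on the identifier inside the token, old and new algorithms can coexist during a fleet-wide rotation, and retiring one is a matter of removing its registry entry.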

Vertical Resilience

How PQC and Agility manifest in mission-critical environments.

Industrial IoT

Securing sensors with 20-year lifespans that will inevitably face the quantum threat during their deployment.

5G Infrastructure

Protecting massive machine-type communications (mMTC) and network slicing logic against quantum eavesdropping.

Agentic AI

Granting autonomous agents the ability to verify each other’s intent and identity with quantum-secure proofs.

The Sane Rule of Thumb

High Security

Financial transactions, Healthcare records, Admin access.

Use Hard-fail.

High Availability

Consumer web apps, Public WiFi, Smart Home IoT.

Use Soft-fail + Short TTLs.

Hybrid Best Practice

The “OCSP Stapling” approach. Push the status to the client instead of making them pull it.

Use OCSP Stapling, ideally with Must-Staple.

The Architect’s Conclusion

Revocation checking is not a binary switch, but a spectrum of risk management. By choosing Hard-fail, you accept that your infrastructure’s availability is now tied to your CA’s responder health. By choosing Soft-fail, you accept a security “grace period” that an attacker could exploit.

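Modern architectures should strive for Must-Staple certificates. The Must-Staple extension tells the client to reject the handshake unless the server staples a fresh OCSP response to it. The client gets the security of Hard-fail without a live responder lookup on every connection, which preserves much of the performance and availability of Soft-fail. The availability burden shifts to the server, which must refresh its stapled response before it expires.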
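The client-side logic for a stapled handshake can be sketched as below. This is a simplified model under stated assumptions: StapledResponse and accept_handshake are hypothetical names, signature validation of the OCSP response is omitted, and the fallback for certificates without Must-Staple is assumed to be Soft-fail.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StapledResponse:
    revoked: bool
    this_update: datetime  # start of the response's validity window
    next_update: datetime  # end of the response's validity window


def accept_handshake(
    must_staple: bool,
    stapled: Optional[StapledResponse],
    now: datetime,
) -> bool:
    """Decide whether to continue the TLS handshake."""
    fresh = (
        stapled is not None
        and stapled.this_update <= now <= stapled.next_update
    )
    if fresh:
        return not stapled.revoked
    # Missing or stale staple: a Must-Staple certificate is rejected
    # outright; otherwise this sketch falls back to Soft-fail (assumption).
    return not must_staple
```

No network call appears anywhere in this decision, which is why a timeout can no longer be the deciding factor.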

At Cumulocrypt, we advocate for transparency: define your failure policy in your security manifest, and never let a timeout be the reason for a breach.


Principal Security Architect, Cumulocrypt