Hard-fail vs Soft-fail — Revocation Checking That Won’t Betray You
In the high-stakes game of digital identity, the choice between operational continuity and ironclad security hinges on how your system handles “silence” from the revocation server, such as an OCSP responder that never answers.
Hard-fail
The “Guilty Until Proven Innocent” approach. If the revocation server is unreachable, the certificate is treated as revoked and access is denied. Security is maximized, but availability is fragile.
Soft-fail
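The “Optimistic Connection” approach. If the server doesn’t respond within a timeout, the certificate is assumed valid. Availability is prioritized, but this opens a window for exploitation if the certificate has in fact been revoked, and an attacker who can intercept traffic can often block the revocation check as well, forcing exactly that fallback.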
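The two policies differ in exactly one branch. The sketch below makes that explicit. It is a minimal illustration, and the names `RevocationStatus`, `FailPolicy` and `allow_connection` are hypothetical, not part of any particular TLS library.

```python
from enum import Enum


class RevocationStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    NO_RESPONSE = "no_response"  # timeout, DNS failure, connection refused


class FailPolicy(Enum):
    HARD = "hard"
    SOFT = "soft"


def allow_connection(status: RevocationStatus, policy: FailPolicy) -> bool:
    """Decide whether to accept a certificate, given its revocation status."""
    if status is RevocationStatus.GOOD:
        return True
    if status is RevocationStatus.REVOKED:
        # A definitive "revoked" answer is rejected under both policies.
        return False
    # "Silence" from the responder: the only case where the policies differ.
    # Hard-fail treats it as revoked; soft-fail treats it as valid.
    return policy is FailPolicy.SOFT
```

Everything else in the debate (caches, stapling, monitoring) is about making the `NO_RESPONSE` branch rarer or safer.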
Strategic Use Cases
Where the trade-offs determine mission success.
01. Critical IoT & Smart Infrastructure
In industrial settings, a “Hard-fail” policy on a network of 10,000 sensors could lead to a complete factory shutdown if a single OCSP responder goes offline. Here, Soft-fail paired with aggressive monitoring is often the pragmatic choice to maintain operational uptime.
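Soft-fail is only defensible when every fallback is visible. One way to implement the “aggressive monitoring” above is to count soft-fail acceptances in a sliding window and alert once they spike, which usually means the responder is down. This is a minimal sketch: `SoftFailMonitor`, `record_fallback` and the default window and threshold are hypothetical, illustrative values.

```python
from __future__ import annotations

import logging
import time
from collections import deque

logger = logging.getLogger("revocation")


class SoftFailMonitor:
    """Records every soft-fail acceptance and flags a probable responder outage.

    window_s and alert_threshold are illustrative values, not recommendations.
    """

    def __init__(self, window_s: float = 300.0, alert_threshold: int = 50) -> None:
        self.window_s = window_s
        self.alert_threshold = alert_threshold
        self._events: deque[float] = deque()  # timestamps of recent fallbacks

    def record_fallback(self, device_id: str, now: float | None = None) -> bool:
        """Log that device_id was accepted without a revocation answer.

        Returns True when the number of fallbacks inside the sliding window
        reaches alert_threshold, i.e. when operators should be paged.
        """
        now = time.monotonic() if now is None else now
        self._events.append(now)
        # Forget fallbacks older than the sliding window.
        while now - self._events[0] > self.window_s:
            self._events.popleft()
        logger.warning("soft-fail: accepted %s without revocation status", device_id)
        if len(self._events) >= self.alert_threshold:
            logger.error("%d soft-fails in %.0f s: OCSP responder may be down",
                         len(self._events), self.window_s)
            return True
        return False
```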
02. 5G Core & Telco Grids
High-bandwidth, low-latency 5G slices require instant verification. Hard-fail is often mandated here for inter-carrier trust, but it requires distributed, high-availability revocation caches to prevent massive service outages.
Key Insight
“In 5G, the revocation check is as much about protecting the network from compromised nodes as it is about subscriber privacy.”
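A high-availability revocation cache lets Hard-fail survive short responder outages: a fresh cached answer is served locally, and the connection is denied only when the cache entry has expired *and* the responder is unreachable. The sketch below assumes a hypothetical `fetch(serial)` hook standing in for an OCSP or CRL query; the TTL is an illustrative parameter.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedStatus:
    revoked: bool
    fetched_at: float


class RevocationCache:
    """Hard-fail revocation checks served from a local cache where possible.

    `fetch(serial)` is a hypothetical hook for querying an OCSP responder or
    CRL mirror: it returns True if revoked, False if good, and raises OSError
    when the responder cannot be reached.
    """

    def __init__(self, fetch: Callable[[str], bool], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, CachedStatus] = {}

    def is_trusted(self, serial: str) -> bool:
        now = self._clock()
        entry = self._entries.get(serial)
        if entry is not None and now - entry.fetched_at <= self._ttl_s:
            # Fresh cached answer: no round trip to the responder.
            return not entry.revoked
        try:
            revoked = self._fetch(serial)
        except OSError:
            # No reachable responder and no fresh cache entry: hard-fail.
            return False
        self._entries[serial] = CachedStatus(revoked, now)
        return not revoked
```

The TTL is the knob: a longer TTL rides out longer outages but delays how quickly a revocation takes effect.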
03. Autonomous Agentic AI
When AI agents act as proxies for human users, identity must be absolute. We recommend Hard-fail for agent-to-agent interactions. An agent without a verifiable identity should be considered a rogue actor until proven otherwise.
The Imperative for Immediate Action
Harvest Now, Decrypt Later
Adversaries are currently intercepting and storing encrypted data with the intent of decrypting it once large-scale quantum computers become viable. For IoT and long-lifecycle industrial assets, the data being transmitted today must be protected against future quantum decryption capabilities.
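A common way to quantify this urgency (not stated in the original text) is Mosca’s inequality: if x (years the data must stay confidential) plus y (years needed to migrate) exceeds z (years until a cryptographically relevant quantum computer exists), data sent today is already at risk. The inputs below are hypothetical; z in particular is an estimate, not a known quantity.

```python
def exposure_years(shelf_life: float, migration_time: float,
                   years_to_crqc: float) -> float:
    """Mosca's inequality, x + y > z, expressed as a margin.

    x = shelf_life:     years the data must stay confidential
    y = migration_time: years needed to move the fleet to PQC
    z = years_to_crqc:  estimated years until a cryptographically
                        relevant quantum computer (CRQC) exists
    A positive result is roughly how long harvested data would stay
    sensitive after it becomes decryptable.
    """
    return shelf_life + migration_time - years_to_crqc


def at_risk(shelf_life: float, migration_time: float, years_to_crqc: float) -> bool:
    return exposure_years(shelf_life, migration_time, years_to_crqc) > 0


# Hypothetical inputs: 10-year confidentiality, 5-year migration,
# and an assumed 12-year horizon to a CRQC.
print(exposure_years(10, 5, 12), at_risk(10, 5, 12))  # → 3 True
```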
Transition at Scale
Migrating millions of M2M identities isn’t an overnight task. It requires robust, agile infrastructure that can handle hybrid states, where classical and post-quantum algorithms run side by side. The time to build the “crypto-agile bridge” is years before the RSA/ECC break-point, so that keys and certificates can be rotated seamlessly across entire fleets.
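In a hybrid state, one session key is typically derived from both a classical and a post-quantum shared secret, so an attacker must break both. The sketch below concatenates the two secrets and runs them through HKDF (RFC 5869), similar in spirit to hybrid TLS key-exchange designs. It is an illustration under stated assumptions: Python’s standard library has neither ECDH nor ML-KEM, so random bytes stand in for both secrets, and the `info` label is hypothetical.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    # Extract: an empty salt is replaced by HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_session_key(classical_secret: bytes, pq_secret: bytes,
                       context: bytes) -> bytes:
    """Concatenate-then-KDF combiner: the key stays secret as long as
    EITHER input secret does (under standard KDF assumptions)."""
    return hkdf_sha256(classical_secret + pq_secret, salt=b"",
                       info=b"hybrid-demo-v1|" + context)


# Placeholders: in a real handshake these would come from ECDH (e.g. X25519)
# and from ML-KEM decapsulation; the standard library provides neither.
ecdh_secret, mlkem_secret = os.urandom(32), os.urandom(32)
session_key = hybrid_session_key(ecdh_secret, mlkem_secret, b"device-42")
```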
Post-Quantum Cryptography (PQC)
Cryptographic algorithms designed to be secure against a cryptanalytic attack by a quantum computer. TrustFactory leverages NIST-standardized algorithms such as ML-KEM (FIPS 203, key encapsulation) and ML-DSA (FIPS 204, digital signatures) to keep identities verifiable in the post-quantum era.
Crypto Agility
The ability of a system to rapidly switch between cryptographic primitives (algorithms, key lengths) without significant infrastructure overhaul. It’s about building for change, not just for one standard.
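A common pattern behind crypto agility is an algorithm registry: every protected message carries an algorithm identifier, and policy (not code) decides which identifiers are still accepted. In this minimal sketch the registry, `protect` and `verify` are hypothetical names, and standard-library HMACs stand in for signature schemes such as ECDSA and ML-DSA, which Python’s standard library does not provide.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Callable

# Algorithm identifier -> implementation. Switching primitives means adding an
# entry and changing policy, not rewriting call sites. Stdlib HMACs stand in
# for signature schemes (e.g. ECDSA -> ML-DSA), which the stdlib lacks.
REGISTRY: dict[str, Callable[[bytes, bytes], bytes]] = {
    "hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "hmac-sha3-256": lambda key, msg: hmac.new(key, msg, hashlib.sha3_256).digest(),
}


def protect(msg: bytes, key: bytes, alg: str) -> tuple[str, bytes]:
    """Return (algorithm id, tag) so receivers know which primitive was used."""
    return alg, REGISTRY[alg](key, msg)


def verify(msg: bytes, key: bytes, alg: str, tag: bytes, allowed: set[str]) -> bool:
    """Accept only algorithms that current policy still allows."""
    if alg not in allowed or alg not in REGISTRY:
        return False
    return hmac.compare_digest(REGISTRY[alg](key, msg), tag)
```

During a migration, `allowed` holds both the old and new identifiers; once the fleet has rotated, removing the old identifier from policy retires it without touching any caller.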
Vertical Resilience
How PQC and Agility manifest in mission-critical environments.
Industrial IoT
Securing sensors with 20-year lifespans that will inevitably face the quantum threat during their deployment.
- Over-the-air PQC updates
- Ultra-low footprint kernels
5G Infrastructure
Protecting massive machine-type communications (mMTC) and network slicing logic against quantum eavesdropping.
- Low-latency key exchanges
- Network-slice specific roots
Agentic AI
Granting autonomous agents the ability to verify each other’s intent and identity with quantum-secure proofs.
- Dynamic permissioning
- Non-repudiation for AI actions
The Sane Rule of Thumb
High Security
Financial transactions, Healthcare records, Admin access.
Use Hard-fail.
High Availability
Consumer web apps, Public WiFi, Smart Home IoT.
Use Soft-fail + Short TTLs.
Hybrid Best Practice
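The “OCSP Stapling” approach. The server periodically fetches a signed OCSP response for its own certificate and delivers (“staples”) it in the TLS handshake, so the status is pushed to the client instead of the client pulling it from the responder.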
Use OCSP Stapling, ideally with Must-Staple certificates.
The Architect’s Conclusion
Revocation checking is not a binary switch, but a spectrum of risk management. By choosing Hard-fail, you accept that your infrastructure’s availability is now tied to your CA’s responder health. By choosing Soft-fail, you accept a security “grace period” that an attacker could exploit.
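Modern architectures should strive for Must-Staple certificates. The Must-Staple flag (the TLS Feature extension, RFC 7633) tells clients to reject any handshake in which the server does not staple a valid OCSP response. This approaches the security of Hard-fail while keeping most of the performance and availability benefits of Soft-fail, provided the server refreshes its staple before it expires.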
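On the client side, Must-Staple turns a missing or stale staple from a soft signal into a rejection. The sketch below assumes the staple has already been parsed and its signature verified (normally the TLS library’s job); `StapledResponse`, `evaluate_staple` and the five-minute clock-skew allowance are hypothetical, while `this_update`/`next_update` mirror the thisUpdate/nextUpdate fields of an OCSP response (RFC 6960).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StapledResponse:
    """Fields of an already parsed and signature-verified OCSP staple."""
    cert_status: str        # "good", "revoked" or "unknown"
    this_update: datetime
    next_update: datetime


def evaluate_staple(staple: StapledResponse | None, must_staple: bool, now: datetime,
                    skew: timedelta = timedelta(minutes=5)) -> bool | None:
    """True/False is a final decision; None means "fall back to the
    client's own hard- or soft-fail OCSP lookup"."""
    if staple is None:
        # With Must-Staple, a missing staple is itself grounds for rejection.
        return False if must_staple else None
    if not (staple.this_update - skew <= now <= staple.next_update + skew):
        # Stale or not-yet-valid staple: treat it like a missing one.
        return False if must_staple else None
    return staple.cert_status == "good"


now = datetime.now(timezone.utc)
staple = StapledResponse("good", now - timedelta(hours=2), now + timedelta(days=3))
print(evaluate_staple(staple, must_staple=True, now=now))  # True
print(evaluate_staple(None, must_staple=True, now=now))    # False
```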
At Cumulocrypt, we advocate for transparency: define your failure policy in your security manifest, and never let a timeout be the reason for a breach.
Principal Security Architect, Cumulocrypt