The Dangers of SSL Certificates
Source: Hacker News
Bazel SSL Certificate Expiration Incident
Yesterday, the Bazel team at Google did not have a very Merry Boxing Day. An SSL certificate expired for and , as shown in this screenshot from the GitHub issue.
Impact on Users
The expired certificate broke the build workflow of users who use Bazel, who were faced with the following error message:
ERROR: Error computing the main repository mapping: Error accessing registry https://bcr.bazel.build/: Failed to fetch registry file https://bcr.bazel.build/modules/platforms/0.0.7/MODULE.bazel: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
Incident Summary
After mitigation, Xùdōng Yáng provided a brief summary of the incident on the GitHub ticket:
it was an auto‑renewal being bricked due to some new subdomain additions, and the renewal failures didn’t send notifications for whatever reason.
Why SSL Certificates Are Dangerous
Say the words “expired SSL certificate” to any senior software engineer and watch the expression on their face. Everybody in this industry has been bitten by expired certs, including people who work at organizations that use automated certificate renewal. In fact, this case is an example of an automated certificate renewal system that failed!
- SSL certificates are a fundamentally dangerous technology.
- Operational experience is limited until something goes wrong.
- When a failure occurs, teams often start from scratch to diagnose and fix it.
- In this incident, Bazel team members who were very unfamiliar with this whole area had to scramble to read documentation and secure permissions.
Even if the team has local SSL certificate expertise, those members were out of the office because of the holiday. With an automated “set‑it‑and‑forget‑it” solution, knowledge does not spread across the team because it just works—until it stops working.
Failure Mode Characteristics
- The failure mode is the opposite of graceful degradation.
- One minute everything works fine; the next minute every HTTP request fails.
- There is no natural signal to operators that the SSL certificate is approaching expiry.
- No staging of the change is possible because the change is time‑based and affects all users simultaneously.
In other words, SSL certs have an expected failure mode (expiration) that maximizes blast radius—a hard failure for 100 % of users—without any natural feedback to operators that the system is at imminent risk of critical failure. With automated cert renewal, the likelihood increases that responders will not have experience with renewing certificates.
Is it any wonder that these keep biting us?

