SSL Certificate Errors for API requests
Incident Report for Chargebee
Postmortem

RCA On SSL Certificate Errors for API requests
Date of Incident: 13-Feb-2024

Incident Summary:

During a routine renewal of our SSL certificate, our team expected DigiCert to issue a new certificate chained to their existing root certificate ("DigiCert Global Root CA"). Instead, the new certificate chained to a newer root ("DigiCert Global Root G2"), which DigiCert has adopted in response to Mozilla's decision to begin distrusting older root certificates. We did not anticipate this change and did not verify which root the new certificate chained to, so we overlooked the impact on merchants whose local certificate authority lists do not include the new root CA. This oversight led to SSL connection failures that affected API calls from those clients.

Technical Root Cause Of The Incident:

The root cause was that the renewed certificate chained to a new root certificate, and we deployed it without confirming that the new root was present in clients' local certificate authority lists. Clients without the new root could not validate the certificate chain, so their SSL connections failed.
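
To illustrate the dependency described above, the following is a minimal Python sketch of how a client could check whether a given root is in the trust store its runtime loads by default. The helper names (common_names, has_root) are hypothetical and not part of any Chargebee library. On systems that load roots lazily from a certificate directory, the list returned by get_ca_certs() may be incomplete, so a "missing" result is a hint rather than proof.

```python
import ssl
from typing import Iterable, List


def common_names(ca_certs: Iterable[dict]) -> List[str]:
    """Collect the subject commonName of each CA certificate.

    Expects dicts in the shape returned by ssl.SSLContext.get_ca_certs().
    """
    names = []
    for cert in ca_certs:
        # "subject" is a tuple of RDNs, each RDN a tuple of (field, value) pairs.
        for rdn in cert.get("subject", ()):
            for field, value in rdn:
                if field == "commonName":
                    names.append(value)
    return names


def has_root(ca_certs: Iterable[dict], common_name: str) -> bool:
    """Return True if a CA with this exact commonName is in the list."""
    return common_name in common_names(ca_certs)


if __name__ == "__main__":
    # Roots that Python's default context has loaded on this machine.
    loaded = ssl.create_default_context().get_ca_certs()
    for name in ("DigiCert Global Root CA", "DigiCert Global Root G2"):
        print(name, "present" if has_root(loaded, name) else "missing")
```

Other runtimes (for example Java keystores or HTTP libraries that bundle their own CA file) keep separate trust lists, so the same check would need to be repeated for each client stack.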

What Was The Business Impact?

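The incident affected merchants whose local certificate authority lists did not include the new root CA. Their API clients could not establish SSL connections to the API endpoints (US, EU, and AU) between 06:17 UTC and 07:05 UTC on 13-Feb-2024. Browser/UI requests were not affected.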

Remediation And Follow-up Steps:

We reverted to the old certificate and are in the process of obtaining a new certificate that chains to the old root ("DigiCert Global Root CA"). We will continue to support the old root CA until our merchants have been notified about the root CA change.

Learnings:

  • Certificate changes, particularly those involving root CA transitions, should be thoroughly planned and tested, including client-side dependencies.
  • Monitoring should be enhanced to include alerts for SSL handshake exceptions in the Application Load Balancers.
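
One way to test client-side dependencies before a rollout is to perform a TLS handshake against the endpoint while trusting only a restricted set of roots, which simulates a client with an older trust store. The sketch below assumes a PEM file containing only the roots such a client would have. The function names, host, and file path are hypothetical, and this is not a description of our internal tooling.

```python
import socket
import ssl


def classify_failure(exc: Exception) -> str:
    """Map a connection exception to a coarse label for reporting."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        # The chain could not be verified against the loaded roots.
        return "untrusted_chain"
    if isinstance(exc, ssl.SSLError):
        return "tls_error"
    return "network_error"


def probe(host: str, cafile: str, port: int = 443, timeout: float = 5.0) -> str:
    """Handshake with `host` while trusting only the roots in `cafile`."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return classify_failure(exc)
```

For example, calling probe("api.example.com", "old-roots-only.pem") against a staging endpoint that serves the renewed certificate would return "untrusted_chain" if the chain ends at a root missing from that file.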

Next Steps: Finalize the root certificate change and notify all merchants about it before 2025, and support them in adding the latest root certificates to their clients' certificate authority lists.
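
As a rough illustration of what adding a root to a client's certificate authority list can involve, the Python sketch below builds a TLS context that trusts the system defaults plus one extra root loaded from a PEM file. The function name and the file path are placeholders. Many HTTP libraries manage their own CA bundle, so the exact steps depend on the client stack in use.

```python
import ssl
from typing import Optional


def context_with_extra_root(extra_root_pem: Optional[str] = None) -> ssl.SSLContext:
    """Default trust store, plus an optional extra root certificate (PEM file)."""
    ctx = ssl.create_default_context()
    if extra_root_pem:
        # Adds the certificate(s) in the file to the roots already loaded.
        ctx.load_verify_locations(cafile=extra_root_pem)
    return ctx


# Example (placeholder path): pass the result to an HTTPS client that accepts
# an SSLContext, such as urllib.request.urlopen(url, context=ctx).
# ctx = context_with_extra_root("/path/to/new-root.pem")
```

Certificate and hostname verification stay enabled in this sketch; only the set of trusted roots grows.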

Conclusion:

This incident emphasized the importance of careful planning and testing when updating SSL certificates, especially regarding root certificate transitions. Moving forward, we are committed to finalizing the transition smoothly, ensuring all merchants are informed and supported in updating their certificate authority lists. This experience has prompted us to enhance our monitoring systems and processes to prevent similar disruptions, reinforcing the reliability and security of our services. We sincerely apologize for any disruption this may have caused you and appreciate your patience during our investigation.

Posted Feb 21, 2024 - 09:17 UTC

Resolved
We observed these errors between 06:17 UTC and 07:05 UTC. Following thorough monitoring, we can confirm that all systems have stabilized and are operating normally. This incident is now resolved, and we do not anticipate any further impact on your business. We apologize for any inconvenience or disruption caused.
Posted Feb 13, 2024 - 07:27 UTC
Monitoring
We traced the failures back to a recent infrastructure change.

During an SSL certificate update, compatibility issues emerged with the provided intermediate certificate, resulting in SSL negotiation errors for a portion of clients. Note that browser/UI requests were not affected, because browsers already trust the Certificate Authorities (CAs) involved.

We have reverted the changes and are working closely with our certificate provider to procure a universally compatible intermediate and root certificate. We apologize for any inconvenience caused.
Posted Feb 13, 2024 - 07:20 UTC
Update
We are continuing to work on a fix for this issue.
Posted Feb 13, 2024 - 07:19 UTC
Identified
Chargebee is investigating an issue that could potentially affect some merchants who may notice SSL Certificate Errors for API requests. We will provide further details as more information becomes available.
Posted Feb 13, 2024 - 06:45 UTC
This incident affected: API Endpoints (API (US), API (EU), API (AU)).