Incident Overview
On May 7, 2026, an infrastructure event in AWS US-EAST-1 (availability zone use1-az4) caused degraded performance in Amazon DynamoDB, a component of Chargebee’s infrastructure. Rather than a sustained outage, this manifested as two brief windows of elevated API errors — approximately 3:00–3:15 AM UTC and 4:15–4:50 AM UTC on May 8, 2026, while the rest of the period remained largely stable. Approximately 1.5% of active sites experienced intermittent API failures or background job delays during these windows, with actual impact limited to approximately 15 minutes across the event. All affected background jobs were successfully rescheduled and completed.
What Happened
AWS experienced a thermal event in a single availability zone (use1-az4) in US-EAST-1, which impaired a subset of DynamoDB storage partitions. While the AWS-side infrastructure recovery took several hours, the customer-visible impact on Chargebee was limited to intermittent error spikes rather than continuous degradation, reflecting our platform’s ability to absorb partial infrastructure failures.
What We Are Doing
All API errors during this incident were traced to DynamoDB latency caused by the AWS internal AZ-level impairment. Given DynamoDB’s role in our business-critical workflows, we are evaluating highly available alternatives and fallback architectures to reduce the impact of similar infrastructure events.
We recognise that even intermittent disruption has real consequences for your operations. The errors that spiked during this incident were brief, and our systems absorbed the majority of the AWS recovery window. We remain committed to further reducing Chargebee’s exposure to external infrastructure events