Intermittent API Request Timeouts - US Region

Incident Report for Chargebee

Postmortem

Intermittent API Errors and Background Processing Delays

Incident Overview

On May 7, 2026, an infrastructure event in AWS US-EAST-1 (availability zone use1-az4) caused degraded performance in Amazon DynamoDB, a component of Chargebee’s infrastructure. Rather than a sustained outage, this manifested as two brief windows of elevated API errors — approximately 3:00–3:15 AM UTC and 4:15–4:50 AM UTC on May 8, 2026, while the rest of the period remained largely stable. Approximately 1.5% of active sites experienced intermittent API failures or background job delays during these windows, with actual impact limited to approximately 15 minutes across the event. All affected background jobs were successfully rescheduled and completed.

What Happened

AWS experienced a thermal event in a single availability zone (use1-az4) in US-EAST-1, which impaired a subset of DynamoDB storage partitions. While the AWS-side infrastructure recovery took several hours, the customer-visible impact on Chargebee was limited to intermittent error spikes rather than continuous degradation, reflecting our platform’s ability to absorb partial infrastructure failures.

What We Are Doing

All API errors during this incident were traced to DynamoDB latency caused by the AWS internal AZ-level impairment. Given DynamoDB’s role in our business-critical workflows, we are evaluating highly available alternatives and fallback architectures to reduce the impact of similar infrastructure events.

We recognise that even intermittent disruption has real consequences for your operations. The errors that spiked during this incident were brief, and our systems absorbed the majority of the AWS recovery window. We remain committed to further reducing Chargebee’s exposure to external infrastructure events

Posted May 15, 2026 - 17:12 UTC

Resolved

We have verified that the intermittent API latency has been fully resolved. Our monitoring confirms all systems have fully recovered.

Posted May 08, 2026 - 07:36 UTC

Update

We are continuing to monitor the services. The incident remains open as we verify full system recovery. We will share an update as we progress.

Posted May 08, 2026 - 04:08 UTC

Update

Our team is working with AWS to assess the impact. We will share an update as we progress.

Posted May 08, 2026 - 00:57 UTC

Monitoring

Between May 7, 23:35 UTC and May 8, 00:55 UTC, we experienced intermittent API request timeouts in the US region due to an AWS platform issue.

We are monitoring the systems to ensure continued stability.

Posted May 08, 2026 - 00:29 UTC

This incident affected: API Endpoints (API (US)).