On November 18th, 2025, at approximately 11:20 UTC, a corrupted Bot Management feature file at Cloudflare caused a cascading failure across its global network. As a result, webhook ingestion to our platform was reduced: we processed approximately 65-70% of normal volume (a 30-35% reduction) for approximately 3 hours and 10 minutes, between 11:20 UTC and 14:30 UTC.
The impact on our ingestion was limited by two factors: (1) the permissive Bot Management settings on our account, which allowed most traffic through despite the bot scoring malfunction, and (2) our Workers KV implementation, which is configured to fail open. Failing open means traffic continued to be processed when KV lookups failed, although some customers saw "unknown source" messages.
Ingestion volumes began returning to normal at 14:42 UTC following Cloudflare's recovery and were back at normal levels by 15:30 UTC.
All webhooks that reached our ingestion endpoints were processed and delivered.
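The fail-open behavior described above can be illustrated with a minimal sketch. This is not our production code: the function names (`resolve_source`, `handle_webhook`), the `kv_get` callable, and the choice to treat a missing key the same as a failed lookup are all assumptions made for illustration. It only shows the general pattern, where a failed lookup degrades the label to "unknown source" instead of rejecting the webhook.

```python
from typing import Callable, Optional

# Label shown to customers when the source cannot be resolved.
UNKNOWN_SOURCE = "unknown source"


def resolve_source(key: str, kv_get: Callable[[str], Optional[str]]) -> str:
    """Look up a webhook's source label, failing open on lookup errors.

    `kv_get` is a hypothetical stand-in for a key-value lookup.
    """
    try:
        value = kv_get(key)
    except Exception:
        # Fail open: a lookup error must not block the webhook.
        return UNKNOWN_SOURCE
    # Assumption: a missing key is treated like a failed lookup.
    return value if value is not None else UNKNOWN_SOURCE


def handle_webhook(key: str, payload: dict, kv_get: Callable[[str], Optional[str]]) -> dict:
    """Accept the webhook regardless of whether the lookup succeeded."""
    source = resolve_source(key, kv_get)
    return {"accepted": True, "source": source, "payload": payload}


# Hypothetical lookups: one that always fails, one that works.
def broken_kv(key: str) -> Optional[str]:
    raise TimeoutError("KV lookup failed")


def healthy_kv(key: str) -> Optional[str]:
    return {"src-123": "billing-service"}.get(key)


degraded = handle_webhook("src-123", {"event": "ping"}, broken_kv)
normal = handle_webhook("src-123", {"event": "ping"}, healthy_kv)
```

In this sketch the degraded call is still accepted and carries the "unknown source" label, while the healthy call resolves the real label. A fail-closed design would instead reject the webhook when the lookup raised, which would have turned KV errors into dropped traffic.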
Timeline
- 11:20 UTC: Cloudflare deployed a corrupted Bot Management feature file, beginning the incident
- 11:30 UTC: Initial spike in error reports from our systems; traffic began dropping significantly
- 11:48 UTC: Cloudflare publicly acknowledged the issue: "Some services may be intermittently impacted"
- 11:48 UTC - 13:13 UTC: Our ingestion was degraded, with connection failures and TLS errors
- ~12:00 UTC: Peak impact period; traffic at its lowest levels (visible in monitoring graphs as a significant valley)
- 13:05 UTC: Cloudflare implemented Workers KV and Access bypass, reducing impact
- 13:13 UTC: Cloudflare announced the issue had been identified and a fix was being implemented
- 14:24 UTC: Cloudflare stopped creation and propagation of new Bot Management configuration files
- 14:30 UTC: Cloudflare deployed corrected Bot Management configuration file globally; main impact resolved
- 14:42 UTC: Cloudflare declared incident fully resolved; ingestion volumes began returning to normal
- 14:42 UTC - 15:30 UTC: Our ingestion traffic recovered to normal levels
- 17:06 UTC: All Cloudflare services fully restored
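As a quick cross-check of the durations implied by the timeline above, the following sketch computes the intervals between key timestamps. The timestamps come from the timeline; the helper names (`utc`, `minutes_between`) and variable names are illustrative assumptions.

```python
from datetime import datetime


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM timestamp on the incident date (2025-11-18, UTC)."""
    return datetime.strptime(f"2025-11-18 {hhmm}", "%Y-%m-%d %H:%M")


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


incident_start = utc("11:20")       # corrupted feature file deployed
first_errors = utc("11:30")         # initial spike in our error reports
public_ack = utc("11:48")           # Cloudflare public acknowledgement
main_fix = utc("14:30")             # corrected file deployed globally
recovery_start = utc("14:42")       # our volumes began returning to normal
recovery_end = utc("15:30")         # our traffic back at normal levels
all_restored = utc("17:06")         # all Cloudflare services restored

main_impact_minutes = minutes_between(incident_start, main_fix)
time_to_first_errors = minutes_between(incident_start, first_errors)
time_to_public_ack = minutes_between(incident_start, public_ack)
recovery_tail_minutes = minutes_between(recovery_start, recovery_end)
total_cloudflare_minutes = minutes_between(incident_start, all_restored)

print(f"Main impact window: {main_impact_minutes // 60}h {main_impact_minutes % 60}m")
```

The main impact window comes out at 190 minutes, which matches the 3 hours and 10 minutes quoted in the summary.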
Impact
During the outage window, our platform experienced reduced webhook ingestion due to the malfunction of Cloudflare's Bot Management system and elevated Workers KV error rates. Specifically:
- Webhook ingestion was reduced by approximately 30-35% (maintaining 65-70% of normal volume) for approximately 3 hours and 10 minutes. The graph below shows the drop in traffic that we experienced, which is consistent with Cloudflare’s explanation that parts of the network continued to operate properly depending on which config file they had received (good or bad).