Summary

On November 4th, 2025, at 14:15 UTC, Hookdeck experienced a complete HTTP ingestion outage lasting 2 minutes and 11 seconds. Our ingestion endpoints (hkdk.events, events.hookdeck.com, and custom domains) returned a high number of 500 errors during this window, preventing our service from receiving or processing incoming webhooks.

We understand that losing webhooks, even for a brief period, is a serious matter. Any webhook sent to Hookdeck during this window by a provider that does not retry failed deliveries was not received and cannot be recovered. If your webhook provider retries failed deliveries and a retry arrived after service was restored (for example, with a retry interval longer than about 2 minutes), those requests were processed successfully. We deeply apologize for the impact this had on your operations and take full responsibility for this incident.
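To help you judge whether a given provider's retry schedule would have carried a webhook past the outage, here is a minimal sketch. It assumes the outage window from the timeline in this report and a provider that retries at fixed delays after each failed attempt; the function name and the example schedules are hypothetical and do not describe any particular provider.

```python
from datetime import datetime, timedelta, timezone

# Outage window taken from the timeline in this report (UTC).
OUTAGE_START = datetime(2025, 11, 4, 14, 15, 9, tzinfo=timezone.utc)
OUTAGE_END = datetime(2025, 11, 4, 14, 17, 22, tzinfo=timezone.utc)


def first_attempt_outside_outage(first_attempt, retry_delays):
    """Return the time of the first delivery attempt that misses the outage.

    first_attempt: timezone-aware datetime of the initial delivery.
    retry_delays:  timedeltas, each measured from the previous attempt.
    Returns None if every attempt falls inside the outage window.
    """
    attempt = first_attempt
    if not (OUTAGE_START <= attempt < OUTAGE_END):
        return attempt
    for delay in retry_delays:
        attempt = attempt + delay
        if not (OUTAGE_START <= attempt < OUTAGE_END):
            return attempt
    return None


sent = datetime(2025, 11, 4, 14, 15, 30, tzinfo=timezone.utc)

# Hypothetical schedule A: retries after 30 s, then 60 s. Both land in the window.
print(first_attempt_outside_outage(sent, [timedelta(seconds=30), timedelta(seconds=60)]))
# prints None

# Hypothetical schedule B: retries after 1 min, then 5 min. The second retry gets through.
print(first_attempt_outside_outage(sent, [timedelta(minutes=1), timedelta(minutes=5)]))
# prints 2025-11-04 14:21:30+00:00
```

What matters is when the last retry arrives, not the size of any single interval: a schedule with several short retries that all fall before 14:17:22 UTC would still have lost the webhook.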

The root cause was a misconfigured environment variable that was deployed to our production ingestion workers. Our monitoring systems detected the issue immediately, and we executed a rollback within approximately 2 minutes, restoring service at 14:17 UTC.
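For readers wondering how this class of error can be caught before it reaches traffic, the following is an illustrative sketch of a fail-fast configuration check run at worker startup. It is a generic pattern, not a description of Hookdeck's actual configuration or remediation; the variable names `QUEUE_URL` and `WORKER_CONCURRENCY` and all function names are hypothetical.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """True if value parses as an http(s) URL with a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_positive_int(value):
    """True if value is the decimal string of an integer greater than zero."""
    try:
        return int(value) > 0
    except ValueError:
        return False


# Hypothetical variables and the validator each one must pass.
REQUIRED = {
    "QUEUE_URL": is_http_url,
    "WORKER_CONCURRENCY": is_positive_int,
}


def validate_env(env, required=REQUIRED):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for name, check in required.items():
        value = env.get(name)
        if value is None or value.strip() == "":
            problems.append(f"{name} is missing or empty")
        elif not check(value):
            # Report only the name, never the value, to avoid leaking secrets.
            problems.append(f"{name} has an invalid value")
    return problems


def check_or_exit(env):
    """Abort startup with a non-zero exit status if the configuration is invalid."""
    problems = validate_env(env)
    if problems:
        raise SystemExit("invalid configuration: " + "; ".join(problems))
```

A worker would call `check_or_exit(os.environ)` before accepting traffic, so that a bad value fails the deployment's health check instead of producing 500 errors on live requests.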

We've had years of operation without downtime on our ingestion infrastructure. We recognize the severity of breaking that track record and are committed to learning from this mistake and implementing additional safeguards to prevent similar incidents.

Timeline

November 4 2025 14:15:09 UTC: Misconfigured deployment went live and ingestion endpoints began returning 500 errors

November 4 2025 14:15:09 UTC: Monitoring detected zero traffic processing

November 4 2025 14:17:22 UTC: Rollback executed, service restored

November 4 2025 14:17:22 UTC: Incident closed

What went well

What went wrong

Remediation and changes