An increase in errors and delays from our services

Resolved

Degraded performance

Started December 19, 2024 at 5:01 PM UTC. Lasted about 3 hours, until 8:02 PM UTC.

Affected

API

API - HTTP REST API

API - Dashboard Browsing

Data Ingestion

Updates

Resolved
December 19, 2024 at 8:02 PM UTC
The system has recovered. We will continue to investigate the root cause and will update the description of the incident with the details when we have them.
Monitoring
December 19, 2024 at 7:05 PM UTC
Our backlog has recovered, and the service is back to normal.
The root cause appears to have been changes in our now-reverted deployment. We will continue to investigate and will provide more details once the investigation is complete.
Identified
December 19, 2024 at 6:23 PM UTC
We are still seeing issues caused by the task backlog and are scaling up to work through it.
We are continuing to investigate the cause in the meantime.
Update
December 19, 2024 at 6:01 PM UTC
We have reverted the most recent release, and the error rates have gone down.
We are still experiencing some slowdowns due to the task backlog that grew during the incident, but the system has mostly recovered.
Runs that were created during the outage will be marked as timed out, and you may see errors related to those runs in your local clients if they are still in progress.
New runs should not be affected, and should now work as expected.
We are still actively investigating the root cause.
Investigating
December 19, 2024 at 5:01 PM UTC
We are currently investigating this incident.

Currents - An increase in errors and delays from our services – Incident details