Currents - An increase in errors and delays from our services – Incident details

Resolved
Degraded performance
Started 3 days ago
Lasted about 3 hours

Affected

API - Reporting and Orchestration

Degraded performance from 5:01 PM to 6:01 PM, Operational from 6:01 PM to 6:23 PM, Degraded performance from 6:23 PM to 7:05 PM, Operational from 7:05 PM to 8:02 PM

API

Degraded performance from 5:01 PM to 7:05 PM, Operational from 7:05 PM to 8:02 PM

API - HTTP REST API

Degraded performance from 5:01 PM to 7:05 PM, Operational from 7:05 PM to 8:02 PM

API - Dashboard Browsing

Degraded performance from 5:01 PM to 7:05 PM, Operational from 7:05 PM to 8:02 PM

Updates
  • Resolved

    The system has recovered. We will continue to investigate the root cause and will update the description of the incident with the details when we have them.

  • Monitoring

    Our backlog has recovered, and the service is back to normal.

    The root cause appears to have been changes in our now-reverted deployment. We will continue to investigate and will provide more details once the investigation is complete.

  • Identified

    We are still seeing issues caused by the task backlog, and we are scaling up to work through it.

    We are continuing to investigate the cause in the meantime.

  • Update

    We have reverted the most recent release, and the error rates have gone down.

    We are still experiencing some slowdowns due to the task backlog that grew during the incident, but the system has mostly recovered.

    Runs that were created during the outage will be marked as timed out, and you may see errors related to those runs in your local clients if they are still in progress.

    New runs should not be affected, and should now work as expected.

    We are still actively investigating the root cause.

  • Investigating
    We are currently investigating this incident.