Currents - Notice history

100% - uptime

API - HTTP REST API - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

API - Dashboard Browsing - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

API - Reporting and Orchestration - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 99.72%

Data Pipeline - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 99.72%

Scheduler - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

3rd Party Integrations - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

Cypress Integration - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

Playwright Integration - Operational

100% - uptime
Nov 2024 · 100.0% | Dec 2024 · 100.0% | Jan 2025 · 100.0%

Notice history

Jan 2025

Slowdowns and high load
  • Resolved

    This incident has been resolved.

    The root cause was identified as a bug in the telemetry configuration for the background task queue.

    Adding telemetry data broke task deduplication, which overloaded the DB with redundant write requests.

  • Monitoring

    The system is working normally again. During the incident, run creation may have failed, and run data from the incident window may be missing from the dashboard, resulting in timed-out runs and missing results.

    New runs should be working as expected as of Sat 11:30pm UTC.

    We are still closely monitoring for further errors, and we will investigate the root cause in more depth on Monday.

    During the incident we took several steps: we increased our database capacity, canceled long-running tasks, and rolled back some of our most recently deployed services.

    It's not yet clear whether the rollback or the canceling of long-running tasks resolved the incident. We will look deeper into the root cause during regular business hours.

  • Identified

    We have identified a bottleneck in DB writes that caused slow processing and an accumulation of tasks in the processing queues. Scaling up the DB cluster and terminating long-running operations restored system stability.

    We are investigating the root cause of the increase in DB resource consumption.

  • Investigating

    We are currently investigating this incident.

    We are seeing an elevated number of errors and increased load on our database. Run processing is significantly delayed.
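
The root cause given in the Resolved update (telemetry data breaking task deduplication) can be illustrated with a minimal sketch. This is not Currents' actual implementation, which the notice does not describe. It assumes a queue that deduplicates tasks by hashing their payload; the field names (`type`, `run_id`, `trace_id`) and all function names are hypothetical.

```python
import hashlib
import json


def dedup_key_naive(task: dict) -> str:
    """Hash the whole task payload, telemetry fields included (hypothetical)."""
    return hashlib.sha256(json.dumps(task, sort_keys=True).encode()).hexdigest()


def dedup_key_stable(task: dict) -> str:
    """Hash only the fields that identify the work to be done (hypothetical)."""
    identity = {"type": task["type"], "run_id": task["run_id"]}
    return hashlib.sha256(json.dumps(identity, sort_keys=True).encode()).hexdigest()


def enqueue(tasks, key_fn):
    """Return the tasks that survive deduplication under key_fn."""
    seen = set()
    accepted = []
    for task in tasks:
        key = key_fn(task)
        if key not in seen:
            seen.add(key)
            accepted.append(task)
    return accepted


# Two requests for the same work, differing only in telemetry metadata.
tasks = [
    {"type": "write_results", "run_id": "r1", "trace_id": "a1"},
    {"type": "write_results", "run_id": "r1", "trace_id": "b2"},
]

# Telemetry changes the naive key, so both tasks are kept (a redundant write).
naive_count = len(enqueue(tasks, dedup_key_naive))
# The stable key ignores telemetry, so the duplicate is dropped.
stable_count = len(enqueue(tasks, dedup_key_stable))
```

Under these assumptions, a per-request telemetry field makes every task look unique to the naive key, so duplicates pass through and each one triggers its own write.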

Dec 2024

An increase in errors and delays from our services
  • Resolved

    The system has recovered. We will continue to investigate the root cause and will update the description of the incident with the details when we have them.

  • Monitoring

    Our backlog has recovered, and the service is back to normal.

    The root cause appears to have been changes in our now-reverted deployment. We will continue to investigate and will provide more details once the investigation is complete.

  • Identified

    We are still seeing issues caused by the task backlog, and we are scaling up to work through it.

    We are continuing to investigate the cause in the meantime.

  • Update

    We have reverted the most recent release, and the error rates have gone down.

    We are still experiencing some slowdowns due to the task backlog that grew during the incident, but the system has mostly recovered.

    Runs that were created during the outage will be marked as timed out, and you may see errors related to those runs in your local clients if they are still in progress.

    New runs should not be affected, and should now work as expected.

    We are still actively investigating the root cause.

  • Investigating
    We are currently investigating this incident.

Nov 2024

No notices reported this month

Nov 2024 to Jan 2025
