Currents - Notice history

100% - uptime

API - HTTP REST API - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 99.59%

API - Dashboard Browsing - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 100.0%

Ingest and Orchestration - Operational

100% - uptime
Jan 2025: 99.72% · Feb 2025: 100.0% · Mar 2025: 100.0%

Data Pipeline - Operational

100% - uptime
Jan 2025: 99.72% · Feb 2025: 100.0% · Mar 2025: 100.0%

Scheduler - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 100.0%

3rd Party Integrations - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 100.0%

Cypress Integration - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 100.0%

Playwright Integration - Operational

100% - uptime
Jan 2025: 100.0% · Feb 2025: 100.0% · Mar 2025: 100.0%

Notice history

Feb 2025

No notices reported this month

Jan 2025

Slowdowns and high load
  • Resolved

    This incident has been resolved.

    The root cause was identified as a bug in the telemetry configuration for the background tasks queue.

    Adding telemetry data broke task deduplication, causing a DB overload due to redundant write requests (see the sketch after this timeline).

  • Monitoring

    The system is now once again working normally. During the incident you may have failed to create runs, and run data from the incident window may be missing from the dashboard, leading to timed-out runs and missing results.

    New runs should now be working as expected as of Sat 11:30pm UTC.

    We are still closely monitoring for further errors and will conduct a deeper investigation into the root cause on Monday.

    Several steps were taken during the incident: increasing our database capacity, cancelling long-running tasks, and rolling back some of our most recently deployed services.

    It is not yet clear whether the rollback or the cancellation of long-running tasks resolved the incident. We will investigate the root cause further during regular business hours.

  • Identified

    We have identified a bottleneck in DB writes that caused slow processing and an accumulation of tasks in the processing queues. Scaling up the DB cluster and terminating long-running operations restored system stability.

    We are investigating the root cause of the increased DB resource consumption.

  • Investigating

    We are currently investigating this incident.

    We are seeing an elevated number of errors, as well as increased load on our database. Run processing has been significantly delayed.
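
Note: the failure mode described in the resolution above (per-task telemetry fields leaking into the deduplication key) can be illustrated with a short, hypothetical Python sketch. This is not Currents' actual queue code; the dedup_key and enqueue helpers and the payload fields are assumptions made purely for illustration.

    import hashlib
    import json
    import uuid

    def dedup_key(task: dict) -> str:
        """Derive a deduplication key from the serialized task payload."""
        return hashlib.sha256(json.dumps(task, sort_keys=True).encode()).hexdigest()

    def enqueue(queue: dict, task: dict) -> bool:
        """Insert the task only if an identical payload is not already queued."""
        key = dedup_key(task)
        if key in queue:
            return False  # duplicate dropped, no extra DB write
        queue[key] = task
        return True

    queue: dict[str, dict] = {}
    base = {"run_id": "run-123", "action": "process-results"}

    # Before the telemetry change: retries of the same task collapse into one entry.
    enqueue(queue, dict(base))
    enqueue(queue, dict(base))
    assert len(queue) == 1

    # After the change: per-enqueue telemetry (e.g. a trace id) lands inside the
    # hashed payload, so every retry hashes differently, deduplication no longer
    # fires, and each retry turns into its own redundant DB write.
    for _ in range(3):
        enqueue(queue, {**base, "telemetry": {"trace_id": str(uuid.uuid4())}})
    assert len(queue) == 4  # deduplication effectively disabled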

Jan 2025 to Mar 2025
