Currents - Analytics and metrics partial outage – Incident details

Analytics and metrics partial outage

Resolved
Degraded performance
Started about 1 year agoLasted about 9 hours

Affected

API

Degraded performance from 8:42 PM to 6:05 AM

API - HTTP REST API

Degraded performance from 8:42 PM to 6:05 AM

API - Dashboard Browsing

Degraded performance from 8:42 PM to 6:05 AM

Updates
  • Resolved
    Resolved

    This incident has been resolved.

  • Monitoring
    Monitoring

    The cluster rebalancing is complete, enabling all the ES queries, monitoring the performance and resolution.

  • Identified
    Identified

    The root cause was identified as a failed allocator for the ElasticSearch cluster.

    A failure in one of the ElasticSearch nodes caused and increased CPU and load for the whole cluster, we are relocating the data to a different node and rebalancing the shards between the new nodes.

    We turned off some search queries to temporarily reduce the cluster load and reduce the recovery time.

  • Investigating
    Investigating

    We are dealing with a partial outage with our analytics systems. Insights and performance metrics are temporarily unavailable.