Dave Banks on Onix for 24/7 Production Stability and Enhanced Observability

The lights never truly went out at Dave, the pioneering FinTech company. Their commitment to offering consumer-friendly financial services demanded a system that was rock-solid and available 24 hours a day, 7 days a week. Maintaining that perpetual state of readiness required more than internal effort; it needed a dedicated, vigilant partner. Onix became that partner: the managed services team acting as Dave’s sentinel in the cloud.

The Mission: First Line of Defense

Onix’s objective was to stand as the first and most critical line of defense. Their Standard Production Support was designed to ensure the continuous stability of every business-critical system—from core infrastructure to security and observability platforms.

The Onix team took charge of the “Run” support, handling the relentless stream of daily operations and alerts:

  • Vigilance: They constantly monitored cloud workload metrics, tracking availability, performance, and capacity across Google Cloud Platform (GCP) services like Compute, GKE, BigQuery, and Dataproc. When an automated task failed or a system alert fired, Onix was the first to validate it and initiate rapid triage.
  • Containment: Using established ITSM processes, they managed incidents with rapid response, clear communication, and basic resolution. They performed routine health checks on everything—servers, networks, databases, and backup jobs—ensuring minor issues never grew into major crises.
  • Security Watch: Beyond performance, Onix kept a sharp eye on the security dashboards, running standard patching workflows and immediately escalating any suspicious or critical security event to Dave’s internal teams.
  • The Help Desk: They handled essential user needs, from password resets and MFA lockouts to responding to in-scope chat channel queries, freeing up Dave’s engineers from low-level distractions.

This proactive approach allowed Dave’s seasoned technical staff to shed the burden of first-line support and dedicate their focus to long-term improvements and core product innovation.

The Transformation: Engineering Clarity and Resilience

Onix didn’t just manage the systems; they engineered solutions that made them inherently better, focusing on deep visibility using Datadog and strengthening the platform’s core resilience.

1. GKE: The Clarity Compass

The heart of Dave’s platform, the GKE clusters, often felt like a black box. Onix changed that by deploying a custom Datadog-based GKE Health Dashboard. This solution centralized monitoring in one place, giving Dave’s team a single view of:

  • Node and pod health.
  • Real-time CPU consumption.
  • Namespace-level issues and workload behavior.

This “Clarity Compass” enabled SREs to spot unhealthy components faster than ever before, dramatically speeding up troubleshooting and onboarding for new team members.

2. Cloud SQL: Unmasking the Database Ghosts

Database performance can be notoriously elusive. Onix tackled this with a suite of Datadog dashboards for Cloud SQL that brought key issues out of the shadows:

  • One dashboard highlighted slow queries, offering filters and direct links to GCP logs for immediate, deep analysis.
  • Another was designed specifically to detect deadlocks, where transactions block one another while waiting on locks and can halt application flow, allowing the team to address them proactively.
  • By ensuring logs flowed seamlessly into Datadog across all environments, Onix delivered a unified, consistent view that helped improve overall database stability.
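
As a rough illustration of the kind of signal a deadlock dashboard aggregates, the sketch below counts deadlock errors per database from already-collected log entries. It is not Onix's implementation: it assumes a PostgreSQL-flavored Cloud SQL instance (whose deadlock errors contain the text "deadlock detected"), and the function name and the (database, message) input shape are hypothetical.

```python
import re
from collections import Counter

# Assumption: PostgreSQL-style logs, where a deadlock is reported with
# the text "deadlock detected". Other engines word this differently.
DEADLOCK_PATTERN = re.compile(r"deadlock detected", re.IGNORECASE)


def deadlocks_per_database(entries):
    """Count deadlock messages per database.

    `entries` is a hypothetical iterable of (database_name, message) pairs,
    standing in for log records already exported from the database.
    """
    counts = Counter()
    for database, message in entries:
        if DEADLOCK_PATTERN.search(message):
            counts[database] += 1
    return dict(counts)
```

A count like this, grouped by database and plotted over time, is the sort of view that lets a team notice recurring deadlocks before users report them.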

3. Fortifying the Foundation and Finding Hidden Value

With stability and visibility achieved, Onix moved to safeguard the platform’s foundation and optimize resources:

  • Resilience via Automation: The team established a standardized snapshot schedule for all standalone VMs. Crucially, they implemented this unified policy using Terraform, ensuring the automated backup coverage was consistent, easily managed, and robust. This action closed backup gaps and strengthened Dave’s disaster recovery posture.
  • Cost Discovery: Through meticulous review, Onix identified numerous idle servers that had been shut down but were still silently incurring charges (on GCP, a stopped VM generally still bills for resources such as attached persistent disks and reserved IP addresses). By confirming and decommissioning these unnecessary assets, Onix delivered immediate and tangible cost savings back to Dave, ensuring their cloud environment was not only stable but also cost-efficient.
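
The two reviews above can be sketched as simple checks over a VM inventory. This is a minimal illustration, not the tooling Onix used: the `Vm` record, its fields, and the 30-day threshold are hypothetical, and the inventory is assumed to have been exported beforehand. The one real detail is that Compute Engine reports a stopped instance with the status `TERMINATED`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vm:
    """Hypothetical inventory record for one standalone VM."""
    name: str
    status: str                      # Compute Engine status, e.g. "RUNNING" or "TERMINATED"
    days_in_status: int              # how long the VM has been in that status
    snapshot_policies: List[str] = field(default_factory=list)


def idle_candidates(vms, min_days=30):
    """Names of VMs stopped for at least `min_days` (threshold is an assumption)."""
    return [
        vm.name
        for vm in vms
        if vm.status == "TERMINATED" and vm.days_in_status >= min_days
    ]


def missing_snapshot_policy(vms):
    """Names of VMs with no snapshot schedule attached, i.e. backup gaps."""
    return [vm.name for vm in vms if not vm.snapshot_policies]
```

Both functions only produce candidate lists. As in the story, a flagged server would still need to be confirmed with its owner before it is decommissioned.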

In the end, the partnership empowered Dave to maintain its aggressive growth and market position, knowing their most critical systems were protected by a dedicated sentinel and supported by a continuous loop of technical improvement.
