Situation

With a backlog of new software, this Fortune 500 company needed more predictable performance, efficiency, and reliability from its systems and applications.

Leadership also wanted to give its new users and clients a better experience, prioritizing easier onboarding, and to provide unified operations support for four enterprise applications.

Action

We partnered with our client to devise an actionable Site Reliability Engineering (SRE) framework with a phased approach for adoption and maturity. Innovative elements of the framework included:


  • Faster releases of consistent quality by employing DevOps principles in automating infrastructure changes and application releases.

  • A single, aggregated source of truth for new IT Service Management (ITSM) and Configuration Management.

  • Development teams mobilized for 24x7 platform support and triage.

  • Proactive failure detection with instrumentation for observability.

  • Automation that allowed scalability, self-healing, and toil reduction.

  • A consistent, repeatable, and automation-enabled client onboarding process.

Results

  • A defined approach to onboarding cut a months-long process down to weeks.
  • Minimized downtime and outages.
  • Fully automated production environments adhering to CI/CD.
  • SLA-driven client support and SLO definitions for product onboarding and reliability.
  • More reliable banking software.
  • 90% drop in release cycle duration.
  • 99% issue identification and acknowledgement.
  • 80% fewer resources needed for client onboarding.