
Is Your DevOps Team Taking Too Many Risks?

Posted by Tomasz Kruszewski

How to stop DevOps from damaging revenue

DevOps is a necessity. If you want to keep up with modern software development, there are no two ways about it. The experiences of market leaders make that clear.

If you want to build the on-demand enterprise, respond to customer needs and keep up with the speed of innovation, you need DevOps.  

The challenge is how to ensure risk is managed while teams transform into this agile way of working. Culture change is never easy and poorly implemented DevOps could have bottom-line consequences.

With DevOps, you always need to make sure that you have the tools and culture needed to mitigate risk and do DevOps as it was intended.

So, what’s the problem?

Developers are expected to deliver speed to market; Operations are expected to ensure security, reliability, availability, and governance. If the teams are not all pursuing the same goals, priorities and resources come into conflict. If process is bypassed in pursuit of speed, failures start to impact customers and, ultimately, revenue.

When DevOps started as a grassroots initiative, little attention was typically paid to the question “how does this scale?” It was easier to show early success in a small, isolated team working on an initial project. Now that DevOps is becoming mainstream, it is critical to have robust processes in place, and those processes need to be observable and quick to fix.

Speed, cost or quality?

DevOps carries no inherent risk. In fact, DevOps done well provides both speed and quality in delivery. What makes the difference is how you do it. So, let’s look at what makes up an ideal framework for implementing DevOps.

DevOps has four primary focus areas:

  • Collaboration between project team roles
  • Infrastructure as Code: Scripted Infrastructure Configuration
  • Automation of Tasks / Processes / Workflows
  • Monitoring Applications and Infrastructure

Tying these focus areas together minimises risk; working on one in isolation increases it. DevOps isn't something to be implemented without a second thought. It needs the proper tools and the buy-in of the organisation.

Fortunately, there are tools and best-practice techniques to help make this happen. One of the earliest sources of best-practice guidance, and still an excellent one, is The Phoenix Project and its “Three Ways” thinking.

From system thinking to continuous improvement

First Way: System Thinking

The First Way emphasises that you must look at the performance of the entire system, as opposed to a specific silo of work or department. The focus is on all business value streams and on reducing the time needed to get from commit to production, known as the “lead time”. Managing the size of releases is the simplest way to cut lead times. Smaller, more frequent software releases reduce waste and improve time-to-market.
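As a rough illustration of the lead-time measure described above, the sketch below computes the commit-to-production time for a handful of releases and reports the median. The function names and the timestamps are hypothetical, made up for this example; in practice the data would come from your version control and deployment tooling.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from commit to production for a single release."""
    if deployed_at < committed_at:
        raise ValueError("a release cannot be deployed before it is committed")
    return deployed_at - committed_at


def median_lead_time(releases: list[tuple[datetime, datetime]]) -> timedelta:
    """Median lead time over (committed_at, deployed_at) pairs."""
    seconds = [lead_time(c, d).total_seconds() for c, d in releases]
    return timedelta(seconds=median(seconds))


# Hypothetical sample data: (commit time, production deploy time).
releases = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),   # 2 hours
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 15, 0)),   # 6 hours
    (datetime(2024, 3, 3, 9, 0), datetime(2024, 3, 4, 15, 0)),   # 30 hours
]

print(median_lead_time(releases))
```

Tracking the median rather than the mean keeps one unusually slow release from hiding an otherwise healthy trend.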

Second Way: Amplify Feedback Loops

The Second Way is about creating shorter and faster feedback loops. The goal of almost any process improvement, whether DevOps or not, is to shorten and amplify feedback loops so that corrections can be made continually.

Development, Test and Operations must all feel part of this feedback loop, not like a bucket brigade throwing problems over the fence to each other. The more time an operations team spends probing problem spots on its own, the less time the system (and increasingly, the business) is up and running. Under that pressure, the group may settle for a quick and easy fix rather than digging down to the more complex root cause.

Troubleshooting can also become an item passed from one group to the next. The goal becomes to avoid blame and spend as little time as possible examining what led to the issue.

The solution? Allow people in these departments (and they are usually separate departments) to get together frequently to jointly analyse issues.

Third Way: Continuous Improvement

The Third Way is about creating a culture that fosters two things: continual experimentation, and the repetition and practice needed to achieve continual improvement.

This is where your revenue and your reputation with customers are potentially at risk, if your developers jump to the Third Way without the other two. You do not want to be experimenting on customers, and you do not want developers taking risks without the involvement of the rest of the business.

If you have mastered the first two ways, you can afford to experiment and take risks. You can even inject faults, find out where the system breaks and then fix it to increase resilience. Aside from fault injection testing, you could also check your assumptions about customers: for example, are your users actually using particular features? But what tools could you use to see this?
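To make the fault injection idea concrete, here is a minimal sketch, assuming a service call that can be wrapped in plain Python. All names (`InjectedFault`, `with_fault_injection`, `call_with_fallback`) are hypothetical and written for this example; they do not come from any particular chaos-engineering tool.

```python
import random


class InjectedFault(RuntimeError):
    """Raised in place of a real failure during a fault injection test."""


def with_fault_injection(func, failure_rate, rng=None):
    """Wrap func so that a fraction of calls fail on purpose."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # rng.random() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_with_fallback(func, fallback, *args):
    """The resilience behaviour under test: degrade instead of crashing."""
    try:
        return func(*args)
    except InjectedFault:
        return fallback


def price_lookup(item_id):
    # Stand-in for a real downstream dependency.
    return item_id * 2


always_failing = with_fault_injection(price_lookup, failure_rate=1.0)
never_failing = with_fault_injection(price_lookup, failure_rate=0.0)

print(call_with_fallback(always_failing, -1, 21))  # falls back
print(call_with_fallback(never_failing, -1, 21))   # normal result
```

Running an experiment like this in a test environment first shows whether the fallback path works before any real user depends on it.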

If you can’t measure it, you can't improve it

To support such high-velocity, lightweight releases, the right tools must be built into the DevOps strategy. Until now, a huge gap existed in the DevOps toolchain: observability provided by dynamic analysis of code in production. It was the missing link in the toolchain, the agent needed to underpin the Three Ways of The Phoenix Project.

DevOps depends on feedback. Feedback directs agile responsiveness in a way that is actually useful. Moreover, it’s impossible to prevent the release of bad code. The high-velocity code releases of DevOps heighten this risk. DevOps code releases are small, but their numbers make tracking them hard.

The problem with microservices and serverless

Architectures built to support DevOps (microservices and serverless) actually make this problem of observability worse. They create webs of dependencies, and the relationships between them need to be tracked.

The traditional solution is to use APM tools or logs. However, both fall short in different ways. APM tools can only give you surface-level information. With logs, there are too many sources of information to keep track of, and if you know what to log, you already know the problem.

When an error occurs, user experiences are almost always impacted. The best case scenario is a long testing process in which attempts are made to reproduce failures (often requiring subsequent tests in production), and superficial solutions are implemented in the place of root-cause analysis. This can cause tech debt and problems down the road.

True observability platforms

The way to remove these risks and deliver true observability over microservices, serverless and your entire DevOps toolchain is to harness a new generation of observability tools.

Software flight recorder (record and replay) technology has staked out space in testing as a context-driven, dynamic code analysis tool. However, its impact on performance has ruled out its use in production and limited its test capabilities, until now.

Businesses like RevDeBug have reduced the performance impact to a negligible level, falling below 10% and even nearing zero when configured properly.

Bringing this capability into production has entirely changed the game in terms of monitoring. The granular and context-driven detail of software flight recorder technology pierces through the complexity of microservices and delivers end-to-end observability across your entire DevOps toolchain.  

This observability informs global heat maps of production environments and is paired with automated rollbacks within milliseconds of a fault. Changes can then be made rapidly, informed by root-cause analysis and an exact reproduction of the fault. This creates a self-healing DevOps pipeline that is free to take full advantage of the possibilities provided by microservices without the risk.

Observability, in a nutshell, means asking questions from the outside while being able to delve into a system to understand the root cause of a problem and fix it. Without observability, you can’t even do the Ops part of DevOps. It is the key ingredient that gives us the ability to change a system after it has been deployed to production. Refined to remove the impact on performance, True Observability Platforms deliver this capability.
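The decision behind an automated rollback can be sketched very simply. The example below is an illustration only, assuming a rollback is triggered when the observed error rate of a new release crosses a threshold; it is not a description of how RevDeBug or any other platform implements rollbacks. The function names and the 5% threshold are hypothetical.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there is no traffic yet."""
    return errors / requests if requests else 0.0


def choose_release(current: str, previous: str,
                   errors: int, requests: int,
                   threshold: float = 0.05) -> str:
    """Return the release that should serve traffic.

    Falls back to the previous release when the current one
    exceeds the error-rate threshold (an assumed policy).
    """
    if error_rate(errors, requests) > threshold:
        return previous
    return current


# Hypothetical observations for release "v2", with "v1" as the last good release.
print(choose_release("v2", "v1", errors=12, requests=100))  # roll back to v1
print(choose_release("v2", "v1", errors=2, requests=100))   # keep v2
```

A real pipeline would feed this decision from live monitoring data and act on it automatically; the point here is that the rollback rule itself should be explicit and observable.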

To the new frontier: risk-free DevOps delivered by True Observability

DevOps represents a new way of thinking about technology and how it supports the company’s mission. It is both powerful and effective: It can increase innovation, help companies grow, mitigate risk, improve compliance, get products to market quickly…what else could you ask for?

Just don’t replace your infrastructure silos with a DevOps silo. If you’ve already siloed your DevOps team, you’re not using system thinking and you are opening your business to risk. And if that perception of risk grows while you are building a continuous delivery strategy, it will be difficult to persuade your organisation to embrace failure unless you can demonstrate a robust process for limiting the negative effects of those failures.

To remove risk from DevOps and accelerate your innovation at the same time, you need observability: true observability platforms that bring dynamic code analysis into production. Automated rollbacks protect users, and an exact reproduction of each fault gives your teams the information they need to make rapid, root-cause fixes.

DevOps is moving towards observability-driven deployment: shipping small, measurable, observable releases to production and observing them running with real users, real data, real traffic patterns, and real services. There is really no substitute.