A Checklist for Process Improvement: Metrics that Matter
A recent study shows what makes the difference between high-performing organizations and those that struggle.
by Gene Kim
To achieve what most IT organizations constantly struggle with, high-performing IT organizations have figured out which IT processes and controls help them be effective and efficient. They have integrated those processes and controls into how they manage almost every aspect of their daily work, helping them achieve their business goals. These processes and controls also help them find variance before it causes a catastrophic outage, compromises security, or impacts their customers.
Every IT organization strives for efficiency, effectiveness, and high performance. But we know that it is not sufficient to announce to your organization that from now on, best practices will be adopted and followed. Announcements themselves do not deliver results. Worse, best-practice standards such as ITIL and COBIT are exhaustive and do not give definitive guidance about where to start and which portions will have the highest rate of return for your organization.
Wouldn’t it be great if you knew which metrics actually mattered to the performance of your IT organization? In the recently published IT Process Institute (ITPI) IT Controls Performance Study, researchers discovered that the organizations with the highest performance, as determined by results in several key areas, had common traits that significantly affect system availability, compliance, risk control, and operational effectiveness and efficiency. The study, conducted in cooperation with Carnegie Mellon University, Florida State University, and the University of Oregon, found that change control processes, and adherence to those processes, were what made the difference between high-performing organizations and those that struggle.
The metrics that matter include:
Metric #1: Amount of Time Devoted to Unplanned Work
Unplanned work is the silent killer of IT organizations. Low-performing organizations spend more than 50 percent of their time on unplanned work, which keeps them in near-constant “firefighting” mode and pulls IT staff away from activities that support business goals. Put another way, less than 50 percent of their time is available to complete the planned work that IT has already committed to. Worse, the business usually depends on that planned work being completed on time to meet its own goals, such as completing capital projects and achieving compliance.
When IT spends too much time on unplanned work, planned work does not get completed, which affects IT and the business alike. Reducing unplanned work starts with change control and eliminating unauthorized change. High-performing organizations spend less than 5 percent of their time on unplanned work and practice zero tolerance for unauthorized change.
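As a rough illustration of how this metric could be tracked, the sketch below computes the share of logged time spent on unplanned work and compares it with the two figures quoted above. The WorkLog structure, the function names, and the sample hours are assumptions made for the example and do not come from the ITPI study.

```python
from dataclasses import dataclass


@dataclass
class WorkLog:
    """Hours logged by an IT team in one reporting period (hypothetical structure)."""
    planned_hours: float
    unplanned_hours: float


def unplanned_work_share(log: WorkLog) -> float:
    """Return the fraction of all logged time that went to unplanned work."""
    total = log.planned_hours + log.unplanned_hours
    if total <= 0:
        raise ValueError("no hours logged for this period")
    return log.unplanned_hours / total


def classify_unplanned(share: float) -> str:
    """Compare a share with the two figures quoted in the article."""
    if share < 0.05:  # high performers: less than 5 percent
        return "high-performer range"
    if share > 0.50:  # low performers: more than 50 percent
        return "low-performer range"
    return "between the two"


# Made-up sample week: 16 planned hours, 24 hours of firefighting.
week = WorkLog(planned_hours=16, unplanned_hours=24)
share = unplanned_work_share(week)
print(f"{share:.0%} unplanned: {classify_unplanned(share)}")
```

How hours get labeled as planned or unplanned is the hard part in practice. The calculation is only as good as the time tracking behind it.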
Metric #2: Percentage of Emergency Changes
Many IT organizations must face a high number of emergency changes. This is one of the best indicators that the change-management process is broken. The intended control can be circumvented merely by flagging the change request as “urgent” or “emergency.” By definition, these emergency changes are the highest risk changes. Therefore, they are the most critical to scrutinize and require the most deliberation to approve.
All changes that create risks must be evaluated and authorized, especially during emergencies. High performers tend to classify less than 5 percent of changes as emergency. Rates higher than 10 percent are often a warning sign to auditors that change controls are being circumvented.
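The emergency-change percentage is simple to compute from a change log. The following is a minimal sketch, assuming each change request carries a type label of either "normal" or "emergency". The labels, function names, and sample counts are hypothetical, and the thresholds are the 5 and 10 percent figures mentioned above.

```python
from collections import Counter


def emergency_change_rate(change_types: list[str]) -> float:
    """Return the fraction of change requests flagged as emergency."""
    if not change_types:
        raise ValueError("no change requests in this period")
    counts = Counter(change_types)
    return counts["emergency"] / len(change_types)


def review_emergency_rate(rate: float) -> str:
    """Compare a rate with the figures quoted in the article."""
    if rate > 0.10:  # above 10 percent: often a warning sign to auditors
        return "warning sign: controls may be circumvented"
    if rate < 0.05:  # below 5 percent: typical of high performers
        return "in line with high performers"
    return "worth watching"


# Made-up sample month: 40 change requests, 6 of them flagged as emergency.
requests = ["normal"] * 34 + ["emergency"] * 6
rate = emergency_change_rate(requests)
print(f"{rate:.1%} emergency changes: {review_emergency_rate(rate)}")
```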
Metric #3: Percentage of Failed Changes and Change Rate
Behind virtually every outage is a failed change, and therefore, a change control failure. Change inherently has some degree of risk and always has a probability of unforeseen consequences, which is why changes must be reviewed and authorized before they are put into production. An environment that allows unplanned, unauthorized, and uncontrolled change can count on frequent change failures, resulting in major portions of time being spent on tracing the problem. This also results in poor IT service quality, frequent outages, and long repair times.
Furthermore, frequent change failures slow the IT organization’s ability to make changes at all. High performers make five to 14 times more IT changes than medium and lower performers, while often sustaining change success rates of more than 95 percent, ensuring higher system availability.
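One way to track change success rate is sketched below. It assumes each change has already been recorded as succeeded or failed. What counts as a failed change (an outage, a rollback, unplanned rework) is a definition each organization has to settle for itself, and the sample numbers are made up.

```python
def change_success_rate(outcomes: list[bool]) -> float:
    """Return the fraction of changes that succeeded.

    Each entry is True for a successful change and False for a failed one.
    """
    if not outcomes:
        raise ValueError("no changes recorded in this period")
    return sum(outcomes) / len(outcomes)


# Made-up sample quarter: 200 changes, 8 of which failed.
outcomes = [True] * 192 + [False] * 8
success = change_success_rate(outcomes)
change_count = len(outcomes)  # the change rate for the period

# 95 percent is the success rate the article associates with high performers.
meets_benchmark = success > 0.95
print(f"{change_count} changes, {success:.1%} successful, above 95%: {meets_benchmark}")
```

Reporting the change count next to the success rate matters, because a high success rate achieved by making very few changes is not what the study describes.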
Metric #4: Mean Time to Repair
The most obvious measure of a high-performing organization is the availability and stability of IT services (i.e., everything is up and running). As many as 80 percent of outages are caused by mistakes, most of which stem from unplanned changes deployed on systems already made unstable by human error and a culture of unplanned change. Furthermore, when systems are down, 80 percent of the mean time to repair (MTTR) is spent trying to characterize the outage and determine causal factors. Only 20 percent of recovery time is spent repairing infrastructure. When you improve MTTR, you improve system availability. High performers have the lowest MTTR.
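To make the link between MTTR and availability concrete, here is a small sketch under simple assumptions: each outage is recorded as minutes spent diagnosing plus minutes spent repairing, and availability is the fraction of the reporting period not lost to outages. The sample data and function names are invented for illustration.

```python
from statistics import mean

# Each outage: (minutes spent diagnosing, minutes spent repairing). Made-up data.
outages = [(80, 20), (160, 40), (48, 12)]


def mttr_minutes(outages: list[tuple[int, int]]) -> float:
    """Mean time to repair: average total downtime per outage."""
    return mean(diagnose + repair for diagnose, repair in outages)


def diagnosis_share(outages: list[tuple[int, int]]) -> float:
    """Fraction of total downtime spent working out what went wrong."""
    diagnosing = sum(diagnose for diagnose, _ in outages)
    total = sum(diagnose + repair for diagnose, repair in outages)
    return diagnosing / total


def availability(outages: list[tuple[int, int]], period_minutes: int) -> float:
    """Fraction of the period during which service was up."""
    downtime = sum(diagnose + repair for diagnose, repair in outages)
    return 1 - downtime / period_minutes


period = 30 * 24 * 60  # a 30-day month, in minutes
print(f"MTTR: {mttr_minutes(outages)} minutes")
print(f"Diagnosis share of downtime: {diagnosis_share(outages):.0%}")
print(f"Availability: {availability(outages, period):.3%}")
```

With the same number of outages, any reduction in diagnosis time lowers MTTR and raises availability directly, which is why knowing what changed is so valuable during an outage.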
When IT organizations perform poorly on these metrics, one of the inevitable outcomes is a lower quality of life for the IT employees. This can have a great impact on the ability of an organization to attract and keep a trained staff. When unplanned work eats up resources and time, the organization begins a downward trajectory. Projects don’t get completed, service is slowed, functional delivery is delayed, compliance problems occur, more outages occur and the availability of the network is reduced.
The company sees this as poor organizational performance, which results in lower morale and higher turnover. If turnover rates climb, management must constantly train new people to work in what remains a poorly performing organization. Conversely, when unplanned work is low, staff have the opportunity to work on strategic business projects. This creates a virtuous cycle in which the company becomes a “best place to work” and easily recruits and retains its staff.
Adopting Change Control is Key to Improving Performance
Further reinforcing the research of the ITPI, the survey found that high-performing organizations share two additional traits: a culture of change management and a culture of causality. They also practice two discriminant controls that are absent in lesser performers:
- They actively monitor systems for unauthorized change
- They have defined consequences for intentional, unauthorized changes
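The first of these controls can be pictured as a reconciliation between changes detected on systems and changes that were approved. The sketch below is only an illustration of that idea: the host names, file paths, and the find_unauthorized function are hypothetical, and a real deployment would draw detected changes from a monitoring tool and approvals from the change-management system.

```python
Change = tuple[str, str]  # (host, changed path), a simplification for the example


def find_unauthorized(detected: set[Change], authorized: set[Change]) -> set[Change]:
    """Return detected changes that have no matching approved change record."""
    return detected - authorized


# Made-up data: changes observed on systems during one maintenance window.
detected = {
    ("web01", "/etc/nginx/nginx.conf"),
    ("db01", "/etc/my.cnf"),
    ("web02", "/etc/nginx/nginx.conf"),
}

# Made-up data: changes approved for that same window.
authorized = {
    ("web01", "/etc/nginx/nginx.conf"),
    ("web02", "/etc/nginx/nginx.conf"),
}

for host, path in sorted(find_unauthorized(detected, authorized)):
    print(f"Unauthorized change on {host}: {path}")
```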
Regardless of how you measure up to high performers, there are steps you can take to improve your processes. Controlling change to manage unplanned work is a remarkably simple and significant way to improve performance and processes, especially when compared to wading through a sea of best-practices literature that may not be relevant to your cause. It’s also helpful to know that in any improvement endeavor, the 80/20 rule applies: 20 percent of IT controls deliver 80 percent of the realized benefit. Let metrics that matter help guide your success.
Gene Kim is co-founder and chief technology officer of Tripwire, Inc. He is also co-founder of the IT Process Institute and co-author of the Visible Ops Handbook, published in 2003. His work with the Institute of Internal Auditors on the Guidance for Auditing IT General Controls (GAIT) project helps management scope the IT portions of SOX-404. You can reach the author at firstname.lastname@example.org