The real problem with frequent emergency changes

Changes are decisions, balancing risk and reward. In the language of Decision Theory, they are classed as normative or prescriptive decisions (i.e. identifying the best decision to take) which rely on access to good information and are best made rationally (rather than irrationally, emotionally or because there’s an expensive marketing campaign about to start that nobody told IT about). The processes and tools used to analyse decisions and their outcomes are usually referred to as Decision Support Systems, and can be technological, procedural, or a combination.

For IT changes, the Decision Support System is the change policy, process and tooling, together with the people who manage, review and approve changes. Using these, it’s possible to take one big decision (“should we do this change?”) and carve it into smaller and more manageable decisions, e.g.:

  • Is the Request For Change correctly filled out with meaningful information?
  • Has the technical peer review been completed?
  • Is there a compelling benefit case?
  • Has the change been tested?
  • Is there a fully resourced and tested implementation and remediation plan?
  • Will it clash with another change?

and so on.

Most of these smaller decisions are human and procedural, though technological quality gates can be included too, such as simulated code builds, static code analysis outputs, configuration data etc.
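To make the idea concrete, here is a minimal sketch of the smaller decisions above expressed as a list of independent gates. It is an illustration only, not how any particular change tool works: the `ChangeRequest` fields, the gate wording and the `failed_gates` and `ready_for_cab` helpers are all hypothetical names invented for this example. Note that passing every gate only means the change is ready to be discussed; the ultimate decision still sits with people.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChangeRequest:
    # Hypothetical fields; map them to whatever your change tool records.
    description: str = ""
    peer_reviewed: bool = False
    benefit_case: str = ""
    tested: bool = False
    has_remediation_plan: bool = False
    clashes_with: Tuple[str, ...] = ()  # IDs of other changes in the same window


# Each gate is one small decision: a label plus a yes/no check.
Gate = Tuple[str, Callable[[ChangeRequest], bool]]

GATES: List[Gate] = [
    ("RFC filled out with meaningful information", lambda c: bool(c.description.strip())),
    ("Technical peer review completed", lambda c: c.peer_reviewed),
    ("Compelling benefit case", lambda c: bool(c.benefit_case.strip())),
    ("Change has been tested", lambda c: c.tested),
    ("Implementation and remediation plan in place", lambda c: c.has_remediation_plan),
    ("No clash with another change", lambda c: len(c.clashes_with) == 0),
]


def failed_gates(change: ChangeRequest) -> List[str]:
    """Return the labels of every small decision that has not been satisfied."""
    return [label for label, check in GATES if not check(change)]


def ready_for_cab(change: ChangeRequest) -> bool:
    """True when every gate passes, i.e. the change is ready for the final decision."""
    return not failed_gates(change)


# A rushed change: described and partly tested, but little else.
rushed = ChangeRequest(description="Patch payment gateway", tested=True)
for label in failed_gates(rushed):
    print("Outstanding:", label)
```

Keeping each gate as a separate entry mirrors the point of the process: different people can own different checks, and whoever makes the final call sees exactly which of the smaller decisions are still open.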

By splitting bigger decisions into smaller ones, it’s possible to spread the burden of decision-making across several people. Care must be taken, though, to avoid making all of the sub-decisions in isolation, or being overly reliant on theoretical models (as per the Ludic Fallacy). That is why the ultimate decision (“should we do this change?”) is best taken in the context of the real-world situation in your organisation. You and I know this as the CAB.

Most change managers and change approvers know this concept of splitting up big decisions intuitively, which is why we reject poor quality change records, ask for peer reviews, demand to see a test completion report and make people wait for CAB.

The concept of a change process, then, is a good one. It leads the requestors, assessors and approvers through a logical sequence of smaller decisions until there is enough good information to make that ultimate decision in a relatively straightforward manner. This process can take days, especially as many of the people making the smaller decisions have day jobs, which is why many organisations have weekly scheduled CABs and cutoff dates.

But the benefit of smaller decisions is missing from emergency changes, because there simply isn’t time: they’re emergencies. The smaller decisions end up getting rushed (“well, it’s got half an implementation plan, and we tested a bit of it, is that good enough?”) and the burden of making them gets pushed to the emergency change approver. Despite the good intentions in your emergency change policy, that can end up being a single senior manager, or sometimes the change manager alone.

If this happens once a month, it’s probably not a big deal. The person who ends up being asked to decide might take a little time to ask a few people their opinions, form an emergency CAB, push back on some shoddy testing, rework the implementation plan, speak to the business and so on. But if it happens several times a day, that person is going to get fatigued, stressed, and ultimately end up making bad decisions.

And this is the problem with having too many emergency changes. Bad or rushed decisions either block something the business really needs by being too risk averse, or expose your production services to unnecessarily high risk by being too relaxed.