I recently discussed service support / operations on Twitter with someone frustrated at being asked to provide support for a public cloud service he had very little ability to support. Not for lack of technical capability, because he seemed capable enough, but for lack of access to the systems to do more fundamental fixes. We didn’t get far into the details, but it seemed he was being asked to deal with something which hadn’t been thought about when the service was provisioned.
This is sadly not an uncommon situation among support teams, who often have no say in the choice of system/service, no time to get familiar with it, and no means of pushing back on critical gaps even if they do get a chance to see it beforehand. These are often gaps that a 5-year-old could spot, let alone experienced engineers, but which seem to be frequently ignored by people whose shopping list only has one thing on it: functionality.
The non-functionals, or what ITIL refers to as ‘service warranty’ (is it fit for use in your organisation?), are things such as:
- how well it integrates with your monitoring and alerting systems
which don’t always get a look in – especially in less well developed organisations, or those without significant service design expertise – and even with these, you might be lucky to get a basic performance/load test before go-live and maybe a DR drill 6 months after.
This is usually down to impatient or well-meaning attempts to get certain teams/departments up and running rather than Machiavellian plotting, but the end client will still suffer for it in the long run. Once the shiny veneer flakes off that chosen service, you’ve ended up with your data (possibly your customers’) or at a minimum some business processes stuck in an unreliable, insecure system in a bespoke format which makes it difficult to get them out.
Not only this, but your support costs over time will keep going up and up. You’ll probably have to hire specialists to support the specialist functionality. There’ll be single-point-of-failure bottlenecks, which means it’ll go down at the drop of a hat and stay down, because you didn’t check the vendor out to see if their magical services were reliant on fairy-breath-unicorn-tear fuel which is sold out now, sorry.
This design isn’t the only thing that fails to happen. This whole process, facilitated by Service Transition, should have an acceptance phase to make sure it works before everyone in your company starts using it. This is called Operational Acceptance Testing or OAT (often also called Service Acceptance). It usually includes questions such as:
- Can I monitor it?
- Can I perform backups & restores / snapshots / flexclones on the data and system?
- Will it generate the right events/alerts at the right trigger points?
- What happens when I yank the cable out the loadbalancer / unplug the ESX server it sits on – does it come back up gracefully?
- What kind of calls will be logged for this by users and how will we handle them?
- Who at the vendor side will pick up these support tickets?
- What’s their SLA – if I put through a cheeky Sev-1 incident does it get insta-triaged correctly or does it sit there for 3 days?
- How do we decommission it?
- Who do they buy their services from? Is their datacentre powered by lava from an active volcano?
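If it helps to make this concrete, here is a minimal sketch of how a team might track those questions as a simple checklist and refuse go-live until every one has an explicit pass. The names `OatCheck`, `outstanding` and `ready_for_go_live`, and the sample results, are purely illustrative assumptions and not part of ITIL or any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OatCheck:
    """One operational acceptance question and its outcome so far."""
    question: str
    passed: Optional[bool] = None  # None means "not tested yet"
    notes: str = ""


def outstanding(checks):
    """Return the checks that failed or have not been tested yet."""
    return [c for c in checks if c.passed is not True]


def ready_for_go_live(checks):
    """Accept the service only when every check has explicitly passed."""
    return len(checks) > 0 and not outstanding(checks)


# Hypothetical results, invented for illustration only.
checks = [
    OatCheck("Can I monitor it?", passed=True),
    OatCheck("Can I back up and restore the data?", passed=True),
    OatCheck("Does it come back up gracefully when a host is unplugged?",
             passed=False, notes="(needed a manual restart)"),
    OatCheck("Who at the vendor side picks up support tickets?"),  # unanswered
]

for check in outstanding(checks):
    status = "FAILED" if check.passed is False else "NOT TESTED"
    print(f"{status}: {check.question} {check.notes}".rstrip())

print("Ready for go-live:", ready_for_go_live(checks))
```

The point of treating "not tested" the same as "failed" is that an unanswered question is still a gap your support team will inherit.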
The value in this process is twofold:
- It gives you a fighting chance at providing your clients services that are fit for use as well as purpose.
- The people supporting it will have a head start if you involve them in acceptance testing in the first place, and will tell you early and loudly if anything is unsupportable.
In the early days of ITIL, we focused heavily on the processes for doing incident, problem, change etc. But as we move into a more and more distributed supplier/vendor matrix of more and more services, IT organisations really have to start looking at the commercials and getting the overall service design right. These cloud apps might look pretty and do some very clever things (and don’t get me wrong; good UX/UI and killer functionality can be the difference between success and failure) but if you don’t design in the non-functionals right from the start, and if you don’t test these with the people who’ll be going into battle with them day in day out, you’re going to end up with something nobody wants: an unsupportable service which takes your money and gives you nothing but problems.