On Wednesday, the New York Stock Exchange was down for nearly four hours. As soon as trading was halted, speculation began to fly that the outage was the result of the exchange being hacked.
Reality turned out to be a little less interesting. NYSE realized that a botched software update was causing major glitches across its trading systems. Although this was a high-profile outage, it is commendable that NYSE’s IT staff was able to recognize the problem and roll the change back. This is a great example of how IT Change Management should be applied.
Not Every Outage Involves Hackers
With all the attention on cyber security, it’s easy to forget that human error and a lack of good IT governance are far more likely to cause an outage than malicious actors are.
Shooting yourself in the foot is a lot more embarrassing than getting hacked – especially since it can be avoided.
According to the Visible Ops Handbook from the IT Process Institute, "80% of unplanned outages are due to ill-planned changes made by administrators ('operations staff') or developers." ITPI digs further into these self-inflicted, unplanned outages and notes that most of the time needed to restore service is spent figuring out exactly what changed, because effective Change Management is missing.
Change Management Isn’t a Bad Thing
Many IT professionals have a very negative view of Change Management and ITSM frameworks like ITIL. They see them as administrative and bureaucratic burdens that prevent “real work” from being done.
The true believers who insist that you have to implement every piece of the gospel according to ITIL aren’t helping the cause either. It is unrealistic to go from an undisciplined environment to having every ITIL process fully realized overnight.
Always remember that the Change Management process is there to reduce risk and ensure changes are well thought out. It can be as simple as getting everyone to agree to write down and discuss their changes, and preventing unauthorized ones.
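To make "write it down, discuss it, and block anything unauthorized" concrete, here is a minimal sketch in Python. Everything in it is hypothetical and exists only for illustration: the `ChangeRecord` and `ChangeLog` names, the fields, and the rule that someone other than the author must sign off. It does not come from ITIL or from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRecord:
    """One written-down change: who wants it, what it is, how to undo it."""
    author: str
    summary: str
    rollback_plan: str
    approved_by: Optional[str] = None  # filled in once the change has been discussed


class ChangeLog:
    """Hypothetical in-memory change log; a real shop might use a ticket system."""

    def __init__(self):
        self._records = []

    def propose(self, author, summary, rollback_plan):
        # Writing the change down is the first step.
        record = ChangeRecord(author, summary, rollback_plan)
        self._records.append(record)
        return record

    def approve(self, record, reviewer):
        # Assumed rule: the author cannot approve their own change.
        if reviewer == record.author:
            raise ValueError("a change needs a reviewer other than its author")
        record.approved_by = reviewer

    def may_deploy(self, record):
        # Unrecorded or unapproved changes count as unauthorized.
        return record in self._records and record.approved_by is not None
```

A change that was never proposed through the log, or that nobody else has reviewed, fails `may_deploy`. The rollback plan is recorded up front so that backing a bad change out does not depend on anyone's memory.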
IT “Cowboys” Are Symptoms of a Bigger Problem
Small IT shops without mature IT processes often have one key staffer who keeps all the lights on. These staffers eschew documentation and fix things based on gut feeling. They’ve always got a magic bullet ready to restore services when the worst-case scenario happens.
“Cowboys” in IT have had a good run, but it is past time to send them packing. Not only do they often cause, through human error, the very outages they end up fixing, they also tend to keep knowledge to themselves, which prevents new staff from learning your systems and grinds troubleshooting to a halt when they’re unavailable.
It is an unacceptable risk to let critical production systems be run by cowboys who make changes outside of the Change Management process. The presence of cowboys is a symptom of poor IT governance where the organization is operating without a plan.
Write It Down!
Documentation is one area where many IT shops struggle. They don’t write down policies and procedures. They don’t keep their configuration information readily available and up to date. They find themselves flailing about when an outage happens because they don’t have any reference materials handy.
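As one illustration of why up-to-date configuration records matter during an outage, the sketch below compares two recorded configuration snapshots and reports what changed. The `diff_config` function and the sample settings are hypothetical; the sketch assumes snapshots are kept as simple key/value dictionaries, which may not match how a given shop stores its configuration.

```python
def diff_config(before, after):
    """Return settings added, removed, or changed between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots recorded before and after a software update.
last_known_good = {"gateway_version": "1.4", "timeout_s": 30, "legacy_mode": True}
current = {"gateway_version": "1.5", "timeout_s": 30, "failover": "enabled"}

report = diff_config(last_known_good, current)
```

With a recorded last-known-good snapshot, the question "what changed?" becomes a lookup instead of an investigation.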