BUSINESS CRITICAL

December 14, 2021

What you should do before your cloud goes down

When Amazon Web Services recently had a case of network hiccups, we all found out how miserable it is to have all our IT eggs in one cloud basket. We can do better. Here's how.

On Friday, Amazon said in a blog post on its site that "unexpected behavior" triggered the hours-long outage.

"An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network," the company wrote in the post. As a result, devices connected to AWS' network became overloaded.

So, what can you do? Well, for one thing, stop relying on so many Internet of Things (IoT) devices. Your dishwasher, holiday lights, refrigerator, and toothbrush really don't need to depend on the cloud. More seriously though, you can drop any thought of having your IT department go back to running all your own servers. Look at where your business was back when that made sense and where it is now.

During last week's outage, a sysadmin friend of mine had to deal with a company CEO who was having hysterics. The CEO wanted to get back all the company's data—several hundred terabytes worth—and get some application running again right away.

But no matter what the boss wants, you can't always magically make things work—especially when they're outside of your control. Nor, if you sit down with your CFO and go over the numbers, is it likely you can shift all that data and all those applications back to your company. There's a reason you moved to the cloud—usually because it costs less to run things there than locally.

Maybe you really could do it cheaper with your own server room. If so, good for you! But before you pull the trigger, look at what your downtime was like before the cloud. I'd bet you'll find you were actually down and out more often when you ran things yourself.

So, should you "solve" your cloud problem by moving to a multi-cloud setup? That might work, but to do this properly will require at least two public cloud providers and possibly your own data center. That gets really, really expensive.

And if what you really want is a safety net for failures like the AWS one, sorry—multi-clouds simply won't work. As Lydia Leong, Gartner Distinguished VP Analyst, put it: "Multi-cloud failover requires that you maintain full portability between two providers, which is a massive burden on your application developers. The basic compute runtime (whether VMs or containers) is not the problem, so OpenShift, Anthos, or other 'I can move my containers' solutions won't really help you. The problem is all the differentiators—the different network architectures and features, the different storage capabilities, the proprietary PaaS capabilities, the wildly different security capabilities, etc."

Enough of the bad news. Here's what Leong and I think can work for keeping your business up and running even when your primary cloud is down and out.

Run your active applications across at least two, and preferably three, Availability Zones (AZ) within each region that you use. Yes, three is much harder to do than two, but it's still a heck of a lot easier than trying to build a multi-cloud failover solution.
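
To see why the extra zones are worth the effort, here is a back-of-the-envelope sketch in Python. It rests on two loud assumptions that are mine, not Amazon's: each zone is up 99% of the time (a made-up figure), and zones fail independently of one another. Real zones share regional plumbing, so a region-wide problem can take out several at once and the true numbers will be less rosy. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(zone_availability: float, zones: int) -> float:
    """Chance that at least one zone is up.

    ASSUMPTION: zones fail independently. That is an idealization;
    correlated failures make the real figure lower.
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only when every zone is down at the same time.
    return 1.0 - (1.0 - zone_availability) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 0.99 is a hypothetical per-zone availability, not a published number.
    for zones in (1, 2, 3):
        availability = combined_availability(0.99, zones)
        downtime = expected_downtime_hours(availability)
        print(f"{zones} zone(s): {availability:.6f} available, "
              f"about {downtime:.2f} hours down per year")
```

Under these assumptions, each added zone multiplies the chance of a total outage by the per-zone failure rate, which is why the jump from one zone to two matters far more than the jump from two to three.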

Run your active applications across at least two, and preferably three, regions. Again, two is much easier than three, but if your mission-critical application is truly mission-critical, it may be worth the trouble. Can't do that? Then see if you can at least afford fast and fully automated regional failover.
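
What "fully automated regional failover" boils down to can be sketched in a few lines of Python. This is a minimal illustration, not anyone's production design: the function name, the priority-ordered region list, and the health check are all hypothetical, and the region names in the usage note are just examples. In real life the health check would probe your own application endpoints, and the chosen region would feed a DNS or load-balancer update that is not shown here.

```python
from typing import Callable, Iterable, Optional


def pick_active_region(
    regions: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy region in priority order, or None.

    `regions` is ordered from most to least preferred.
    `is_healthy` is a caller-supplied check (hypothetical here).
    """
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that blows up counts as an unhealthy region.
            continue
    return None


if __name__ == "__main__":
    # Simulated health status; a real check would make a network call.
    status = {"us-east-1": False, "us-west-2": True}
    active = pick_active_region(["us-east-1", "us-west-2"], status.__getitem__)
    print(f"Routing traffic to: {active}")
```

The point of keeping the decision this simple is that it can run without a human in the loop. If picking the fallback region requires someone to log in to a console that is itself down, it is not automated failover.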

Let's face it. The cloud is here to stay. Since that's the case, and clouds will continue to go down, it only makes sense to use the best tools they give us to protect us from their inevitable failures.

We'll still have days when everything goes to hell in a handbasket, but at least there will be fewer of them.

Amazon cloud outage hits major websites, streaming apps

A major outage disrupted Amazon's cloud services on Tuesday, temporarily knocking out streaming platforms Netflix and Disney+, Robinhood, a wide range of apps and Amazon.com Inc's e-commerce website as consumers shopped ahead of Christmas. Read more.

Amazon outage disrupts lives, surprising people about their cloud dependency

When Amazon Web Services was interrupted, some vacuum cleaners, light switches and cat-food dispensers stopped working. Read more.

AWS outage: Your response to AWS going down shouldn't be multi-cloud

Commentary: It's convenient to assume multicloud will solve your application resilience woes. Convenient, but wrong. Here's why. Read more.

Google Cloud fixes outage that hit Home Depot, Snap, Spotify

Google's cloud services, which provide computing power for many major companies' websites, have been fully restored after outages on Tuesday that were caused by a glitch in a network configuration. Read more.

After AWS outage, Larry Ellison says a major customer told him that Oracle's cloud 'never ever goes down'

Oracle's cloud infrastructure service is still way behind Amazon. But that doesn't keep Larry Ellison, Oracle's billionaire co-founder, from taking every available opportunity to tout his cloud over the competition. Read more.

Amazon Web Services (AWS) outage causes chaos

AWS, the largest cloud-computing provider in the U.S., later confirmed that the outages had primarily affected the East Coast region, adding that it had identified the root cause of the problem. Read more.

About the Author
Steven J. Vaughan-Nichols, aka sjvn, has been writing about the intersection of business and technology for over 30 years. He continues to scoop up awards for his valuable insights and practical guidance in highly technical publications, business & technology magazines, and mainstream newspapers.