
Cloud Computing, Part 3: SLA Spirit in the Sky

Part 1 of this three-part series discusses the fuzziness — or the flexibility, depending on one’s point of view — of the definition of “cloud computing.”

Part 2 provides a snapshot of some of the major players and best-of-breed vendors in this fast-growing technology sector.

Google arguably is the granddaddy of cloud computing, at least in its current incarnation. So, whenever the Googleplex develops a new feature or capability — or service offering — the rest of the industry takes notice.

That was the case last month when Google rolled out a service level agreement (SLA) that guarantees 99.9 percent system accessibility for users of its Google Apps Premier Edition — a cloud-based suite that includes business-oriented messaging and collaboration apps, along with support and integration capabilities.

Google had already offered an SLA for its premium Gmail service; the new announcement extends that agreement to cover Google Calendar, Google Talk, Google Docs and Google Sites as well.

Essentially, the SLA promises that services will be operational at least 99.9 percent of the time during any calendar month. If there’s an outage, affected customers will receive service credits. Google defines “downtime” as a period of 10 consecutive minutes in which the user error rate, measured on the server side, exceeds 5 percent. Intermittent downtime lasting less than 10 minutes does not count.
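The downtime rule above can be made concrete with a short sketch. It assumes the error rate is sampled once per minute, which the article does not specify; the function names and the sample data are hypothetical and are not Google's implementation.

```python
def downtime_minutes(error_rates, threshold=0.05, min_run=10):
    """Sum the minutes in every run of at least `min_run` consecutive
    samples whose error rate is above `threshold`."""
    total = 0
    run = 0
    for rate in error_rates:
        if rate > threshold:
            run += 1
            continue
        if run >= min_run:
            total += run
        run = 0
    if run >= min_run:  # a run that lasts until the end of the data
        total += run
    return total


def uptime_percent(error_rates, threshold=0.05, min_run=10):
    """Share of sampled minutes not counted as downtime, in percent."""
    if not error_rates:
        raise ValueError("need at least one sample")
    down = downtime_minutes(error_rates, threshold, min_run)
    return 100.0 * (1 - down / len(error_rates))


# Hypothetical 30-day month: one sample per minute, all healthy.
month = [0.0] * (30 * 24 * 60)
month[100:109] = [0.5] * 9      # 9-minute blip: too short to count
month[5000:5012] = [0.5] * 12   # 12-minute outage: counts as downtime

print(downtime_minutes(month))
print(uptime_percent(month) >= 99.9)
```

Under these assumptions, only the 12-minute outage counts, so the month still clears the 99.9 percent bar. An outage of 44 consecutive minutes in the same 30-day month would not.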

The demand for SLAs is understandable, especially in a new category of computing, said Matt Glotzbach, product management director for Google Enterprise. Such agreements can be a security blanket for users new to the cloud.

“A business adopting cloud computing is making a big transition moving from a system that’s under their direct control,” Glotzbach told TechNewsWorld. “It is an emotional as well as operational change, because previously you could touch the servers you were managing.”

The KISS Mandate

Still, Google is keeping its SLAs as pared down as possible by establishing simple definitions for service reliability, Glotzbach said — adding that he hopes the rest of the industry will do the same.

“The customer wants to know if it can access services — not whether the vendor has met the defined throughput or latency figures,” he said. “Businesses do not care about those metrics. If there is a breakdown and the customer couldn’t access a system for hours with no explanation, do you think he would be satisfied with the response by the vendor that ‘well, I did meet latency as we agreed’?”

Indeed, this is the crux of an argument typically made against SLAs: Businesses and their risk managers may love the assurances they offer, but these documents can also be a dodge for vendors that fail to meet the spirit of their promises.

As cloud computing gains more momentum, both users and vendors will be watching closely to see whether providers’ uptime promises are geared more toward the spirit or the letter of the agreement.

All Over the Map

Right now, SLAs for cloud computing are all over the map, Gerry Libertelli, president and CEO of ReadyTechs, a digital infrastructure services firm, told TechNewsWorld. “I have seen companies say they have no SLA, and I have seen companies go completely overboard.”

Part of the problem is that in the case of cloud computing, SLAs can be particularly difficult to pin down; the very nature of the cloud is redundancy.

“Technically, there should be zero downtime associated with a cloud instance, since almost everything in a cloud — CPU, RAM, disk — is redundant by nature and easily reinstantiated in the case of a failure,” Libertelli pointed out.

There are contingencies specific to cloud computing that must be taken into account, Libertelli continued. “For example, many cloud vendors tout zero downtime for cloud configuration changes. This is impossible, since almost all the hypervisors we know of require you to bring a cloud instance down, make the configuration change, and then launch it again. That is not exactly zero downtime, and it really begs the idea of redefining the standard network services SLA to include cloud-specific contingencies.”

In other words, he explained, there’s a big difference between terms like “average configuration downtime” and “network downtime” as opposed to “systems downtime.”

ReadyTechs’ SLA centers on how fast the company will acknowledge a problem and then how fast it will bring resources to bear on an identified issue. This is the most effective way to handle the issues an SLA is meant to cover, Libertelli said. Bandwidth SLAs are a different matter, of course: they can be measured using the 95th percentile and are much easier to assess.
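As an illustration of why a 95th-percentile bandwidth figure is easy to assess, here is a minimal sketch using the nearest-rank method: sort the samples, discard the top 5 percent, and report the highest remaining value. The sampling interval, the function name and the sample values are assumptions made for the example; the article does not describe how any vendor actually computes the figure.

```python
def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile of a list of bandwidth samples."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical readings in Mbps: steady traffic plus one short burst.
readings = [40] * 95 + [400] * 5
print(percentile_95(readings))
```

With these made-up readings the five burst samples fall in the discarded top 5 percent, so the reported figure is the steady 40 Mbps rather than the 400 Mbps peak.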

Beyond 99.99 Percent

Perhaps because performance and availability differ with cloud computing, some vendors are taking a more expansive approach with the promises they make.

“The key insight is that in cloud computing, the accountability for operating the system shifts from the customer’s IT department to the vendor,” Intacct SVP Dan Druker told TechNewsWorld. “Mature cloud vendors like Intacct have figured out that service level agreements should guarantee the performance of items that are no longer under the IT department’s control.”

Intacct’s SLA includes commitments around system availability, disaster recovery, customer support response time, problem resolution, implementation quality, billing accuracy and roadmap communications.

“We also explicitly guarantee that our clients own their own data, and that if they ever decide to leave us, we will help them to get their data out of our systems,” added Druker. “The system just has to work — and that means much more than a single aspect of operations like availability or support. So it’s very important that buyers of cloud computing solutions really think hard about how the entire system is going to work, and make sure that they are comfortable with the guarantees their vendor is willing to make for them.”

Another factor is that companies themselves are becoming far more sophisticated in terms of expectations and demands. For instance, most clients of Xignite, a financial Web services provider that delivers market data from the cloud, are fine with the 99.5 percent to 99.9 percent uptime it guarantees its customers, CEO Stephane Dubois told TechNewsWorld. Some, though, want as high as 99.99 percent, “which can be a bit tricky.”

The company has not yet committed to that level, he said.
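To see why the jump to 99.99 percent is tricky, it helps to translate each uptime target into the downtime it permits. The sketch below assumes a 30-day month as the measurement period, which is an illustrative choice rather than a term from any vendor's SLA; the function name is hypothetical.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(uptime_percent,
                             period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime permitted by an uptime target over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


for target in (99.5, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime")
```

Over a 30-day month, that works out to roughly 216 minutes at 99.5 percent, about 43 minutes at 99.9 percent, and a little over 4 minutes at 99.99 percent.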

“We are also seeing more customers ask for performance criteria in their SLAs,” Dubois continued. For instance, some clients run their applications on Facebook, which requires data to move within one-hundredth of a second, and they are asking for guarantees to meet that requirement.

Performance may be the next focus in cloud computing SLAs, Dubois suggested, though there doesn’t seem to be a huge demand for it yet.

“It’s one thing to have it up all of the time — it is an entirely different thing to have an application up all of the time and it never to be sluggish,” he pointed out.

End-to-end coverage is most important to customers, according to Gary Slater, vice president of network operations and IT at LiveOps. In other words, the customer does not care whether a problem originates in, say, the Verizon communication system or the LiveOps call center app.

“They just want their system to work all the time,” Slater said, “just like a utility works all the time. So with our SLAs, we talk about system availability — not component level or individual hardware functionality.”

A New Space

At this point, it is difficult to predict what a standardized SLA for cloud computing will look like — not only because the space is still evolving, but also because all the vendors are approaching it differently.

IBM’s major play is in private clouds, Kelly Sims, director of cloud computing communications for IBM, told TechNewsWorld. Essentially, it converts resources and assets that exist behind a firewall at a client’s site into a cloud infrastructure.

“SLAs don’t apply in these scenarios that much,” Sims noted. “There are few private companies that wish to give up assets to a public infrastructure.”

That said, there are certain customer expectations that IBM must meet, and does, she continued. “Security is very important, obviously. IBM is working with customers that are extremely large, with sensitive data, and it is very important that we can provide secure connections.”
