Windows OS high availability (HA) is a tricky part of disaster recovery (DR) planning. Many Microsoft and third-party server tools and platforms come with some form of native HA toolset, and for most of these, Microsoft Cluster Service or Failover Clustering (in Server 2003 and Server 2008, respectively) is the tool of choice. Enterprises may be surprised to find, though, that on closer inspection clustering alone is not sufficient to meet the needs of enterprise DR.
Don’t misread this article; clustering technologies in Windows are a huge benefit to nearly every organization that can take advantage of them, but they should not be your company’s sole HA methodology. Two limitations of clustering make it difficult to survive on this technology alone.
First, many software packages cannot take advantage of clustering at all, and even when they can, your organization may not be able to leverage the platform. Second, while clustering excels at local HA, true DR planning must consider what happens if the production site is lost. Clustering can be used across a WAN, but it may pose challenges your organization cannot overcome.
Who Doesn’t Love Clustering? Lots of Vendors
When Microsoft introduced clustering in Windows NT 4 some years back, two Windows servers could act in tandem via a shared disk device: if one server failed, the other could bring up the same application resources and act on the same dataset. Over time, Microsoft refined and extended this clustering technology, and it continues to play a major role today in Windows Server 2008 Failover Clustering (FC). Hundreds of software packages from all manner of application vendors include the code needed to sit on top of the cluster and thereby gain some measure of HA.
Even Microsoft’s own HA technologies in SQL Server and (to a lesser extent) Exchange 2010 leverage this technology, although it is often hidden from the end user within the SQL Server or Exchange system itself.
The problem is that a far greater number of vendors’ software packages do not support clustering, leaving you without a suitable HA platform for those applications. Some vendors expressly prohibit installing their software on a server where clustering services have been configured, blocking you from using these tools even as a Generic Cluster Resource.
Also, keep in mind that clustering itself requires specific hardware and software to be in place before you can install a clustered configuration. In traditional Windows clustering, you will need shared disk resources that meet Microsoft’s Hardware Compatibility List (HCL) specifications for the version of Windows you are running. Most modern SAN systems can easily support clustering, but if you are not yet running a SAN, this could be a high financial hurdle to jump. In addition, only the Enterprise and Datacenter editions of Windows support clustering, so if you are running a Standard or lower edition of the OS, you will need to upgrade before attempting to set up a cluster. Finally, most vendors’ software requires a different license for use on clusters than on non-clustered servers.
A Hybrid Approach
Once you have overcome these restrictions, you can configure and maintain your cluster to provide HA within a single site or within a shared-LAN campus environment. Unless you are on Server 2008 or higher, clusters are limited to a single IP subnet for all nodes and resources, which means that if you want cross-site failover or other DR capabilities, you will first need to stretch your subnets between sites.
In Server 2008, Microsoft introduced cross-subnet capabilities into FC, but FC still requires either a shared-disk resource or some technology for integrated replication of data from one node to another. To ensure that a cluster can survive between subnet-disjointed sites, you must do one of two things: provide a disk-extension system so that both sites can see the same disk, or use an acceptable replication toolset to duplicate the data to both sites.
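To make the replication option concrete, here is a deliberately minimal Python sketch of one-way, asynchronous file replication to a secondary site. It is a toy stand-in, not any vendor's product: real replication tools work at the byte or block level, ship only changed data, and preserve write order, while this sketch simply copies whole files whose contents differ.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path: Path) -> str:
    """Content hash used to decide whether a file needs recopying."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(primary: Path, replica: Path) -> list[str]:
    """Copy every file under `primary` whose content differs at `replica`.

    Returns the relative paths that were copied on this pass, so a
    scheduler could log what changed between replication cycles.
    """
    copied = []
    replica.mkdir(parents=True, exist_ok=True)
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        dst = replica / src.relative_to(primary)
        if not dst.exists() or _digest(src) != _digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves timestamps
            copied.append(str(src.relative_to(primary)))
    return copied
```

Run on a schedule, a loop like this gives the secondary site an independent copy of the data; the trade-off versus synchronous disk extension is that the replica can lag the primary by one replication interval.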
Disk extension provides cluster capability across sites, but it still represents a single point of failure if the primary site (and the disk housed there) is lost. Extension technologies also require a high-bandwidth link between the two sites to ensure that latency doesn’t exceed recommended maximums and cause slowdowns on production systems during normal operation.
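The latency penalty is easy to estimate. With synchronous disk extension, a write is not acknowledged until the remote site confirms it, so every write pays the inter-site round trip on top of the local disk time. The figures below are assumptions chosen purely for illustration, not vendor maximums:

```python
def stretched_write_latency_ms(local_write_ms: float, wan_rtt_ms: float) -> float:
    """Effective write latency when a synchronous mirror must acknowledge
    each write across the WAN before the application sees completion."""
    return local_write_ms + wan_rtt_ms


# Assumed example figures: a 1 ms local write plus a 20 ms round trip
# between sites turns each synchronous write into roughly 21 ms.
effective = stretched_write_latency_ms(1.0, 20.0)
print(effective)  # 21.0
```

A write-heavy application that tolerates 1 ms writes locally may struggle at 21 ms per write, which is why extension vendors publish maximum-latency guidance for stretched configurations.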
Replication provides another, independent copy of the data at the secondary site, but it must be fully integrated with the cluster if you wish to use clustering as the sole method of HA for your environment. Several vendors can provide this, but far fewer than the number of tools available to replicate from a cluster to either another cluster or a stand-alone system at the secondary site. Using this hybrid methodology allows you to overcome the limitations of clustering without abandoning it entirely: you can set up cluster systems at your production site with shared disk storage and automatic failover control, while still replicating data and other information to a secondary site with an automated methodology for resuming services if the entire cluster should fail.
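The control logic of that hybrid design can be sketched in a few lines. In this toy Python model (all names are hypothetical, not any product's API), the cluster handles ordinary node failures on its own; the monitor at the secondary site promotes the standby only after the entire cluster misses several consecutive health probes, so a single transient miss never triggers a cross-site failover.

```python
class FailoverMonitor:
    """Watches a production cluster and activates the secondary site
    only after `threshold` consecutive failed health probes."""

    def __init__(self, probe, activate_secondary, threshold=3):
        self.probe = probe                        # returns True if the cluster answers
        self.activate_secondary = activate_secondary
        self.threshold = threshold
        self.misses = 0
        self.failed_over = False

    def tick(self):
        """Run one health check; call this on a fixed interval."""
        if self.failed_over:
            return                                # failback is a manual decision
        if self.probe():
            self.misses = 0                       # any answer resets the count
        else:
            self.misses += 1
            if self.misses >= self.threshold:
                self.activate_secondary()         # resume services at site two
                self.failed_over = True
```

The threshold-and-reset behavior is the important design choice: local cluster failovers briefly interrupt heartbeats, and you do not want the DR site seizing control every time a cluster group moves between nodes.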
Many modern organizations look to clustering to provide HA for their environments. These services are an integral, and often integrated, failover solution if you have the hardware and licensing to support them. They do not, however, constitute a full DR plan, and you should evaluate each application thoroughly to determine whether other tools are needed. In most cases, a combination of tools can provide a complete DR solution for an organization of any size.
Mike Talon is an enterprise systems engineer at Double-Take Software.