Data center Quality of Service (QoS) and Total Cost of Ownership (TCO) have a major impact on an enterprise’s business growth and profits. IT departments and application developers seek architectures and deployments providing high service availability and resilient performance scalability to meet rising service demand while controlling capital and operating expenses.
Tremendous technology advances have been made in recent years. Commodity hardware trends in high-density computing exploiting multi-core processors and flash memory offer large potential improvements in data center performance and reduce power and space consumption. Emerging software technologies provide advanced resource management and high availability (HA), offering large improvements in service availability, scalability and cost structure.
These technology trends can be fused together to achieve major improvements in data center QoS and TCO. This article focuses on that fusion: the motivation, architectural approach and resulting benefits.
The High Availability Imperative
High service availability is critical for all production deployments. Achieving it requires service continuity when servers fail, during routine administration (including hardware and software upgrades, backups, etc.) and when disasters take out an entire data center.
With the explosive growth in online data, IT managers are finding it increasingly difficult to meet the accelerating demands for service availability and performance scalability.
Front-End High Density and High Availability through Virtualization and Partitioning
With server virtualization, application instances are provisioned into virtual machines under the management of a hypervisor, which manages their execution on multiple cores within a server. This is an easy way to combine existing applications within a multi-core system to exploit high-density computing.
It also provides elastic service capacity by dynamically provisioning more or fewer application instances across a pool of commodity servers as workload demand changes. To achieve high service availability in this deployment model, the client detects a failed instance via a time-out, reconnects, and is redirected in a load-balanced manner to an identical instance, which retrieves any needed state from a reliable, persistent data store.
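As an illustrative sketch (not any specific product’s API), the client-side recovery logic might look like the following; the instance pool and connection helper are hypothetical stand-ins for a real load balancer:

```python
import socket
import time

# Hypothetical pool of identical, stateless application instances.
INSTANCE_POOL = [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]

def connect_via_load_balancer(timeout=2.0):
    """Return a connection to any healthy instance; in production a
    load balancer makes this choice, here we simply walk the pool."""
    for host, port in INSTANCE_POOL:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # instance presumed failed; try the next one
    raise ConnectionError("no healthy instance available")

def request_with_failover(payload: bytes, retries: int = 3) -> bytes:
    """Issue a request; on time-out or failure, reconnect and retry against
    another identical instance. Because durable state lives in the
    persistent data store, any instance can serve the retried request."""
    for attempt in range(retries):
        try:
            with connect_via_load_balancer() as conn:
                conn.sendall(payload)
                return conn.recv(65536)
        except OSError:
            time.sleep(0.1 * (attempt + 1))  # brief back-off before redirect
    raise RuntimeError("service unavailable after retries")
```

The key property is that the client holds no state the service depends on: a timed-out request can simply be replayed against any peer instance.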
This approach to high-density, high-availability computing through virtualization works well for applications like Web servers, which are stateless, have been designed to scale horizontally, and can execute effectively within the DRAM capacity of a commodity server.
The Limits of Virtualization and Partitioning
In scaled production workloads, large amounts of shared, persistent state and data are typically stored in databases and database caches. These databases concurrently handle the requests of hundreds or thousands of Web application connections, requiring the entire DRAM and disk I/O of a dedicated physical server. Multiple instances cannot share a server and yield good performance.
Virtualized databases are effective only when the data set is a fraction of the DRAM size of a server (typically 32 GB), which limits their applicability to development or small-scale production databases. Using virtualization for larger-scale databases requires breaking the database into small partitions (shards), with the working set of each partition fitting in the DRAM of a virtualized database instance on a server. Moving databases to virtualized instances or highly partitioned databases introduces a range of compatibility, performance and availability problems, resulting in reduced quality of service and higher cost.
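As a minimal sketch of what sharding implies, assuming a hypothetical shard count and key scheme, the application must route every access through a partitioning function along these lines:

```python
import hashlib

# Hypothetical: sized so each shard's working set fits in a VM's DRAM.
NUM_SHARDS = 64

def shard_for_key(key: str) -> int:
    """Deterministically map a primary key to one of the shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Every query must now be routed to the correct shard, and any operation that spans shards (joins, secondary-key lookups, multi-row transactions) needs application-level coordination, which is precisely where the compatibility, performance and availability problems arise.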
Emerging High Density Databases
Production SQL ACID databases executing on native servers have typically been limited by disk I/O, preventing them from exploiting the multiple cores in a commodity server and forcing partitioning.
Emerging databases that are designed with high parallelism, granular concurrency control, and intelligent storage hierarchy management between DRAM and flash memory achieve a tenfold improvement in throughput/watt/cm3 when compared with legacy databases on hard-drive-based systems. They achieve even greater computational density, consolidation and scaling compared with virtualized or highly partitioned databases.
This can enable much larger database partitions with much higher transactional workloads and reduce the capital and operating cost for equipment, power, space and supporting infrastructure.
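A minimal sketch of the DRAM-over-flash hierarchy such engines manage, with the flash tier simulated here by a plain dictionary (a real engine would use a flash-optimized on-device layout):

```python
from collections import OrderedDict

class TieredStore:
    """A bounded DRAM tier (LRU) in front of a much larger flash tier."""

    def __init__(self, dram_capacity: int):
        self.dram = OrderedDict()  # hot working set, kept in LRU order
        self.flash = {}            # stand-in for a flash-resident store
        self.dram_capacity = dram_capacity

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)  # refresh recency of hot keys
            return self.dram[key]
        value = self.flash[key]  # flash read: slower than DRAM, far faster than a disk seek
        self._promote(key, value)
        return value

    def put(self, key, value):
        self.flash[key] = value  # the durable tier holds every item
        self._promote(key, value)

    def _promote(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_capacity:
            self.dram.popitem(last=False)  # evict the coldest entry
```

Keeping the hot working set in a bounded DRAM tier while the full data set lives on flash is what lets one server hold a data set far larger than its DRAM without falling back to disk-seek latencies.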
Availability Problems With Commodity Databases
Failure recovery for virtualized database instances, highly partitioned databases and traditional SQL ACID databases has typically been implemented using asynchronous data replication. Unfortunately, asynchronous replication compromises service availability, data integrity, administration, performance and cost of ownership, as the list and sketch below illustrate.
- Reduced Service Availability: When a master fails, fail-over to a replica is stalled until all in-flight transactions have been committed and a new master established.
- Reduced Data Integrity: Replicas lag the master in receiving and applying changes, so read transactions served by replicas can return stale data; this lag can grow arbitrarily long. Transactions committed on the master but not yet shipped to a replica are lost when the master fails.
- High Administrative Complexity: Master failure recovery, hardware and software upgrades, slave migrations, additions, etc., require complex, error-prone processes.
- Poor Performance: Applying committed transactions on replicas is typically single-threaded to preserve serial consistency, resulting in low utilization on the replica. Master throughput must be limited to match the replicas’ performance, resulting in low master utilization and forcing database partitioning.
- High Cost: Low master and replica utilization, lost revenue due to downtime and increased administrative complexity increase capital and operating expenses and reduce revenue.
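The core of the problem shows up even in a stripped-down model of asynchronous replication (the timings and data structures here are illustrative only):

```python
import queue
import threading
import time

replication_log = queue.Queue()  # master -> replica change stream
replica_state = {}

def master_commit(txn_id, key, value):
    """The master commits locally and acknowledges the client BEFORE the
    replica applies the change; if the master dies now, this transaction
    is lost."""
    replication_log.put((txn_id, key, value))

def replica_apply_loop():
    """Single-threaded apply preserves serial order but caps replica
    throughput, so a busy master leaves the replica arbitrarily far
    behind; reads served here can return stale data."""
    while True:
        txn_id, key, value = replication_log.get()
        time.sleep(0.01)            # illustrative per-transaction apply cost
        replica_state[key] = value

threading.Thread(target=replica_apply_loop, daemon=True).start()
```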
Using Synchronous Replication to Exploit Multi-Core Servers and Get High Availability
Multi-threaded synchronous replication can significantly reduce service downtime and administrative complexity, provide full data consistency with no data loss, and enable full utilization of servers. With synchronous replication, when the master commits a transaction, all replicas are guaranteed to have received and committed the update.
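A minimal sketch of that commit rule, with hypothetical replica handles and a thread pool standing in for the engine’s network transport:

```python
from concurrent.futures import ThreadPoolExecutor

class SyncReplicatedMaster:
    """Commit succeeds only after every replica acknowledges the update."""

    def __init__(self, replicas):
        self.replicas = replicas  # hypothetical handles exposing .apply(txn)
        self.pool = ThreadPoolExecutor(max_workers=max(1, len(replicas)))

    def commit(self, txn):
        # Ship the update to all replicas concurrently rather than serially.
        futures = [self.pool.submit(replica.apply, txn)
                   for replica in self.replicas]
        for future in futures:
            future.result()       # block until every replica has applied txn;
                                  # any replica failure aborts the commit
        self._apply_locally(txn)  # the transaction is now durable cluster-wide

    def _apply_locally(self, txn):
        ...  # local storage-engine write, elided in this sketch
```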
Upon detection of a master failure, a replica is automatically promoted to become the new master, and the client applications are automatically switched to the newly promoted replica without any service interruption or data loss. If a replica fails, its load is automatically switched and load balanced to surviving nodes.
To achieve these benefits with high performance, the software implementing synchronous replication must be optimized for high thread parallelism and granular concurrency control. Multi-core thread parallelism is used to concurrently communicate, replicate and apply master update transactions on all replicas with extremely high throughput and low latency.
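One common way to realize that parallelism while preserving consistency is to hash each transaction’s affected keys to a dedicated apply worker, so updates to the same data stay strictly ordered while independent updates proceed on separate cores. A simplified sketch (the transaction’s apply method is hypothetical):

```python
import queue
import threading

NUM_WORKERS = 8  # sized to the replica's core count

work_queues = [queue.Queue() for _ in range(NUM_WORKERS)]

def dispatch(txn_key, txn):
    """The same key always hashes to the same worker, preserving per-key
    order; transactions on different keys apply fully in parallel."""
    work_queues[hash(txn_key) % NUM_WORKERS].put(txn)

def apply_worker(q):
    while True:
        txn = q.get()
        txn.apply()  # hypothetical: apply the replicated update locally

for q in work_queues:
    threading.Thread(target=apply_worker, args=(q,), daemon=True).start()
```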
- Improved Service Availability: Synchronous replication reduces downtime by over 90 percent compared with asynchronous replication, owing to fast automated fail-over and recovery, on-line upgrades, etc.
- Improved Data Integrity: Replicas always have the latest committed data from the master, so there is no data loss if the master fails. Reads on replicas always provide the latest committed data, resulting in cluster-wide full data consistency.
- Reduced Administrative Complexity: Fail-over and recovery are completely automatic and instant, requiring no administrator intervention.
- High Performance: High thread and core parallelism and granular concurrency control, during both transaction execution and cluster-wide synchronous replication, provide highly efficient master and slave utilization and excellent overall performance.
- Lower Cost: Scaling and execution efficiency, coupled with increased revenue from reduced downtime, yield a significant reduction in capital and operating expenses.
Take-Aways
Data centers can fuse together the benefits of commodity high density computing and architectural improvements in high-availability solutions.
For the Web and application tiers, IT managers of data centers and clouds can use virtualized machine instances with simple time-outs and load balancing to achieve high density and high availability.
For the data tier, non-virtualized, vertically scaling database solutions with synchronous replication, designed to exploit commodity multi-core processors and flash memory, can deliver computationally dense, high-performance, highly available service in a cost-efficient manner.
Database solutions employing thread parallel synchronous replication within data center database clusters can be utilized to achieve fully consistent data without data loss and with high sustained performance.
Synchronous replication provides high service availability and reduced administrative complexity based on instantaneous, automatic fail-over with on-line scaling and upgrades. These approaches can yield major improvements in data center QoS and TCO for scaled production services.