VMware CPU Ready States and Exchange 2013

In chapter one of “Microsoft Exchange 2013: Design, Deploy and Deliver an Enterprise Messaging Solution”, we talk about constraints that may be forced upon us when designing Exchange. One such constraint may be that we must use either existing hardware or the incumbent virtualization solution. Existing hardware can be a particularly awkward constraint: if the sizing doesn't fit the hardware, you don't really have an Exchange deployment project anymore.

However, virtualization carries with it the promise of overcommitting memory, disk, and CPU resources, features deployed by most customers taking advantage of virtualization technologies. Note that overcommitting anything in your virtualization platform when deploying Exchange is not only a bad idea, it's an outage waiting to happen.

Virtualization is not free when it comes to converting physical hardware into emulated virtual hardware. The figures vary between vendors, but you may be looking at a net loss in the range of 5-12 percent across the entire guest's performance.
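To make that concrete, here is a back-of-the-envelope sketch in Python. The figures are purely illustrative, and the simple division model is an assumption for illustration, not a vendor sizing formula:

    # Back-of-the-envelope sizing: how much raw physical capacity is needed
    # to deliver a given amount of guest work once hypervisor overhead is
    # taken off the top. The 5-12% range is the figure quoted above.

    def physical_core_equivalents(guest_cores: float, overhead: float) -> float:
        """Raw capacity required to deliver guest_cores of effective work."""
        return guest_cores / (1.0 - overhead)

    for overhead in (0.05, 0.12):
        needed = physical_core_equivalents(4, overhead)
        print(f"At {overhead:.0%} overhead, 4 cores of work needs ~{needed:.2f} core equivalents")

Coming back to constraints, let us assume your customer – or your company – requires you to virtualize, with VMware as the chosen hypervisor.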

Once you've taken into account that you're virtualizing, you then need to size your guest as if you were sizing its physical equivalent. Let's assume that, for argument's sake, you end up requiring four cores per server but allocate eight, since more cores never hurt anyone, right?

You read the prevailing guidance carefully, so you decide to use an existing blade with eight physical cores and allocate eight vCPUs to Exchange, bearing in mind that you've already allocated eight vCPUs to two other applications on the same blade. No fuss, you may think: the guidance states that you should allocate no more than two virtual cores per physical core, and the sketch below shows you're right at that limit. Since you're a conscientious sysadmin, you've also benchmarked CPU usage on the VMware host and decided that the values are acceptable.
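Spelled out with the hypothetical figures from this scenario, the overcommit arithmetic looks like this:

    # The overcommit arithmetic for the scenario above: an eight-core blade
    # already carrying eight vCPUs across two other guests, plus eight more
    # for Exchange. All names and figures are hypothetical.

    physical_cores = 8
    vcpus_per_guest = {"app-guest-1": 4, "app-guest-2": 4, "exchange": 8}

    total_vcpus = sum(vcpus_per_guest.values())
    ratio = total_vcpus / physical_cores
    print(f"{total_vcpus} vCPUs on {physical_cores} physical cores = {ratio:.1f}:1")
    # 16 vCPUs on 8 cores = 2.0:1 -- right at the "no more than two virtual
    # cores per physical core" ceiling, with no headroom to spare.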

Now it turns out that, for some reason, Exchange seems to run poorly. You decide to move Exchange to another blade with more CPUs and double the core count within the guest from eight to 16 vCPUs, since more CPUs never hurt anyone, right?

It turns out that the expected linear increase in performance never materializes… where do you turn next?

A good place to start would have been the vendor's specific guidance pertaining to Exchange. In this case VMware supplies the Exchange 2010 best practices guide, which states (emphasis added):

Consequently, VMware recommends the following practices:

  • Only allocate multiple vCPUs to a virtual machine if the anticipated Exchange workload can truly take advantage of all the vCPUs.
  • If the exact workload is not known, size the virtual machine with a smaller number of vCPUs initially and increase the number later if necessary.
  • For performance-critical Exchange virtual machines (production systems), the total number of vCPUs assigned to all the virtual machines should be equal to or less than the total number of cores on the ESXi host machine.

While larger virtual machines are possible in vSphere, VMware recommends reducing the number of virtual CPUs if monitoring of the actual workload shows that the Exchange application is not benefitting from the increased virtual CPUs. For more background information, see the “ESXi CPU Considerations” section in the white paper Performance Best Practices for VMware vSphere 5 (http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf).

Before “consequently”, the guide briefly introduces VMware's Virtual Symmetric Multi-Processing model and details a wait state known as “ready time”. Ready time is the metric that reveals why your Exchange workloads in VMware are not benefiting from more processors, provided “ready time” is consistently high (more than 5 percent).
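Note that vCenter's performance charts report CPU Ready as a summation in milliseconds accumulated per sampling interval, not as a percentage, so the raw counter needs converting before you compare it to that 5 percent threshold. A minimal sketch of the conversion, assuming the 20-second sampling interval used by vCenter's real-time charts:

    # Convert the raw "CPU Ready" summation counter (milliseconds of ready
    # time accumulated per sampling interval) into a percentage. vCenter's
    # real-time charts sample every 20 seconds; historical chart views use
    # longer intervals, so adjust interval_seconds to match your chart.

    def ready_percent(ready_ms: float, interval_seconds: float = 20.0) -> float:
        """Percentage of the sampling interval spent in the ready state."""
        return ready_ms / (interval_seconds * 1000.0) * 100.0

    # A guest accumulating 2,000 ms of ready time in a 20-second sample
    # spent 10% of that interval waiting for physical CPUs:
    print(f"{ready_percent(2000):.1f}%")  # -> 10.0%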

The consequence of throwing more vCPUs at a guest than it needs is that the guest spends more time than necessary in the ready state, as the hypervisor waits for all of the underlying cores it believes are available to the guest to become free to execute instructions. In other words, the guest OS is ready to execute instructions on the processor, but the hypervisor forces the guest to wait until enough physical cores are available at once. This state becomes much worse as the ratio of vCPUs to physical CPUs increases.
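To see why the ratio matters so much, consider a deliberately crude toy model: assume strict co-scheduling, where a guest may only be dispatched when as many free physical cores as it has vCPUs line up at the same instant, and assume each physical core is independently busy half the time. Modern ESXi actually uses relaxed co-scheduling, so real ready times are better than this model suggests, but the shape of the result holds:

    # Toy model of strict co-scheduling: a guest with v vCPUs is dispatched
    # only when at least v physical cores are free at the same instant.
    # Each core is assumed busy independently with probability p_busy -- a
    # crude simplification, but enough to show the trend.

    from math import comb

    def p_schedulable(vcpus: int, phys_cores: int, p_busy: float) -> float:
        """Probability that at least `vcpus` of `phys_cores` cores are free."""
        p_free = 1.0 - p_busy
        return sum(
            comb(phys_cores, n) * p_free**n * p_busy**(phys_cores - n)
            for n in range(vcpus, phys_cores + 1)
        )

    # A 16-core host where each core is busy 50% of the time:
    for v in (4, 8, 16):
        print(f"{v:2d} vCPUs: runnable {p_schedulable(v, 16, 0.5):.1%} of the time")
    # 4 vCPUs can almost always run; 16 vCPUs almost never can. The
    # oversized guest accumulates ready time instead of doing work.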

In several chapters we refer to the Windows Server Virtualization Validation Program (SVVP) and urge you to make sure that your chosen virtualization platform is listed and supported.

VMware is listed as supported for multiple versions of Exchange and Windows operating systems, yet your server performance is still bad. Does that mean it's a bad hypervisor?

Well, no.

The point is that VMware is not a bad hypervisor; however, failing to understand how VMware allocates CPU resources, and failing to follow VMware's guidance, will result in poor performance for your Exchange servers.

Had you followed the guidance, you would have started with fewer vCPUs (you needed four) instead of more, and, following VMware's advice to reduce the core count when the workload isn't benefiting, you would have ended up allocating one vCPU per physical CPU, leading to gratifyingly low ready states.

This is another stark reminder that we need to read the relevant documentation as part of our planning process. All of the relevant documentation, not just that of the new software we are planning to deploy.

Nicolas Blank has more than 15 years of experience with various versions of Exchange, and is the founder of and Messaging Architect at NBConsult. A recipient of the MVP award for Exchange since 2007, Nicolas is a Microsoft Certified Master in Exchange and presents regularly at conferences in the U.S., Europe, and Africa.

Nicolas will be running a two-day Mimecast Exchange training event on the 31st of October and the 1st of November at Microsoft's Cardinal Place in London. For your opportunity to win a place at the event, please read this blog post about the event.
