I saw this tweet from the always awesome @DBArgenis and @sqldbawithbeard and couldn't help but feel a bead of sweat track down my neck because I like them, but I don't like their best practice

IMHO the proper use of a best practice is as a talking point for discussion (something which Twitter is admittedly not built for). In contrast though the most common use of a best practice is to shortcut discussion by replacing it with the one immutable rule.


Obey.

I get it. It could save you time troubleshooting an awful issue. And there are bugs in BIOS power management, VMware power management, and even Windows power management.

Besides you're paying per core for SQL Server (well, not me personally) and the costs are astronomical! Why wouldn't you want to squeeze the best absolutely tip top performance ever available from it? If you have a couple servers just go for it.

But what if you have more than a couple servers?

A meta-review of the literature

I think it's fair to start with KB2207548 which really just points to KB2534356 from 5 years ago which talks about power management bugs in Windows Server 2008 and R2; specifically that CPU cores get parked (this page has a good description) and never un-parked.

  • There's no fix for Windows Server 2008 except turning off power management.
  • A hotfix is available for Windows Server 2008 R2 SP1.

That sounds serious. But:

  • Aren't your Windows Server 2008 R2 servers already patched? If you haven't applied the hotfix then an out-of-the-box clean install today with all of the security updates takes you to ntoskrnl.exe 6.1.7601.23539 and the article only takes it up to 6.1.7601.21702. There have been > 25 ntoskrnl.exe updates since it was published. I couldn't find anything to prove it's not already out there.

  • What about Windows Server 2012 and 2012 R2 released 4 years ago now? Do they still have this specific problem? Probably not.

We're not off to a strong start.

But wait, VMware said…

That's a longer story. In 2012 it was:

Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management, with little or no performance impact. Therefore, if disabling power management does not realize any increased performance, VMware recommends that power management be re-enabled to reduce power consumption.

In 2013 it continued:

"At every load level, ESXi Balanced was about 3% more energy efficient than the Performance setting, despite the fact that it delivered an equivalent score to Performance."

"Although the Performance and ESXi Balanced settings performed very similarly under load, hosts using ESXi Balanced and BIOS Balanced power management consumed 33% less power while idle."

Even the VMware vSphere 5.5 Performance Study used Windows Server 2008 R2 and does not mention having applied the above Microsoft patch or not. It concluded:

Configure your BIOS settings to allow ESXi the most flexibility in using the power management features offered by your hardware. Select the OS Control mode under the BIOS power management options. Enable all C-states and the deepest C-state in the BIOS.

To achieve the best performance per watt for most workloads, leave the power policy setting at the default, which is Balanced.

For applications that require maximum performance, switch the power policy to High Performance. This includes latency-sensitive applications that must execute within strict constraints on response time. For maximum power savings, switch the power policy to Low Power.

In 2014 they published their white paper on Performance and Scalability of Microsoft SQL Server on VMware vSphere 5.5 using Windows Server 2012 R2. Unfortunately they also forced the power management settings all to High Performance and did not do a comparison. However they did note:

Latency Sensitivity is an advanced setting designed for high performance computing applications like financial workloads. Database virtual machines in the test configuration did not benefit from being treated as highly latency sensitive.

And now in 2016 this kind of changes because we have Architecting Microsoft SQL Server on VMware vSphere where they said:

SQL Server is not usually considered a "latency sensitive" application.

But…

For critical applications such as SQL Server, the default power scheme in vSphere 6.0 is not recommended.

Okay…

Microsoft recommends the high performance power management policy for applications requiring stability and performance.

However…

Lower-tier SQL Server workloads typically are less latency sensitive, so in general the goal is to maximize use of system resources and achieve higher consolidation ratios rather than maximize performance.

And…

The vSphere CPU scheduler’s policy is tuned to balance between maximum throughput and fairness between virtual machines. For lower-tier databases, a reasonable CPU over commitment can increase overall system throughput, maximize license savings, and continue to maintain adequate performance.

What I found interesting was a random post written by what appears to be a VMware employee or associate in much more detail…

None, we don't pass through any control over P-States to the guest. The only policy that does affect a VM (on Windows) is "Processor Idle Disable" (advanced modifier), this will basically cause the Guest OS to not execute halt (C1), it will just run (C0) even if there is nothing to do (basically a tight loop checking very very very fast whether there is any work to do, like a queued up DPC).

While this can help for 0.01% percent of latency sensitive workloads, it will also consume 100% a pCPU for each vCPU the VM has.


Hmmm.

Back to the decision at hand

Now while I want to go to the server administrators and ask them to enable High Performance in VMware and Windows Server across the board just so that I can tick that best practice box…

  • Am I convinced?

    No. It seems obvious that the high performance settings are essential on an extremely old and un-patched OS. Beyond this there seems to be a lot of FUD, however, it's all up for debate due to a lack of facts and Microsoft not keeping their documentation up to date.

  • Am I burdening others?

    Let's say that I let the other administrators make up their own minds about the risks vs the rewards. The only way this is going to fly is if I cherry pick the very latest article and leave out all the others. I'd feel guilty.

  • Am I being a good data centre citizen?

    SQL Server will be sitting amongst racks of ESX hosts running other important workloads on overcommitted cores. Am I really going to start turning this on every Dev, Test, QA, Secondary, and SSRS instances? There's very little data on the impact this has to other VMs.

  • And that's not even mentioning the environment.

    Am I happy to turn the lights off at home to save 10c/year when I'm then pulling 33% more power across dozens/hundreds of servers in the DC 24x7x365 for the next few years?

I really hate to say it depends and I really want to join the club and just say, "This fits most scenarios". But SQL Server power management feels a lot like cloud think and I'm not convinced it's always beneficial, especially on a lot of servers, or on a mix of servers running in the same racks.

What even makes it a best practice? It's questionable at small scales, it's questionable at large scales, and to me it seems like it's only needed in specific scenarios that need absolute performance no matter what. That's probably not the most common use case.

To quote James Bach: