Intermittent 100% CPU on all VMs

We’re a small shop, running a Dell T420 (dual-socket, only one CPU installed, 6 cores) with 32 GB RAM as our main server. We have only 5 VMs, one of which is our WSE 2012 DC.

From time to time, with no pattern we’ve been able to establish, all of our VMs concurrently spike to 100% CPU. The host remains quiet at 4-5%. A warm boot of the host doesn’t provide relief, but a cold boot at least puts things back in the box until the problem recurs.
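For anyone trying to catch this in a log, the signature described above (every guest pegged while the host stays nearly idle) can be checked mechanically. The following is only a minimal sketch: the function name, the thresholds, and the sample readings are all hypothetical, and how the CPU percentages get collected is left out entirely.

```python
from typing import Mapping


def is_guest_spike(
    guest_cpu: Mapping[str, float],
    host_cpu: float,
    guest_threshold: float = 95.0,  # assumed cutoff for "pegged"
    host_threshold: float = 10.0,   # assumed cutoff for "host is quiet"
) -> bool:
    """Return True when every guest is pegged while the host stays near idle."""
    if not guest_cpu:
        # No guests sampled, so there is nothing to flag.
        return False
    all_guests_pegged = all(pct >= guest_threshold for pct in guest_cpu.values())
    return all_guests_pegged and host_cpu <= host_threshold


# Hypothetical readings in percent; the VM names are placeholders.
sample_spike = {"DC": 100.0, "VM2": 99.5, "VM3": 100.0, "VM4": 98.0, "VM5": 100.0}
sample_normal = {"DC": 100.0, "VM2": 3.0, "VM3": 12.0, "VM4": 1.5, "VM5": 40.0}

print(is_guest_spike(sample_spike, host_cpu=4.5))   # True
print(is_guest_spike(sample_normal, host_cpu=4.5))  # False
```

Logging a timestamp each time this returns True would at least make it easier to see whether incidents cluster around idle periods.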

Sometimes we get a week or more of calm seas out of it; sometimes only a day. One loose pattern is that it seems to kick off during an extended idle period, e.g. overnight. An examination of the server’s temperature logs first led us to suspect overheating, but further investigation into recent incidents has ruled that out.

We also found descriptions of similar problems on the Dell forums, with claims of resolution by installing the latest round of Dell updates. We recently engaged in a project to do just that (as an aside, it was quite an adventure getting ~700GB of VHDs safely off of and then back onto that machine), but to our utter dismay it didn’t help.

We’re absolutely befuddled. So is Microsoft support (or at least first-tier support is, even though they try not to act like it). I’m including our SystemInfo output below.

Does anyone know where to start looking?

Thanks

===================================

Host Name: SERVER1
OS Name: Microsoft Hyper-V Server 2012 R2
OS Version: 6.3.9600 N/A Build 9600
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Server
OS Build Type: Multiprocessor Free
Registered Owner: Windows User
Registered Organization:
Product ID: 06401-029-0000043-76293
Original Install Date: 4/3/2014, 4:07:15 PM
System Boot Time: 5/4/2014, 1:56:47 PM
System Manufacturer: Dell Inc.
System Model: PowerEdge T420
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 45 Stepping 7 GenuineIntel ~2200 Mhz
[Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20 GHz] (manually added)
BIOS Version: Dell Inc. 2.1.2, 1/20/2014
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume1
System Locale: en-us;English (United States)
Input Locale: en-us;English (United States)
Time Zone: (UTC-09:00) Alaska
Total Physical Memory: 32,723 MB
Available Physical Memory: 12,716 MB
Virtual Memory: Max Size: 37,587 MB
Virtual Memory: Available: 17,129 MB
Virtual Memory: In Use: 20,458 MB
Page File Location(s): C:\pagefile.sys
Domain: OIT
Logon Server: \\SERVER1
Hotfix(s): 31 Hotfix(s) Installed.
[01]: KB2843630
[02]: KB2862152
[03]: KB2868626
[04]: KB2876331
[05]: KB2883200
[06]: KB2884846
[07]: KB2887595
[08]: KB2892074
[09]: KB2893294
[10]: KB2894179
[11]: KB2898514
[12]: KB2898871
[13]: KB2901101
[14]: KB2901128
[15]: KB2903939
[16]: KB2904266
[17]: KB2908174
[18]: KB2909210
[19]: KB2911106
[20]: KB2913760
[21]: KB2916036
[22]: KB2917929
[23]: KB2919394
[24]: KB2919442
[25]: KB2922229
[26]: KB2923300
[27]: KB2923768
[28]: KB2928193
[29]: KB2928680
[30]: KB2930275
[31]: KB2939087
Network Card(s): 3 NIC(s) Installed.
[01]: Broadcom NetXtreme Gigabit Ethernet
Connection Name: NIC1
DHCP Enabled: No
IP address(es)
[02]: Broadcom NetXtreme Gigabit Ethernet
Connection Name: NIC2
DHCP Enabled: Yes
DHCP Server: 192.168.1.12
IP address(es)
[01]: 192.168.1.135
[02]: fe80::915b:8de0:712e:29f1
[03]: Hyper-V Virtual Ethernet Adapter
Connection Name: vEthernet (External NIC 1_Internal)
DHCP Enabled: No
IP address(es)
[01]: 192.168.1.11
[02]: fe80::2d35:f582:4958:9eb2
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.

== EDIT ======================

I’ve found the solution to this issue; I waited for over a year to make sure we didn’t encounter any more instances of the problem.

Moderators: I’d like to request a reopening of the question, so that I can post the answer.


Source: [Server Fault](http://en.community.dell.com/techcenter/extras/m/white_papers/20161975/download)

After more than a year of waiting to prove the solution valid, I’m finally able to post this answer.

Dell’s default BIOS settings have C-States enabled, which put the computer into a low-power mode during idle periods. This is what causes the VMs to spiral into 100% CPU usage on a hypervisor host (VMware and Citrix included).

The solution is to set the System Profile setting in the BIOS to Performance, as opposed to Performance per watt [OS] or Performance per watt [DAPC] (the latter being the default).

The relevant Dell documentation, p. 3:

http://en.community.dell.com/techcenter/extras/m/white_papers/20161975/download

And this reply from one of the few Dell support engineers who’s familiar with the problem:

The short version is: C-States disable additional processor cores during idle times. For VMs that are tied to a core (this is OS controlled, I do not believe it’s configurable), this could result in them locking up, as they’re attempting to perform actions with resources that, in their eyes, no longer exist.

Generally speaking, C-States are used on machines like backup servers and secondary-role servers (backup DNS, DHCP, domain controllers, etc.), so that those servers can remain on, but in a low-power mode to save energy.

Additional documentation can be found here:

http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface

In a nutshell, power idling on a Dell server should always be turned off (set to Performance) for Hypervisor hosts.

Thanks to Eddy Simons at Kitsap Bank for helping me to find this solution.