_Disclaimer: This blog has been created to the best of my knowledge and understanding. If I need to correct a statement, I will mark that accordingly in a future update._

_Credits: I would like to thank our colleagues from the VCF BU for their support and patience with me: Ellen Mai from the PM team and especially Ross Knippel from the development side. Also, thanks to Thomas Thimm, Pascal Hanke, and Michael Hempel for testing things out._

**Update 2024-06-09: Ross came back and improved my wording to make it clearer; I have applied the changes.**

I am starting the blog with a simple question from one of my customers:

How do you explain that the VM is showing an EVC CPU mode that exceeds the capabilities of the current CPU architecture?

In this case, a VM running on a host with a Cascade Lake processor was showing an EVC CPU mode of “Ice Lake”. Neither the cluster nor the VM had any EVC mode defined, and since “Ice Lake” is a newer generation, it didn’t make any sense.

[Image: An example of the VM CPU EVC mode]

This question led me deep down the rabbit hole, and since I found that I wasn’t alone with this problem, I decided to share what I learned.

Enhanced vMotion Compatibility

Most of you will be familiar with Enhanced vMotion Compatibility (EVC) and (hopefully) use it by default when setting up new vSphere clusters.
Just to recap EVC with a crude summary:

EVC reduces the CPU instruction set exposed by newer processors and presents to a virtual machine only the features included in the selected compatibility level. Put even more simply, older and newer CPU generations all operate on the same common feature baseline. It’s an easy way to ensure smooth migrations and hardware refreshes without downtime for your workloads. In my experience, the impact of not having the latest and greatest CPU instructions available is negligible for 99% of workloads.

In case you need a bigger refresher on EVC, check out this blog from Niels Hagoort.

Back to the problem

As outlined, EVC works by stripping down CPU instructions of newer generations. As you would think, you cannot simply add newer features to older CPU hardware.
So a Cascade Lake processor would max out on “Cascade Lake” EVC mode in terms of what a VM can consume in features - or so I used to think.

In our case, we tested with an Intel Xeon Platinum 8260L, which was released in 2019.

The first thing we noticed was that the result differed depending on how the VM had been created.

[Image: CPU EVC Mode: What a difference a feature makes]

Creating a VM manually first showed the expected results:
“Cascade Lake” EVC on a Cascade Lake processor.

So what was different about the other VM? Only when we checked the VMX file did we find that VHV was enabled.

For reference, VHV is also known as “nested virtualization” or, as the UI calls it, “Expose hardware-assisted virtualization to the guest OS”. You can check this on a VM under Settings -> Virtual Hardware -> CPU.
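If you would rather check the VMX file than the UI, the sketch below scans VMX text for the nested virtualization flag. It assumes the option is stored as `vhv.enable = "TRUE"`; the helper names `parse_vmx` and `vhv_enabled` and the sample content are made up for illustration.

```python
from typing import Dict


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse 'key = "value"' lines from VMX text into a dictionary."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keys are lowercased so lookups ignore capitalization (a choice of this sketch).
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def vhv_enabled(text: str) -> bool:
    """Return True if the (assumed) vhv.enable option is set to TRUE."""
    return parse_vmx(text).get("vhv.enable", "FALSE").upper() == "TRUE"


# Hypothetical VMX excerpt, not taken from a real VM.
SAMPLE_VMX = '''
virtualHW.version = "17"
vhv.enable = "TRUE"
'''

if __name__ == "__main__":
    print("VHV enabled:", vhv_enabled(SAMPLE_VMX))
```

A VM without the option in its VMX file is treated as having VHV disabled, which matches the description of VHV as a feature that is off by default.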

How the processor features for a VM are computed

I take this passage from an explanation Ross gave me (updated on 2024-07-09):

The VM requirements are computed at power-on by taking the host’s CPUID and MSRs (machine-specific registers), adding in features that are off by default (e.g. vhv) but requested (via guest OS type or explicit .vmx option), and then removing features that are:

  • Not supported by the ESX version being run on the host
  • Not supported by the virtual hardware version of the VM
  • Not in the cluster EVC mode
  • Not in the per-VM EVC mode

The final set of features needed for the power-on is then sent to vCenter. At this point, the EVC mode for the VM is computed by doing a best fit (lowest EVC mode that supports all of the features required by the VM).
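To make the procedure more tangible, here is a minimal sketch of it using plain Python sets. It is a toy model and not how ESX or vCenter implement it: the function names, the placeholder feature names (`feat.base`, `feat.cascade`, `feat.icelake`), and the contents of each EVC mode are assumptions for illustration. Only `vt.zeroinstlen` and the mode names come from this post.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Toy EVC modes, ordered from lowest to highest. Feature names are placeholders,
# except vt.zeroinstlen, which is the feature discussed in this post.
EVC_MODES: List[Tuple[str, Set[str]]] = [
    ("Skylake", {"feat.base"}),
    ("Cascade Lake", {"feat.base", "feat.cascade"}),
    ("Ice Lake", {"feat.base", "feat.cascade", "feat.icelake", "vt.zeroinstlen"}),
]

ALL_FEATURES: Set[str] = set().union(*(features for _, features in EVC_MODES))


def compute_vm_features(
    host_default: Iterable[str],
    requested: Iterable[str],
    esx_supported: Iterable[str],
    hw_version_supported: Iterable[str],
    cluster_evc: Optional[Iterable[str]] = None,
    vm_evc: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Start from the host features, add requested ones, then apply each filter."""
    features = set(host_default) | set(requested)
    features &= set(esx_supported)          # not supported by the ESX version
    features &= set(hw_version_supported)   # not supported by the virtual hardware version
    if cluster_evc is not None:
        features &= set(cluster_evc)        # not in the cluster EVC mode
    if vm_evc is not None:
        features &= set(vm_evc)             # not in the per-VM EVC mode
    return features


def best_fit_evc_mode(
    required: Iterable[str],
    modes: List[Tuple[str, Set[str]]] = EVC_MODES,
) -> Optional[str]:
    """Return the lowest mode whose feature set covers everything required."""
    needed = set(required)
    for name, mode_features in modes:
        if needed <= mode_features:
            return name
    return None


# Toy host: a Cascade Lake CPU with its default-on features.
HOST_DEFAULT = {"feat.base", "feat.cascade"}

if __name__ == "__main__":
    manual_vm = compute_vm_features(HOST_DEFAULT, set(), ALL_FEATURES, ALL_FEATURES)
    nested_vm = compute_vm_features(HOST_DEFAULT, {"vt.zeroinstlen"}, ALL_FEATURES, ALL_FEATURES)
    print("manual VM:", best_fit_evc_mode(manual_vm))
    print("VHV VM:   ", best_fit_evc_mode(nested_vm))
```

In this model a single requested feature that is missing from the host generation's own EVC mode is enough to push the best fit up to the next mode. Passing a cluster or per-VM EVC set removes that feature again before the best fit is computed.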

The gotcha

The last bit is the one that stood out because, as this specific example shows:

  • The EVC mode for a processor generation is close to the capabilities of the physical CPU, but it is not guaranteed to be an identical representation.
  • Newer virtual hardware versions may expose more CPU features to the VM.

If you are curious, Ross told me what made the difference in this particular example:

The feature vt.zeroinstlen (remark: “Zero instruction length vmentry”, part of Intel Virtualization Technology) is not included in the “Cascade Lake” EVC mode.

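Now the hardware version comes into play: vt.zeroinstlen was only introduced with virtual hardware version 17. With VHV enabled on such a VM, the feature becomes part of the required set, and the lowest EVC mode that contains it is “Ice Lake”.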

Conclusion

If you are not aware of the interplay between vHW version and EVC modes, this example shows how a minor setting can lead to workload downtime in a future migration scenario. Have a look at KB 318962 as well.

To be on the safe side, I recommend running your clusters with EVC mode turned on (or using per-VM EVC mode, for that matter), as it gives you predictable results with no negative impact on the bulk of workloads. Changing the EVC mode on a cluster can be done at runtime, but only if no running VM has features exposed that exceed the target EVC mode.
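The runtime restriction can be pictured as a simple subset check. The sketch below is a toy model under my own assumptions, not the check vCenter performs: the function name, the VM names, and the placeholder features (`feat.base`, `feat.cascade`) are hypothetical; only `vt.zeroinstlen` comes from this post.

```python
from typing import Dict, Iterable, List, Set


def vms_blocking_evc_change(
    target_mode_features: Iterable[str],
    running_vm_features: Dict[str, Set[str]],
) -> List[str]:
    """Return the running VMs that expose features outside the target EVC mode."""
    allowed = set(target_mode_features)
    return sorted(
        name
        for name, features in running_vm_features.items()
        if not set(features) <= allowed
    )


# Toy data: a "Cascade Lake" mode and two running VMs (all names are placeholders).
CASCADE_LAKE_MODE = {"feat.base", "feat.cascade"}
RUNNING_VMS = {
    "vm-manual": {"feat.base", "feat.cascade"},
    "vm-nested": {"feat.base", "feat.cascade", "vt.zeroinstlen"},
}

if __name__ == "__main__":
    blockers = vms_blocking_evc_change(CASCADE_LAKE_MODE, RUNNING_VMS)
    print("VMs blocking the change:", blockers)
```

In this model, the VM with VHV enabled would block a switch to “Cascade Lake” while it is running, because it already exposes a feature that the mode does not contain.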