"To summarize, this test demonstrates that changing the corespersocket configuration of a virtual machine does indeed have an impact on performance in the case when the manually configured virtual NUMA topology does not optimally match the physical NUMA topology"
"Keep your configuration to 1 core per socket UNLESS you need to change because your Windows Server license restricts you to a limited number of sockets."
Just thought it was interesting that Veeam/Microsoft advocate more cores per socket for SQL Express, but this could come with a performance hit if you run your VBR server as a VM on a VMware host/cluster.
How big a hit would depend on the underlying hardware in your hosts, I guess.
The performance hit is too minor to worry about. Personally, I have always considered this VMware peculiarity more of a best practice to keep in mind: don't use multiple cores per socket UNLESS there is actually a good reason for it (such as software licensing) that outweighs the performance degradation.
I've never heard of big impacts caused by a misalignment between the vNUMA topology and the pNUMA topology of the underlying CPU. Honestly, I always thought that all the improvements VMware has made around NUMA were meant precisely to better manage this situation.
On the other hand, I've heard of many customers enjoying improved performance on SQL Express using the simple "trick" of adding cores to the single socket, so as to stay within the limits of the SQL Express engine while still getting multi-threaded processing. I've seen our Solutions Architects suggest this option multiple times in the field.
Luca Dell'Oca Principal EMEA Cloud Architect @ Veeam Software
1) They use a micro-benchmark designed specifically to stress the NUMA interconnect which is unlikely to match a real world workload.
2) They intentionally created a vNUMA topology that was larger than the underlying pNUMA topology. This robs the OS scheduler of all knowledge of the NUMA nodes and thus of the proper way to schedule; it simply treats all nodes as equal.
In other words, they compared the following VM configurations:
24 sockets/1 core each
2 sockets /12 cores each
1 socket/24 cores each
But the underlying physical hardware was 4 sockets/12 cores each and, more importantly, the physical server had a NUMA topology that was 6 cores per node, for a total of 8 NUMA nodes. What's specifically missing from the article? A multi-core VM setup with a vNUMA topology that did not exceed the pNUMA topology, in other words:
4 sockets/6 cores each
If they had tested that VM configuration, they would still have had an optimal vNUMA layout (4 NUMA nodes) that could map directly to the pNUMA nodes underneath, and they almost certainly would not have seen any performance degradation, all while still providing multiple cores per virtual socket.
In other words, while the article could be interpreted as warning against multiple cores per virtual socket, what it's actually warning against is configuring per-socket core counts that exceed the pNUMA topology. For example, if your pNUMA topology has 4 cores per node, then you're best off not having more than 4 cores per virtual socket; if your pNUMA topology has 8 cores per node, then 8 cores per virtual socket should be fine. The article states this in recommended practice #2 near the start of the article.
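To make that rule of thumb concrete, here is a minimal Python sketch that applies it to the layouts discussed above. It is only a simplified model of the reasoning in this post (one virtual socket should fit inside one physical NUMA node), not how ESXi actually builds or schedules vNUMA. The names `VmLayout` and `fits_pnuma` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmLayout:
    """A VM CPU layout: virtual sockets and cores per virtual socket."""
    sockets: int
    cores_per_socket: int

    @property
    def vcpus(self) -> int:
        return self.sockets * self.cores_per_socket


def fits_pnuma(layout: VmLayout, cores_per_pnuma_node: int) -> bool:
    """Rule of thumb from the post: a virtual socket should not hold
    more cores than one physical NUMA node provides."""
    return layout.cores_per_socket <= cores_per_pnuma_node


# Host described above: 4 sockets x 12 cores, exposed as 8 NUMA nodes of 6 cores.
CORES_PER_PNUMA_NODE = 6

# The three layouts from the article, plus the 4 x 6 layout it did not test.
layouts = [VmLayout(24, 1), VmLayout(2, 12), VmLayout(1, 24), VmLayout(4, 6)]

results = {
    (l.sockets, l.cores_per_socket): fits_pnuma(l, CORES_PER_PNUMA_NODE)
    for l in layouts
}

for (sockets, cores), ok in results.items():
    verdict = "fits within" if ok else "exceeds"
    print(f"{sockets} sockets x {cores} cores: {verdict} a "
          f"{CORES_PER_PNUMA_NODE}-core pNUMA node")
```

All four layouts add up to 24 vCPUs; under this simplified check only the 24 x 1 and 4 x 6 layouts stay within a 6-core node.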
And finally, Veeam recommends increasing the core count for SQL Express simply because SQL Express will not use more than a single socket, so adding more sockets won't do anything to help it. However, SQL Express will use additional cores of a single socket, so if you want SQL Express to have access to more CPU horsepower, the only option is to increase the core count. We don't recommend anywhere increasing the core count beyond 4, because SQL Express is also limited to using 4 cores of 1 CPU socket. I haven't seen many NUMA systems that have fewer than 4 cores per NUMA node, so I think we're safe from any negative performance impact.
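As a rough illustration of why sockets don't help here, the sketch below models the limit as described in this post: SQL Express uses at most 1 socket and at most 4 cores. The function name `usable_cores` and the constants are hypothetical, and the limits are taken from the post rather than checked against any particular SQL Server version.

```python
# Limits as stated in this post (assumed, not verified per SQL Server version).
SQL_EXPRESS_MAX_SOCKETS = 1
SQL_EXPRESS_MAX_CORES = 4


def usable_cores(sockets: int, cores_per_socket: int) -> int:
    """Cores SQL Express could use for a given VM layout under the
    '1 socket, up to 4 cores' limit described above."""
    usable_sockets = min(sockets, SQL_EXPRESS_MAX_SOCKETS)
    return min(usable_sockets * cores_per_socket, SQL_EXPRESS_MAX_CORES)


for sockets, cores in [(4, 1), (2, 2), (1, 4), (1, 8)]:
    print(f"{sockets} sockets x {cores} cores -> "
          f"{usable_cores(sockets, cores)} usable by SQL Express")
```

With 4 vCPUs, a 4 x 1 layout leaves SQL Express a single core while a 1 x 4 layout gives it all four, and going beyond 4 cores per socket gains nothing.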
dellock6 wrote:I'm not even sure Numa is available for two cores cpu
Sure it is. NUMA is independent of core count and existed long before multicore CPUs (at least from Intel). For example, the old Pentium III Xeon processors supported NUMA on 8-way servers way back in the early 2000s.
It's definitely technically possible to have a NUMA system that uses only 2-core CPUs. For example, I see Dell R710s (a NUMA architecture) on eBay all the time with 2x dual-core E5502 processors, so you would end up with two NUMA nodes of two cores each. Whether you will see any of those in the real world is an entirely different matter, but they certainly can exist.