I have a problem with our vCenter appliance. The only thing that changed in our VMware environment in the last few months was the update of Veeam from version 8 to 9, so I suspect Veeam might be the reason for our problems. Maybe I'm wrong, but here's the story:
We have a small VMware installation with 2 ESX hosts (VMware 5.1 and 5.5) and 5 VMs (3 on ESX1 and 2 on ESX2). Veeam is running on a third, physical server. Both ESX hosts have local RAID5 storage. There's more than 1TB of free space in each datastore and more than 20GB of unused RAM on each ESX host.
A few weeks ago everything on ESX1 (the host the vCenter appliance is running on) suddenly became extremely slow. It took users up to 15 minutes to log on to their PCs (because of the server-based profiles), Veeam reported timeout errors, and everything was just plain slow. I checked the performance graphs of the ESX host with the vSphere Client and found a quite high "hdd latency" with values between 300 and 400ms. All other parameters (CPU, memory, read, write, ...) showed more or less normal values. I found the same high hdd latency for the host itself and for the vCenter appliance. So I shut down the vCenter appliance, and immediately the hdd latency for the host went down to nearly 0 and everything was back to normal speed!
After I restarted the vCenter appliance everything was slow again. So I did some research on Google, but I didn't find anything relevant to our problem. I decided to take the easy approach: I removed the 2 ESX hosts from the vCenter, deleted the whole vCenter appliance, downloaded the newest vCenter appliance 5.5u3 from VMware (we're not on VMware 6 yet), installed it, and everything was running nice and fast. So I was happy, removed the existing jobs from Veeam, recreated them with the new vCenter and ran the initial backups.
A few days later the problem was back and everything was slow again.
I SSH'd into the vCenter appliance and checked the free space on its partitions with the "df" command: lots of free space. I tried to interpret the vpxd.log, but that's beyond my VMware knowledge. I changed the memory assignment for the vCenter appliance from the default 8GB to 10GB. That changed nothing. After that I set both the "reservation" and the "limit" for memory on the "resources" tab of the properties of the vCenter appliance to 10GB. After that everything was back to normal speed! No high hdd latency, everything fine! Until the next morning: high hdd latency and slow speed again.
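A possible next step would be to measure, from inside the appliance, which of its virtual disks is actually busy while the latency is high. The following is only a minimal sketch, assuming a Python 3 interpreter is available on the appliance (it may not be) and that the appliance exposes the standard Linux /proc/diskstats layout. The function names are made up for illustration. It takes two samples and reports the average time per completed I/O for each device, which is in-guest latency and not necessarily the same number the vSphere Client shows.

```python
import time


def parse_diskstats(text):
    """Return {device: (completed_ios, ms_spent_on_io)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 11:
            continue  # skip short or malformed lines
        name = parts[2]
        # parts[3] = reads completed, parts[7] = writes completed
        ios = int(parts[3]) + int(parts[7])
        # parts[6] = ms spent reading, parts[10] = ms spent writing
        ms = int(parts[6]) + int(parts[10])
        stats[name] = (ios, ms)
    return stats


def average_latency_ms(before, after):
    """Average ms per completed I/O between two samples, per device."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        delta_ios = ios_after - ios_before
        if delta_ios > 0:  # idle devices are left out
            result[name] = (ms_after - ms_before) / delta_ios
    return result


def measure(interval_s=10, path="/proc/diskstats"):
    """Sample the stats file twice, interval_s seconds apart."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return average_latency_ms(first, second)
```

Calling measure() while the host is slow and printing the result sorted by value would show which device inside the appliance has the highest latency. Matching that device to a mount point from the "df" output could then hint at which vCenter component is generating the I/O.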
I have no idea what's going on with my vCenter. As I wrote before, the problems began after I updated Veeam to version 9. I did this because I wanted to take advantage of the improvements in Oracle database backup, since we're running Oracle in one of our VMs. So, is Veeam doing something with the vCenter? Do I have a Veeam problem? Or is this all a complete coincidence that has nothing to do with Veeam? Should I open a case with Veeam support?
Any ideas anyone? I'm out of ideas!
Thanks a lot!