Hello all,
we back up two Hyper-V failover clusters. A few weeks ago, the backups started to fail with random errors, all of which (at least it seems) are related to WMI.
Some examples:
11/21/2024 1:52:09 PM :: Error: Failed to create Hyper-V Cluster Wmi utils: Failed to execute WMI query 'SELECT * FROM Win32_OperatingSystem'.
11/21/2024 1:52:16 PM :: Error: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
This usually points to a permission issue, but the account used for authentication works fine. Mainly because once the first run of the job (49 VMs in this example) starts, approximately half of the VMs are backed up and the other half fails.
We tried to narrow the issue down to a specific Hyper-V node, but it happens randomly across all of them. If we debug WMI from the backup server, the connection to any of the nodes works without issues.
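For reference, the same check can be reproduced by hand from the backup server with a rough PowerShell sketch like the one below (the node names and the way the credential is passed are placeholders, not our actual values):

# Placeholder node names and service account
$nodes = 'HV-NODE01', 'HV-NODE02'
$cred  = Get-Credential 'DOMAIN\svc-account'

foreach ($node in $nodes) {
    try {
        # The same query that fails in the job log
        $os = Get-WmiObject -ComputerName $node -Credential $cred `
              -Query 'SELECT * FROM Win32_OperatingSystem' -ErrorAction Stop
        Write-Host "$node OK: $($os.Caption)"
    }
    catch {
        Write-Warning "$node failed: $($_.Exception.Message)"
    }
}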
Eventually, on the second or third retry, everything is backed up, but there are sometimes exceptions.
There was a support case (Case #07484767), which is already closed, since this is most likely an issue with the Hyper-V infrastructure.
If we examine the WMI event logs, there are many errors such as:
Id = {0FED6E9B-37A8-001A-4D70-ED0FA837DB01}; ClientMachine = HYPERVNODEHOSTNAME; User = DOMAIN\svc-account; ClientProcessId = 6556; Component = Unknown; Operation = Start IWbemServices::GetObject - root\virtualization\v2 : Msvm_StorageJob.InstanceID="6C303D32-B60B-4A3F-A4E3-2C53146F23CF"; ResultCode = 0x80041002; PossibleCause = Unknown
Id = {0FED6E9B-37A8-001A-4D70-ED0FA837DB01}; ClientMachine = HYPERVNODEHOSTNAME; User = DOMAIN\svc-account; ClientProcessId = 6556; Component = Unknown; Operation = Start IWbemServices::ExecNotificationQuery - root\virtualization\v2 : SELECT * FROM __InstanceDeletionEvent WITHIN 2 WHERE TargetInstance ISA 'Msvm_StorageJob'; ResultCode = 0x80041032; PossibleCause = Unknown
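If anyone wants to reproduce those two calls against a single node, a rough PowerShell sketch follows (the node name is a placeholder). If I read the result codes right, 0x80041002 is WBEM_E_NOT_FOUND and 0x80041032 is WBEM_E_CALL_CANCELLED, so neither is an access-denied error by itself:

# Placeholder node name; root\virtualization\v2 only exists on Hyper-V hosts
$node = 'HV-NODE01'

# Same class the GetObject call touches; WBEM_E_NOT_FOUND typically just means
# the storage job instance was already gone by the time it was queried
Get-WmiObject -ComputerName $node -Namespace 'root\virtualization\v2' -Class Msvm_StorageJob

# Same notification subscription; WBEM_E_CALL_CANCELLED typically means the
# caller dropped the subscription, e.g. after giving up on the host
Register-WmiEvent -ComputerName $node -Namespace 'root\virtualization\v2' `
    -Query "SELECT * FROM __InstanceDeletionEvent WITHIN 2 WHERE TargetInstance ISA 'Msvm_StorageJob'" `
    -SourceIdentifier 'StorageJobDeleted'
Wait-Event -SourceIdentifier 'StorageJobDeleted' -Timeout 30
Unregister-Event -SourceIdentifier 'StorageJobDeleted'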
Just trying my luck here in case anyone has come across this issue. I sense there is some really stupid solution we overlooked, as there usually is in these cases.
Interestingly, we had this issue in the past. Back then we did a Veeam DB migration from MSSQL to PostgreSQL and the issues went away. Now they are back. I do not think there is any link, just a coincidence.
Thank you
-
- Influencer
- Posts: 10
- Liked: 1 time
- Joined: Jan 22, 2024 2:27 pm
- Contact:
-
- VP, Product Management
- Posts: 7202
- Liked: 1547 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Failed to execute WMI query
I had similar issues once with a Hyper-V cluster where the backend was under heavy load caused by some storage-specific issues within Storage Spaces Direct. In the end we rebooted the cluster nodes (one by one, moving the workloads off beforehand) and modified the data flow within the cluster for background streams.
At the time of heavy IO load (disk performance test software within VMs) or backup, we could see that the host became unresponsive to Veeam and to Hyper-V management software.
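For the node-by-node reboots, a rough sketch of the drain/reboot cycle looks like this (assuming the FailoverClusters module is available on the node; the node name is a placeholder):

# Placeholder node name
$node = 'HV-NODE01'

# Drain the roles off the node first (live-migrates the VMs away)
Suspend-ClusterNode -Name $node -Drain -Wait

# Reboot the node and wait until PowerShell remoting is reachable again
Restart-Computer -ComputerName $node -Wait -For PowerShell -Force

# Bring the node back into the cluster and fail its roles back
Resume-ClusterNode -Name $node -Failback Immediate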
-
- Influencer
- Posts: 10
- Liked: 1 time
- Joined: Jan 22, 2024 2:27 pm
- Contact:
Re: Failed to execute WMI query
Hello,
thank you for your reply. IO was not an issue; the whole infrastructure is essentially idle during the backup window. We did "exclusion method" tests to narrow down the source of the issue, such as:
- New VBR server deployed directly on site, where the Hyper-V infrastructure resides (the current VBR is provisioned in Azure) -> No WMI issues
- New VBR server in the same subnet as the current VBR (with the same firewall rules and network paths) -> No WMI issues
Based on those tests we figured the issue was either with the VM itself or with the database (which is weird), since we had already removed every AV, disabled the firewall, rebooted, etc., and nothing worked. We basically installed a new VM with VBR, restored the configuration, and for now there are no issues.
It hurts not knowing the root cause, but sometimes it is just easier this way.