-
- Influencer
- Posts: 12
- Liked: 2 times
- Joined: Nov 09, 2017 6:15 am
- Full Name: wandersick
- Contact:
Veeam VM Backup Job (of Exchange 2016) Crashes Hyper-V 2016 Host
Hi,
Our Veeam 9.5 U4b VM backup job, which contains an Exchange 2016 DAG node VM (and several other VMs), often crashes our Hyper-V 2016 host when it runs.
We tried killing the worker process (vmsp.exe) of the Exchange VM, but it cannot be killed. The Hyper-V host has to be forcibly rebooted to bring Exchange back into service, which also affects the other VMs running on the host. This is undesirable.
Support case ID: 04099098, referring to 04083652
We have already tried the following:
- Making sure the Exchange DAG VMs are backed up one at a time in the backup job (by setting the concurrent task limit of the Backup Repository to 1)
- Taking a production checkpoint manually on the Exchange VM, which succeeds
- Lowering the failover sensitivity of the Microsoft Failover Cluster according to the Veeam Blog post
- Using a cluster of SSDs for our servers, so disk I/O should not be an issue. Task Manager shows sufficient free system resources during backup
Any suggestions would be much appreciated. Thanks!
--
Below are the error messages Veeam reported for the Exchange VM in the failed backup job (these are the ones we have observed so far):
Occurrence 1
Failed to create VM recovery checkpoint (mode: Veeam application-aware processing) Details: Error code: 0x80041033 Failed to get object '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="4CC513EC-AB31-4D17-B671-394A00F87066"'. Failed to get wmi object by path: '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="4CC513EC-AB31-4D17-B671-394A00F87066"'. Failed to create VM recovery snapshot, VM ID 'b4116e8a-470a-421a-92de-2cf0aafccf7b'.
Retrying snapshot creation attempt (Failed to create production checkpoint.)
Task has been rescheduled
Unable to allocate processing resources. Error: Error code: 0x80041033 Failed to get object '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="4CC513EC-AB31-4D17-B671-394A00F87066"'. Failed to get wmi object by path: '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="4CC513EC-AB31-4D17-B671-394A00F87066"'. Failed to create VM recovery snapshot, VM ID 'b4116e8a-470a-421a-92de-2cf0aafccf7b'.
Occurrence 2
Failed to create VM recovery checkpoint (mode: Veeam application-aware processing) Details: No Decoupled Providers Attached Error code: 0x80041002
Failed to get object '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="889CAFA0-5D80-...".
Failed to get wmi object by path: '\\HYPER-V-HOST\root\virtualization\v2:Msvm_ConcreteJob.InstanceID="889CAFA0-5D80-...".
Failed to create VM recovery snapshot, VM ID 'b4116e8a-...'.
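For anyone comparing occurrences like the two above, here is a minimal Python sketch that pulls the "Error code: 0x..." values out of pasted Veeam job messages and counts them by name. The two names in the lookup table are the WMI (WBEM) status constants as I know them from the Windows SDK header wbemcli.h; please verify them against Microsoft's WMI error constants list. The function name and the sample text are hypothetical and only for illustration.

```python
import re

# WMI (WBEM) status codes as defined in the Windows SDK header wbemcli.h.
# Assumption: only the two codes seen in this thread are listed here.
WBEM_CODES = {
    0x80041002: "WBEM_E_NOT_FOUND",
    0x80041033: "WBEM_E_SHUTTING_DOWN",
}

# Matches the "Error code: 0x80041033" fragments in the Veeam messages.
ERROR_CODE_RE = re.compile(r"Error code:\s*(0x[0-9A-Fa-f]{8})")


def summarize_error_codes(log_text):
    """Count each error code found in log_text, keyed by its symbolic name."""
    counts = {}
    for match in ERROR_CODE_RE.finditer(log_text):
        code = int(match.group(1), 16)
        # Codes missing from the table are reported by their hex value.
        name = WBEM_CODES.get(code, match.group(1).lower())
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample assembled from the message fragments quoted above.
    sample = (
        "Failed to create VM recovery checkpoint Details: Error code: 0x80041033\n"
        "Unable to allocate processing resources. Error: Error code: 0x80041033\n"
        "Details: No Decoupled Providers Attached Error code: 0x80041002\n"
    )
    for name, count in summarize_error_codes(sample).items():
        print(name, count)
```

The constant names alone do not say why the lookup of the Msvm_ConcreteJob object fails; they only make it easier to see which WMI status keeps coming back across job runs.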
-
- Veeam Software
- Posts: 3625
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Veeam VM Backup Job (of Exchange 2016) Crashes Hyper-V 2016 Host
Hello,
1. The errors above come from the object lookup process, which relies on requests to WMI namespaces, but I would say they are a consequence and not the real root cause.
2. I strongly recommend asking for the support case to be escalated; it looks like it requires detailed technical analysis.
3. I would try to pin down the exact operation that triggers the issue during backup: checkpoint creation, data read, cleanup operations at the end of the job, etc.
4. According to this article, backing up just the passive DAG node still provides full recovery options and should still properly truncate Exchange transaction logs, so I believe it's worth testing as well.
By the way, you may take a look at this topic to get more tips for DAG backup.
Thanks!
-
- Influencer
- Posts: 12
- Liked: 2 times
- Joined: Nov 09, 2017 6:15 am
- Full Name: wandersick
- Contact:
Re: Veeam VM Backup Job (of Exchange 2016) Crashes Hyper-V 2016 Host
Hi,
Thanks for the suggestions (liked). To answer your point #3, the point of failure is mainly during production checkpoint creation (every time): the Veeam job status stays at checkpoint creation for a long time before it times out and fails.
We have a new discovery. The virtual machine in question seems to run out of resources during Veeam application-aware backup and is brought to a halt, directly or indirectly, by Veeam's low-level interaction with the VM and the Hyper-V host.
More specifically, we found a Warning event with source AFD and ID 16002 in the System log. It is logged only on the Exchange VM with the backup issue, around the time of the Veeam application-aware backup; other virtual machines on the host and the other Exchange DAG VMs on other identically configured Hyper-V 2016 hosts do not have this event. Therefore, for now we have narrowed it down to the Exchange VM rather than the Hyper-V host, even though it is the host that crashes.
Event 16002 (AFD) indicates there is port exhaustion on the system:
Closing a UDP socket with local port number 55265 in process 1556 is taking longer than expected. The local port number may not be available until the close operation is completed. This happens typically due to misbehaving network drivers. Ensure latest updates are installed for Windows and any third-party networking software including NIC drivers, firewalls, or other security products.
We have verified that the Windows Updates (latest), network driver versions, security product configuration, Exchange configuration and Windows configuration (on both the Hyper-V host and the guest) basically match across the three Exchange DAG VMs. We know Veeam installs components on the hypervisor and injects a process into the virtual machine during application-aware backup. Therefore, it is also a possible culprit in my opinion.
Any suggestions would be appreciated. Thanks!
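One way to check whether a single process is holding an unusual number of UDP ports around the time of event 16002 is to count UDP endpoints per PID from the output of `netstat -ano -p udp` run inside the guest. The Python sketch below parses that text. It assumes the usual Windows netstat layout, where UDP rows have four columns (protocol, local address, foreign address, PID) and no state column. The function name and the sample rows are hypothetical; only port 55265 and PID 1556 come from the event text above.

```python
from collections import Counter


def count_udp_endpoints_by_pid(netstat_output):
    """Return a Counter mapping PID -> number of UDP endpoints it owns."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # UDP rows: Proto, Local Address, Foreign Address, PID (no State).
        if len(fields) == 4 and fields[0].upper() == "UDP" and fields[3].isdigit():
            counts[int(fields[3])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample in the shape of "netstat -ano" output.
    sample = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       904
  UDP    0.0.0.0:55265          *:*                                    1556
  UDP    0.0.0.0:55266          *:*                                    1556
  UDP    [::]:500               *:*                                    2300
"""
    for pid, total in count_udp_endpoints_by_pid(sample).most_common():
        print(pid, total)
```

If one PID dominates the count during the backup window, mapping that PID to a process name (for example in Task Manager) would show whether it belongs to Exchange, a security product, or a backup component.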
-
- Veteran
- Posts: 528
- Liked: 144 times
- Joined: Aug 20, 2015 9:30 pm
- Contact:
Re: Veeam VM Backup Job (of Exchange 2016) Crashes Hyper-V 2016 Host
You can determine whether it's app-aware processing by trying to run the backup using Hyper-V Native Quiescence. VSS will still be used within the guest to make an application consistent backup, but it will be the Hyper-V host instructing the VM to take the snapshot, rather than Veeam. You lose the ability to truncate the Exchange database logs and such, but otherwise the backup is the same.
-
- Veeam Software
- Posts: 3625
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Veeam VM Backup Job (of Exchange 2016) Crashes Hyper-V 2016 Host
Hello,
I'm not sure about the inability to truncate logs with native quiescence, because the Exchange VSS Writer is involved in the backup process as well and should carry out post-backup tasks such as log truncation.
Good to know that you've found an interesting pattern. I would focus on sorting out the exact cause of this event, as it would help connect all the dots and get the full picture.
Maybe it would make sense to temporarily disable antivirus software (if installed) on the VM for testing purposes, but anyway let's wait for the results of the analysis performed by our support team.
Thanks!