Host-based backup of VMware vSphere VMs.
netwichi
Influencer
Posts: 10
Liked: never
Joined: Oct 15, 2024 9:15 am

[Fix/Workaround Needed] Hot-add proxy processes VMs on NFSv3 datastores on different hosts even with KB1681 applied

Post by netwichi »

Hi,
We have run into an issue just before launching our backup service based on Veeam.
We have already implemented one of the solutions proposed in KB1681 to prevent VM stun in NFSv3 datastore environments.
However, during our QA testing before launch, we found that VM stun can still occur in some cases.
Could you check the details below and provide a workaround or fix for this problem?

[Environment]
* Each ESXi host has its own hot-add proxy.
* The registry key EnableSameHostHotaddMode=2 has been applied.
* DRS is enabled (vMotion/Storage vMotion may happen at any time).
* VMs are multi-homed on NFSv3 and iSCSI datastores, so Direct NFS is not the best option here.

[Issue]
* After the backup job starts, a vMotion operation can still be accepted, and the source VM is migrated to an ESXi host other than the original one where the selected hot-add proxy resides.
According to the forum thread below, vMotion is only locked once VDDK is in use, so there seems to be a time window in which vMotion operations can still be accepted.
vmware-vsphere-f24/i-thought-vbr-blocke ... 82158.html
* Processing the source VM from that proxy, which is now on a different host than the VM, may cause stun.
According to the VMware KB below, this may cause not only a stun of the source VM but also unresponsiveness of the ESXi hosts.
https://knowledge.broadcom.com/external ... orage.html
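The race described above can be modeled as a small timeline check. This is a minimal sketch under stated assumptions: the `Event` type, the event names (`proxy_selected`, `vmotion_done`, `vmotion_lock`) and the host names are hypothetical and do not come from Veeam logs. The model only assumes what the posts describe: with same-host mode the proxy is chosen from the VM's host at selection time, and vMotion is blocked only once VDDK takes the lock (or never, if that call fails and is ignored).

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One step in a hypothetical backup timeline (not a real Veeam log format)."""
    time_s: float   # seconds since the job started
    kind: str       # "proxy_selected", "vmotion_done" or "vmotion_lock"
    host: str = ""  # VM's host at proxy selection, or the vMotion destination host


def cross_host_hotadd(events: list[Event]) -> bool:
    """Return True if the VM sits on a different host than its hot-add proxy
    when vMotion finally gets blocked (or at the end, if it never is)."""
    proxy_host = None
    vm_host = None
    for ev in sorted(events, key=lambda e: e.time_s):
        if ev.kind == "proxy_selected":
            # Same-host mode: the proxy is picked from the VM's current host.
            proxy_host = vm_host = ev.host
        elif ev.kind == "vmotion_done" and proxy_host is not None:
            # Migration completed inside the unprotected window.
            vm_host = ev.host
        elif ev.kind == "vmotion_lock":
            # From here on, vMotion is assumed to be rejected.
            break
    return proxy_host is not None and vm_host != proxy_host


timeline = [
    Event(0.0, "proxy_selected", "esx01"),
    Event(2.5, "vmotion_done", "esx02"),  # DRS moves the VM before the lock
    Event(5.0, "vmotion_lock"),
]
print(cross_host_hotadd(timeline))
```

If the lock event were placed before the migration, the function would return False, which is the behavior the registry key is expected to guarantee.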

[Others]
* We opened Case #07484561, but the behavior seems to be considered expected product behavior, so we are posting here.

Regards,
Andreas Neufert
VP, Product Management
Posts: 7098
Liked: 1517 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Post by Andreas Neufert »

Thanks for reaching out here.
The behavior you describe stems from VMware limitations in their technology. Longer stuns can happen when your storage system has high latency during I/O burst operations (snapshot removal), specifically when a VM disk is hot-added to a proxy on another host, as you already mentioned. Today, HCI systems with storage spanned across the ESXi nodes are usually the only ones affected; I have not heard of regular NFS storage systems showing this issue so dramatically.

In general, I would avoid HotAdd processing in your case and fall back to Direct Storage mode (DirectNFS and Direct SAN over iSCSI can also be used with virtual proxies, even in mixed-disk environments). We created DirectNFS specifically to avoid these stuns.

Regarding the issues you listed:
DRS kicking in between the time the proxy is selected and the start of the backup data transport is a very strange case, and I will check the processing sequence. How often did you run into this issue?

The other one, from the VMware KB, is what you have already addressed with the settings you described. As VMware states that this issue is addressed in vSphere 8.0 Update 2b, it makes sense to update to that version or to switch to Direct Storage processing mode (or NBD backups).
netwichi
Post by netwichi »

Thank you for your reply.

This issue occurred in the production environment during QA testing before the service launch, so the actual frequency has not been observed yet.

We expected the registry key to resolve the stun issue, as KB1681 provides three options for avoiding stun and this is one of them.
Could you confirm whether this behavior is expected and unavoidable?

Regards,
Andreas Neufert
Post by Andreas Neufert »

Stuns can happen under the conditions you listed (NFS storage, HotAdd backup with cross-host mounting). This is a VMware-specific issue that you/we need to work around. See https://knowledge.broadcom.com/external ... orage.html

Do the following:
1) Update to vSphere 8.0 U2b, where VMware states that this issue is solved. (See your VMware KB https://knowledge.broadcom.com/external ... orage.html)
2) Use Direct NFS for backup, as we developed this backup mode exactly to address this issue. Tons of innovation in here. It can also be used in combination with DirectSAN if your VMs have disks on different datastore types.
3) If you really want to do HotAdd processing, use the registry keys mentioned in https://www.veeam.com/kb1681 to force Veeam to use a proxy on the same host. Install a proxy on each host.
4) Fall back to Network transport processing (NBD) if none of the above can be used.
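One way to read this list is as a fallback chain, tried in order. The helper below is a minimal sketch of that reading; the function name, its boolean inputs and the returned labels are hypothetical and are not Veeam settings or API values.

```python
def recommend_transport(vsphere_fix_applied: bool,
                        direct_nfs_usable: bool,
                        proxy_on_every_host: bool) -> str:
    """Walk options 1-4 in the order given (hypothetical helper).

    vsphere_fix_applied: hosts run vSphere 8.0 U2b or later, where VMware
        states the cross-host HotAdd stun issue is solved.
    direct_nfs_usable: proxies can reach the NFS datastores directly
        (DirectSAN can cover the iSCSI-backed disks of the same VMs).
    proxy_on_every_host: a hot-add proxy is installed on every ESXi host.
    """
    if vsphere_fix_applied:
        return "hotadd"            # option 1
    if direct_nfs_usable:
        return "direct_storage"    # option 2
    if proxy_on_every_host:
        return "hotadd_same_host"  # option 3, KB1681 registry keys applied
    return "nbd"                   # option 4


print(recommend_transport(False, False, True))
```

Later posts in this thread report that option 3 may still leave a short window before vMotion is blocked, which this sketch does not capture.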

In general:
The magnitude of the VMware issue really depends on disk latency during I/O bursts; the better you design the NFS backend, the less impact you will see.
netwichi
Post by netwichi »

Thank you for your support, Andreas.
Actually, we had already implemented (3), but it seems vMotions are not blocked and stun may still occur.
Is there any time window in the backup sequence during which vMotions can be accepted?
Should we wait for support's reply in Case #07484561?

Regards,
netwichi
Post by netwichi »

Hi Andreas,
In the case, we are now being advised to check with VMware about the blocking specification during backups, and the investigation on the Veeam support side is about to be stopped.
Could you help follow up on this case? We think the behavior depends on how Veeam uses VDDK and locks vMotion in the backup sequence.

Regards,
Andreas Neufert
Post by Andreas Neufert »

It should not be possible. You can check in the logs that we issue the method call to vCenter to prevent vMotion.
If the call is denied or cannot be executed, we just ignore that and continue with the backup.

You can try it yourself: create a test VM, start a backup, then perform a vMotion while the backup is running. If the vMotion goes through (and the backup fails), we do not have the global.method permissions on vCenter. Importantly, global.method needs to be granted on the whole vCenter object, not only at the cluster level or similar; otherwise it will not work, due to VMware limitations.
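The permission requirement above can be checked offline against an exported list of role assignments. This is a minimal sketch, assuming that what the post calls global.method corresponds to the vSphere privilege IDs `Global.DisableMethods` and `Global.EnableMethods`; the input format (inventory path mapped to the set of privilege IDs granted to the backup account, with `/` standing for the vCenter root) is hypothetical, and privilege propagation is not modeled.

```python
# Privilege IDs assumed to be what the post calls "global.method".
REQUIRED = {"Global.DisableMethods", "Global.EnableMethods"}


def missing_root_privileges(grants: dict[str, set[str]], root: str = "/") -> list[str]:
    """Required privileges the backup account lacks on the vCenter root.

    grants maps an inventory path (hypothetical format) to the privilege IDs
    granted to the backup account at that level.
    """
    return sorted(REQUIRED - grants.get(root, set()))


def granted_below_root_only(grants: dict[str, set[str]], root: str = "/") -> list[str]:
    """Required privileges granted somewhere below the root (e.g. on a cluster)
    but not on the root itself; per the post, these are not sufficient."""
    below = set().union(*(privs for path, privs in grants.items() if path != root))
    return sorted((REQUIRED & below) - grants.get(root, set()))


grants = {
    "/": {"System.View"},
    "/DC1/host/Cluster1": {"Global.DisableMethods", "Global.EnableMethods"},
}
print(missing_root_privileges(grants))
print(granted_below_root_only(grants))
```

Both calls report the two method privileges here, which is the cluster-level-only case the post warns about. The live vMotion test described above remains the authoritative check.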
netwichi
Post by netwichi »

Hi Andreas,

Thank you for your reply.
We have checked that the permissions (global methods) are assigned, and we also reproduced this behavior with administrator@vsphere.local.

According to the support response, VixDiskLib_PrepareForAccess() appears to fail for VMs that are in the middle of a vMotion.
Since several seconds pass between the job start and the VixDiskLib_PrepareForAccess() call, it seems vMotions can be accepted under the current behavior.
In such a case, is the proxy for the VM being vMotioned selected based on its current ESXi host rather than the vMotion destination host?
Is there a way to get around this?

Regards,
Andreas Neufert
Post by Andreas Neufert »

Checking with internal resources. Andy S. will reach out here soon.