Host-based backup of VMware vSphere VMs.
arya.destek
Novice
Posts: 6
Liked: never
Joined: Oct 11, 2022 11:50 pm
Full Name: ARYA IT Support
Contact:

Packet Loss

Post by arya.destek »

Hi,

We frequently see packet loss during Veeam backups. What could be the cause, and how can it be resolved? Ticket ID: #02657022

Veeam Version: 11.0.1.1261
Bottleneck: Load: Source 88% > Proxy 68% > Network 41% > Target 32%
Proxy transport mode: Automatic
Proxy concurrent tasks: 8
Proxy count: 1 physical bare-metal server (10G network)
Backup repository concurrent tasks: 1
Job objects: Linux servers (vmxnet3)

Thanks.
MarkBoothmaa
Veeam Legend
Posts: 198
Liked: 55 times
Joined: Mar 22, 2017 11:10 am
Full Name: Mark Boothman
Location: Darlington, United Kingdom
Contact:

Re: Packet Loss

Post by MarkBoothmaa »

I assume this is VBR with VMware rather than the agent?
When does the packet loss occur? Is it when the snapshots are taken and removed?

Is there a reason your repository only has 1 concurrent task?
arya.destek

Re: Packet Loss

Post by arya.destek »

Hi,

Yes, VMware and VBR are used. Packet loss occurs while snapshots are being taken and deleted.

Edit: Latency and packet losses increase when working with more concurrent tasks.

Thanks.
Regnor
VeeaMVP
Posts: 1007
Liked: 314 times
Joined: Jan 31, 2011 11:17 am
Full Name: Max
Contact:

Re: Packet Loss

Post by Regnor »

With packet loss, do you mean that some VMs aren't reachable during snapshot creation/consolidation?
Then it's probably either the load of the VM or a hardware performance limitation. At some point the ESXi host will need to stun the VM and depending on those factors, such a stun can take multiple seconds.
I would check the vmware.log of affected VMs and search for the stun time.
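
As a rough aid for that search, here is a minimal Python sketch that pulls stun durations out of one or more log files. It assumes the log contains messages of the form "Checkpoint_Unstun: vm stopped for <N> us"; the rest of the line layout is not assumed, and the function name stun_seconds is made up for this example.

```python
import re
import sys
from typing import Iterable, List

# Only the message fragment is matched; timestamps and thread names that
# may precede it in a real log line are deliberately not assumed.
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_seconds(lines: Iterable[str]) -> List[float]:
    """Return every stun duration found in the given log lines, in seconds."""
    durations = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            # The log reports microseconds; convert to seconds.
            durations.append(int(match.group(1)) / 1_000_000)
    return durations


if __name__ == "__main__":
    # Usage: python stun_times.py vmware-0.log vmware-1.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            found = stun_seconds(handle)
        if found:
            print(f"{path}: {len(found)} stuns, longest {max(found):.3f} s")
        else:
            print(f"{path}: no stun messages found")
```

Running it against the rotated logs of an affected VM and of an unaffected one should make it easier to compare the longest stuns between the two.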

Also to rule out Veeam, try to manually create a snapshot, leave it active for the same time the backup job takes and then delete it.
arya.destek

Re: Packet Loss

Post by arya.destek »

Sometimes the virtual server cannot be reached during snapshot creation and deletion. When we take snapshots manually and monitor the situation, we only rarely see latency increase.
The servers are detected as down and services are affected because they do not respond to the load balancer health checks.
In the older vmware-0.log and vmware-1.log files, the stuns appear to last between 0.3 and 4.5 seconds. (Checkpoint_Unstun: vm stopped for 4511435 us)
I could not open the vmware.log file. (can't open 'vmware.log': Device or resource busy)
Which conditions affect the stun duration, and how can it be improved? Is it related to the RAM and CPU of the virtual server, to the I/O performance of the physical disks, or to some other capacity shortfall on the physical servers?
The physical servers have 2 x 16G multipath connections to the SAN switches, and I don't think there is a bottleneck on the 3PAR storage side.
As I said, these problems do not always occur; they vary a lot. For example, with 3 virtual servers in 1 job, only 1 virtual server is affected while the job is running, and the next day there is no problem.
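
To relate the stun times to the health-check failures described above, here is a small Python sketch. It rests on a simplified model that is an assumption, not something taken from the load balancer in this setup: a probe fails only if the VM stays stunned for the probe's entire timeout window. The interval, timeout and failure-threshold values in the usage lines are hypothetical placeholders.

```python
import math


def worst_case_failed_probes(stun_s: float, interval_s: float, timeout_s: float) -> int:
    """Upper bound on consecutive probes that time out during one stun.

    Assumed model: a probe fails only if the VM is stunned for the whole
    timeout window, so it must be sent within the first
    (stun_s - timeout_s) seconds of the stun.
    """
    window = stun_s - timeout_s
    if window < 0:
        # The stun is shorter than the timeout, so the VM can still answer in time.
        return 0
    # Worst case: the first probe is sent exactly when the stun begins.
    return math.floor(window / interval_s) + 1


def marked_down(stun_s: float, interval_s: float, timeout_s: float, fail_threshold: int) -> bool:
    """True if one stun can produce enough failed probes to trip the threshold."""
    return worst_case_failed_probes(stun_s, interval_s, timeout_s) >= fail_threshold


if __name__ == "__main__":
    # 4.511435 s is the stun from the log; the other values are hypothetical.
    print(marked_down(4.511435, interval_s=2.0, timeout_s=1.0, fail_threshold=2))
    print(marked_down(0.3, interval_s=2.0, timeout_s=1.0, fail_threshold=2))
```

Under this model, comparing the real health-check interval, timeout and failure threshold with the longest observed stun indicates whether relaxing the health check could hide the stuns while the root cause is investigated.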

Thanks.
jorgedlcruz
Veeam Software
Posts: 1494
Liked: 655 times
Joined: Jul 17, 2015 6:54 pm
Full Name: Jorge de la Cruz
Contact:

Re: Packet Loss

Post by jorgedlcruz »

Hello Arya,
I can see that you have a physical proxy and 3PAR as well, which is all great hardware. Are you by any chance using the Storage Integration? Or DirectSAN backup, at least? I can see that you left the transport mode on Automatic. That is what I would try first: enabling the Storage Snapshot integration to see if it helps reduce the stun. Are the VMware Tools versions up to date on all VMs?
Jorge de la Cruz
Senior Product Manager | Veeam ONE @ Veeam Software

@jorgedlcruz
https://www.jorgedelacruz.es / https://jorgedelacruz.uk
vExpert 2014-2024 / InfluxAce / Grafana Champion
arya.destek

Re: Packet Loss

Post by arya.destek »

Hi jorgedlcruz,

You can think of the topology as shown below. The 3PAR is not used to store backups. Backups are written to disks on the backup server, a repository connected via direct attach. (Avg. 1000-1200 MB/s)
https://freeimage.host/i/bJsuou (Link health: https://www.virustotal.com/gui/url/e020 ... ?nocache=1)

VMtools versions seem to be up to date. (2:11.3.0-2ubuntu0~ubuntu20.04.3)
Regnor

Re: Packet Loss

Post by Regnor »

The stun time is often affected by the load of the VM. If too much is happening there, at some point it needs to be stopped so that the consolidation can occur.
How current is the version of your ESXi hosts? There have been improvements in newer releases.

What Jorge is referring to is backup from storage snapshots. If you have an Enterprise Plus license or a Veeam Universal License, you can use those and reduce the time the snapshot stays active on the VM. If the stuns only occur when the snapshot is growing, this could solve your problem.
https://helpcenter.veeam.com/docs/backu ... ml?ver=110