Increase in Hot Add backup time after installing Update 3a

by willjohnson » Wed Jul 11, 2018 2:51 pm

Hi,

Is anyone else experiencing a huge increase in backup or replication times since updating to U3a? (In addition to the SQL problem).

E.g. a backup job that normally takes 3.5 hrs has taken over 21 hrs since updating.

After the update, the logs now have additional [ViProxyEnvironment] lines, including "The proxy has NBD mode", while the Veeam GUI still says HDDs are being backed up using [hotadd].

The logs also have new [ViProxyEnvironment] entries of "The proxy cannot be used for write" and "The proxy has not SAN mode".

Before updating, logs had no mention of NBD mode etc. or [ViProxyEnvironment].

Case # 03096521

Cheers,

Will
willjohnson
Novice
 
Posts: 4
Liked: never
Joined: Wed Jul 11, 2018 8:02 am
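
For anyone who wants to check whether their own logs gained the same entries after the update, here is a minimal sketch. It assumes only that the relevant lines contain the literal tag [ViProxyEnvironment] followed by the message text; the actual log layout is not shown in this thread, so the sample lines are illustrative and the function name is hypothetical.

```python
from collections import Counter

# Assumption: lines of interest contain this literal tag, followed by
# a free-text message. The real log layout is not shown in the thread.
TAG = "[ViProxyEnvironment]"


def proxy_environment_messages(lines):
    """Count each distinct message that appears after the tag."""
    counts = Counter()
    for line in lines:
        _, found, rest = line.partition(TAG)
        if found:  # the tag was present on this line
            counts[rest.strip()] += 1
    return counts


# Illustrative sample: the messages are quoted from the post above,
# but the surrounding line layout is made up.
sample = [
    "[ViProxyEnvironment] The proxy has NBD mode",
    "[ViProxyEnvironment] The proxy cannot be used for write",
    "[ViProxyEnvironment] The proxy has not SAN mode",
    "some unrelated log line",
]
counts = proxy_environment_messages(sample)
for message, n in counts.items():
    print(n, message)
```

Running the same function over log files from before and after the update (for example, by passing an open file object) would show which messages are new.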

Re: Increase in backup/replication times after U3a installat

by foggy » Thu Jul 12, 2018 8:53 am

Hi Will, these messages seem to be vSphere 6.7 related. Please continue investigating with your support engineer.
foggy
Veeam Software
 
Posts: 16299
Liked: 1302 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Increase in backup/replication times after U3a installat

by willjohnson » Thu Jul 12, 2018 8:56 am

Hi foggy,

The messages weren't in the logs before the U3a update, regardless of vSphere 6.7.

Cheers.
willjohnson
Novice
 
Posts: 4
Liked: never
Joined: Wed Jul 11, 2018 8:02 am

Re: Increase in backup/replication times after U3a installat

by foggy » Thu Jul 12, 2018 8:58 am

Yes, because vSphere 6.7 support was added in the U3a update.
foggy
Veeam Software
 
Posts: 16299
Liked: 1302 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Increase in backup/replication times after U3a installat

by willjohnson » Thu Jul 12, 2018 9:00 am

We're on 6.5.

Waiting for support to respond.

Cheers
willjohnson
Novice
 
Posts: 4
Liked: never
Joined: Wed Jul 11, 2018 8:02 am

Re: Increase in backup/replication times after U3a installat

by Aron.Stocker » Mon Jul 16, 2018 5:05 am

Hi, we have the same issue (much longer backups and replications).
We're interested too, so please post any reply you receive from support.
Thanks
Aron.Stocker
Lurker
 
Posts: 1
Liked: never
Joined: Wed Jul 11, 2018 8:20 am
Full Name: Aron Stocker

Re: Increase in backup/replication times after U3a installat

by foggy » Mon Jul 16, 2018 12:18 pm

I would recommend contacting support directly, since the reasons might differ and depend on the particular environment.
foggy
Veeam Software
 
Posts: 16299
Liked: 1302 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Increase in backup/replication times after U3a installat

by Gostev » Mon Jul 16, 2018 1:07 pm

It looks like the issue is the hot add process taking too long (20-30 min), and it appears to be caused by some change in the latest VDDK version that Update 3a uses. We think so because, for the original case, the workaround was to replace the VDDK libraries with the ones used in Update 3, which brought hot add times back down to normal (1-2 min).

We also already know that the issue does not affect every environment, so there's something else to it.

We will be opening a support case with VMware on this.
Gostev
Veeam Software
 
Posts: 22400
Liked: 2676 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Increase in backup/replication times after U3a installat

by omegagx » Mon Jul 16, 2018 1:59 pm

OK, please keep us updated on this. We will hold off on installing Update 3a until this is resolved.
omegagx
Enthusiast
 
Posts: 40
Liked: never
Joined: Tue May 09, 2017 6:33 pm
Full Name: Michael Gorn

[MERGED] Issue with update 3a Hotadd and VMs with many disks

by humbertoz » Mon Jul 16, 2018 11:46 pm

I know a lot of people have been fighting with the SQL issues that "appeared" with update 3a. In reality, it is just that Veeam decided to change how the SQL backups run. The way they are doing it now is better, but everyone who got caught without the exact "documentation" permissions on their SQL servers had their jobs break when they had been running fine in the past.

Unfortunately, we ran into a big issue with one customer who has a couple of VMs with lots of drives (8+ each). We have not observed this issue with any of our other customers, but everybody else has VMs with 4 drives max. When their jobs run, all VMs with lower disk counts process perfectly, as they always have. Once one of the VMs with a high disk count starts to process, all currently processing VMs take from 30 minutes to two hours to connect each hotadd disk. Even disks that are already connected and transferring data pause while a disk hotadd is in its limbo state. Basically, the entire job pauses during the disk hotadd timeout. Once the high disk count VM has finally finished backing up, all remaining VMs in the job run as expected. This has caused their incremental jobs to go from 25 minutes to 2-4 hours.

We have a ticket with Veeam and they have confirmed a bug with update 3a and high disk count VMs. They do not see the issue in debug against the VDDK, but when the software runs, the issue is obviously there. They have confirmed several other tickets coming in with the same symptoms. They are not seeing this issue with jobs configured with NBD (network mode).

This customer runs on a very fast three-server cluster with 20Gb vSAN. It runs incredibly well and hotadd is perfect in their scenario. NBD is much slower for them since their VMs are on several different networks/DMZs and their firewalls are 1Gb. Their internet is 1Gb also, so they do not have a huge need for 10Gb or 20Gb firewalls, which would speed up their jobs if they were to switch to NBD. Hotadd allows the drives to be ripped at 20Gb regardless of network design/complexity, which is beautiful.

Just a warning for everybody out there. Since you have to have VMs with lots of drives to hit this (we do not know the exact number, but more than four), luckily it will not affect a huge number of people, but for the ones it does affect, it will be painful.

FYI, Veeam ticket # 03095693
humbertoz
Novice
 
Posts: 5
Liked: never
Joined: Mon Jul 16, 2018 10:45 pm
Full Name: Robert Zarate

Re: Increase in Hot Add backup time after installing Update

by Gostev » Tue Jul 17, 2018 12:03 am

Thanks for the hint that VMs with a large number of drives may be the culprit. If this is confirmed and the issue is indeed with the latest VDDK version, a simple temporary hotfix could be for us to automatically force Network transport for VMs with more than X disks. We will now try to reproduce the issue internally.
Gostev
Veeam Software
 
Posts: 22400
Liked: 2676 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland
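
The temporary hotfix idea in the post above (force Network transport once a VM has more than X disks) can be sketched as a simple per-VM rule. This is only an illustration of the idea, not Veeam's implementation: the `VM` class, the `pick_transport` function, and the threshold value are all hypothetical, and the thread never states what X would be.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    disk_count: int


def pick_transport(vm, max_hotadd_disks):
    """Return "network" for VMs above the disk threshold, else "hotadd".

    max_hotadd_disks plays the role of the unknown "X" from the post;
    any concrete value used here is an assumption.
    """
    return "network" if vm.disk_count > max_hotadd_disks else "hotadd"


# Hypothetical VMs and a hypothetical threshold of 4 disks.
vms = [VM("fileserver", 2), VM("sql", 9), VM("exchange", 8)]
plan = {vm.name: pick_transport(vm, max_hotadd_disks=4) for vm in vms}
print(plan)
```

A rule like this trades throughput for predictability on the affected VMs only, which is why a later reply points out it would not help where network mode is limited to 1Gb.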

Re: Increase in backup/replication times after U3a installat

by humbertoz » Tue Jul 17, 2018 12:39 am

My bad... I looked around everywhere for hotadd and high disk count posts, but never found this one since the title was worded a little differently. Luckily, my post was merged into this one to help out everybody already here.

As additional info, the affected customer was on vCenter and ESXi 6.5. After upgrading to 3a, they were hit by these issues. Since they were now on 3a, we were able to upgrade their vCenter and ESXi from 6.5 to 6.7. Unfortunately, their problems persist, so the issue appears to be in the 6.7 VDDK from VMware that Veeam uses in 3a to be compatible with vCenter and ESXi 6.7. Since the customer was previously on vCenter and ESXi 6.5, it would seem that the VMware 6.7 VDDK also talks the same bad language to 6.5.

I'm not sure if the root of this issue is in how Veeam interacts with the VDDK, or in how the VDDK interacts with vCenter and the hosts after getting all its instructions from Veeam. If the VDDK just needs to be instructed differently by Veeam, then Veeam will be able to fix this with another patch. If the VDDK needs to interact with vCenter and the hosts differently, then the fix will have to come from VMware as a patch to the VDDK, and then Veeam will have to integrate the newly patched VDDK into a patch 3b for all of us.

Veeam support, please correct any of this, or provide any further insight into how we will be able to resolve this issue. Unfortunately, failing over to NBD for high disk count VMs would not be a solution for this customer, because backups would then "slow" down to 1Gb. For their monthly full backups, the time to process the VMs over NBD would probably be the same as or longer than with the "broke" hotadd. For incrementals, it would take longer than before ("working" hotadd), but I'm positive it would still be faster than the current "broke" hotadd. Our support rep mentioned that this is basically the biggest issue coming out of 3a that they are working on right now. They said it is in tier 3 and R&D, so hopefully it is getting a lot of traction over there.

Luckily, out of all our customers, only one has VMs with lots of drives and got hit by this bug.
humbertoz
Novice
 
Posts: 5
Liked: never
Joined: Mon Jul 16, 2018 10:45 pm
Full Name: Robert Zarate

Re: Increase in Hot Add backup time after installing Update

by Gostev » Tue Jul 17, 2018 10:56 am

First tests are done and we could not confirm an issue using multiple VMs with 11 disks. To be continued...
Gostev
Veeam Software
 
Posts: 22400
Liked: 2676 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Increase in Hot Add backup time after installing Update

by spiritie » Tue Jul 17, 2018 12:53 pm

We've also seen a huge increase in backup time, and are no longer able to meet our backup window.

We're busy adding another proxy to see if it helps.

We're running VMware 6.5, and all proxies are VMs using hotadd.

We don't have many VMs with many disks, though, but we do have some large jobs.

EDIT:
Just did an inventory of the VMs in one of our vCenters. Out of around 400-500 VMs,
only 9 VMs have more than 3 disks (ranging between 4 and 7 disks).

Around 70-75% of the VMs have 1 disk, 20-25% have 2 disks, and the rest are above that.
spiritie
Service Provider
 
Posts: 31
Liked: 6 times
Joined: Tue Mar 01, 2016 10:16 am
Full Name: Gert van Niekerk
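
An inventory like the one in the EDIT above can be summarized with a few lines once you have a mapping of VM name to disk count. The sketch below assumes that mapping has already been exported from vCenter by whatever means you prefer; the function name, the threshold default, and the sample inventory are all hypothetical.

```python
from collections import Counter


def summarize_disk_counts(disks_per_vm, threshold=3):
    """Return (histogram of disk counts, sorted names of VMs above threshold)."""
    histogram = Counter(disks_per_vm.values())
    over = sorted(name for name, n in disks_per_vm.items() if n > threshold)
    return histogram, over


# Hypothetical inventory: VM name -> number of virtual disks.
inventory = {"web01": 1, "web02": 1, "file01": 2, "sql01": 7, "exch01": 4}
histogram, over = summarize_disk_counts(inventory)

for disks in sorted(histogram):
    print(f"{disks} disk(s): {histogram[disks]} VM(s)")
print("VMs with more than 3 disks:", over)
```

The list of VMs above the threshold is the useful part here, since those are the candidates to test in isolation.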

Re: Increase in Hot Add backup time after installing Update

by humbertoz » Tue Jul 17, 2018 1:54 pm

Gostev wrote: First tests are done and we could not confirm an issue using multiple VMs with 11 disks. To be continued...

Gostev,

There has to be something to this. You can look at our ticket, which has logs uploaded. This customer has three backup jobs with 10, 13 and 26 VMs and two replication jobs with 9 and 10 VMs. There are only two VMs that trigger the hotadd "pausing job" issue. One is a SQL server in job #1 with 10 VMs. The other is a backend Exchange server in job #2 with 13 VMs. The third job with 26 VMs is not affected. Neither are the two replication jobs, but as I mentioned, neither the two replication jobs nor the third backup job have those two VMs in them (or any other VMs with a high disk count). No matter where we move those two VMs in their jobs (first, middle, last), the jobs tank as soon as they reach those two VMs.

The issue has to be related to drive quantity. Could it maybe be total drive size? Those two VMs are their largest VMs. If you add up all 8/9 drives, the SQL server is 2.2TB and the Exchange server is 3.5TB, but this is all spread out over 100GB, 200GB and 500GB drives for both VMs. They have three other VMs that are file servers with hundreds of thousands of files, and they back up fine. They are 2.0TB and 1.7TB in size. Those three have only two drives each, a small 60GB OS drive and then the 1.7TB to 2.0TB drive. I'll list the things I can think of that might influence this.

#1 - VMs with high quantity of disks
#2 - All their VMs including the two affected ones use the VMware Paravirtual SCSI controller. Not sure if you tested this in conjunction with high quantity of disks.
#3 - VMs with large disks. Those two VMs are their biggest ones, well over 2TB in total when their disks are added up. Their next largest VMs are exactly 2.0TB and down. Not sure if there is a magic threshold of 2TB for this issue.
#4 - Application aware processing VMs. All their VMs run with this on. The two affected VMs are a SQL and backend Exchange that have it enabled for obvious transaction log reasons. Again, not sure if you tested this in conjunction with high quantity of disks.
#5 - The current alignment of planets and moon causing this. I'm at a loss with this issue. No matter what we do to the backup jobs, when it hits those two VMs, they blow up.

Hopefully you can get some insight from your testing. If you need any specific changes tested on our end, just let me know and we'll try them. Everybody just wants to get to the source of this evil.
humbertoz
Novice
 
Posts: 5
Liked: never
Joined: Mon Jul 16, 2018 10:45 pm
Full Name: Robert Zarate
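
To put the 1Gb versus 20Gb argument from this thread into numbers, here is a back-of-the-envelope sketch of the best-case time to move a full copy of the two affected VMs (2.2TB and 3.5TB) over each link. It assumes decimal terabytes, the full nominal line rate, and no compression, deduplication or changed-block tracking, so real jobs will differ; the function name is hypothetical.

```python
def transfer_hours(size_tb, link_gbps):
    """Ideal hours to move size_tb terabytes over a link_gbps link.

    Assumes 1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s, and that the link
    is fully utilized with no protocol overhead.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


for name, size_tb in [("SQL server", 2.2), ("Exchange server", 3.5)]:
    for link_gbps in (1, 20):
        hours = transfer_hours(size_tb, link_gbps)
        print(f"{name}: {size_tb} TB at {link_gbps} Gb/s = {hours:.2f} h")
```

Under these idealized assumptions the 3.5TB VM needs roughly 7.8 hours at 1Gb/s versus about 23 minutes at 20Gb/s, which is the gap that makes network mode unattractive for full backups in this environment.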
