Host-based backup of VMware vSphere VMs.
danswartz
Veteran
Posts: 264
Liked: 30 times
Joined: Apr 26, 2013 4:53 pm
Full Name: Dan Swartzendruber
Contact:

Extremely slow backup of VSAN-resident VMs

Post by danswartz »

Case # 05302210

I opened this Thursday, but haven't heard back yet. Posting info here hoping someone has a clue. This literally changed overnight. Anyway:

I have a 3-host 6.7 cluster (7.0 VCSA) with 2 1-TB NVMe cards participating in the VSAN datastore. Performance for guests is extremely good, for both reads and writes. Until a couple of days ago, I was getting 400+ MB/sec backing up to my OmniOS ZFS NAS (8 1-TB spinners in a RAID-10). Then the performance suddenly dropped by 10X. It seems to be related to backing up VMs on the VSAN datastore. I've been trying to isolate the cause, using a CentOS 8 guest with 60+ GB as a test case. I've done the following:

Migrate the guest to a JBOD datastore on the same NAS. 400MB or so per second. Migrate back to VSAN. 20-40 MB/sec. Veeam B&R reports target is the bottleneck.

While the backup is crawling along, I logged into the Linux backup proxy (I have 3 but I hardcoded the test backup job to use one specific one.) I see this:

iostat:

Device     tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda       0.90        0.00        8.55         0        85
sda2      0.00        0.00        0.00         0         0
sda3      0.90        0.00        8.55         0        85
sda1      0.00        0.00        0.00         0         0
sds      19.60    20070.40        0.00    200704         0

(sds is the hot-plugged guest drive)

192.168.3.44:/jbod/veeam 3.5T 890G 2.6T 26% /mnt/Veeam/{ade978d1-fae0-49b1-88ce-e883d964b241}
is the NFS-mounted share from the JBOD.
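
To read iostat samples like the one in this post at a glance, the kB_read/s column can be converted to MB/s. This is a minimal sketch, assuming the five-column `iostat -d` layout shown here (newer sysstat versions print extra columns); the function name `read_rate_mb` is made up for illustration.

```shell
# Hypothetical helper: print the read rate in MB/s for one device,
# given iostat -d style lines on stdin
# (columns: Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn).
read_rate_mb() {
    awk -v dev="$1" '$1 == dev { printf "%s %.1f MB/s\n", $1, $3 / 1024 }'
}

# Example with the sample from this post: sds was reading 20070.40 kB/s.
printf 'sds 19.60 20070.40 0.00 200704 0\n' | read_rate_mb sds
# prints: sds 19.6 MB/s
```

That is roughly 20 MB/s coming off the hot-plugged disk, which lines up with the 20-40 MB/sec the job was reporting.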

I then tried writing a huge block of data to a file there:

[root@veeam-proxy3 ~]# time dd if=/dev/zero of=/mnt/Veeam/{ade978d1-fae0-49b1-88ce-e883d964b241}/FOO bs=1M count=4K conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.08942 s, 705 MB/s
No write bottleneck. Next I tried reading a huge block of data from the hot-plugged disk:

[root@veeam-proxy3 ~]# time dd if=/dev/sds of=/dev/null bs=1M count=8K
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.22564 s, 1.4 GB/s
Even better! Finally, read from hot-plug and write to JBOD:

[root@veeam-proxy3 ~]# time dd if=/dev/sds of=/mnt/Veeam/{ade978d1-fae0-49b1-88ce-e883d964b241}/FOO bs=1M count=4K conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.76519 s, 635 MB/s
Decent speed, no? It's like Veeam is throttling somehow (but I don't have any network throttling enabled).
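
One caveat on the write tests in this post: in GNU dd, `conv=sync` only pads short input blocks with zeros and does not flush anything to storage. A variant whose timing includes the final flush uses `conv=fdatasync`. This is a minimal sketch, assuming GNU dd; the function name and the temp-file target are placeholders.

```shell
# Hypothetical wrapper: sequential write test whose timing includes the
# final flush of file data (conv=fdatasync), not just writes into cache.
# Usage: write_test <target-file> <count-of-1MiB-blocks>
write_test() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=fdatasync
}

# Small local example (4 MiB to a temp file). For the repository test the
# target would be a file on the NFS mount and the count something like 4096.
tmpfile=$(mktemp)
write_test "$tmpfile" 4
rm -f "$tmpfile"
```

If the number with `conv=fdatasync` is close to the one without it, caching is not what makes the dd result look good.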

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz »

Just repeating the exercise. B&R console currently showing 30MB/sec bottleneck=target. Repeated the dd exercise (this time hotplug disk is /dev/sdt):

[root@veeam-proxy3 transport]# time dd if=/dev/sdt of=/mnt/Veeam/{b7580c9a-ff42-4e22-8839-85372af6fcb2}/FOO bs=1M count=4K
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 8.90813 s, 482 MB/s
Something is very wrong here... I'm not aware of having changed anything config-wise in Veeam (and I have looked high and low for anything proxy- or repo-related, but came up empty).

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz »

Very odd. It seems to be related to explicit full backups. Last night's daily ran at 323MB/sec. I just fired off a full backup of the daily job, and it's currently at 13MB/sec!
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Extremely slow backup of VSAN-resident VMs

Post by PetrM »

Hi Dan,

I'm going to ping our support team leaders so that the investigation starts ASAP. I believe our engineers should examine the debug logs and probably collect advanced performance statistics on Data Mover activity to understand whether the issue comes from hardware or not. So far, I wouldn't suggest relying fully on the results reported by test tools, because the way a test tool writes data and the way a backup writes data can be slightly different. By the way, you may also try FIO for testing your repository.

Thanks!
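
For the fio suggestion, a sequential-write job against the repository mount might look like the sketch below. The block size, file size, and paths are illustrative assumptions, not Veeam-recommended values, and `make_fio_job` is a made-up helper that only writes the job file.

```shell
# Hypothetical helper: write a small fio job file for a sequential write test.
# Usage: make_fio_job <job-file> <test-file-on-repository>
make_fio_job() {
    cat > "$1" <<EOF
[global]
ioengine=libaio
direct=1
bs=512k
size=4g

[seq-write]
rw=write
filename=$2
EOF
}

# Example: generate the job file, then run it by hand with: fio <job-file>
# /mnt/repo/fio-test.dat is a placeholder path on the repository mount.
make_fio_job /tmp/seq-write.fio /mnt/repo/fio-test.dat
cat /tmp/seq-write.fio
```

Keeping the job in a file makes it easy to rerun the identical test from each proxy and compare the numbers.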

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz »

Okay, thanks. I'm very skeptical of the testing method being an issue. A slowdown of 20X is not "slightly different", and as I said in the OP: I was getting perfectly good performance until a few days ago, then literally overnight, boom :( I understand dd is not a perfect tool, but I can get 400+MB/sec reading from the hotplug disk => NFS share, while the data mover manages 1/20th of that...

Re: Extremely slow backup of VSAN-resident VMs

Post by PetrM »

I see our engineers have already started analyzing this problem. Let's wait and see what they find.

Thanks!

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz »

I've had a number of back-and-forths with the engineer, including running an fio test or two (which were fine). One experiment I did:

svmotion the 64GB guest from vsan => jbod: several hundred MB/sec.
svmotion it back to vsan: still good performance (a little under 200MB/sec).
Use B&R migration, forcing Veeam transport to remove vCenter from the equation, from vsan => jbod: it's currently at 26% (17GB), crawling at 9MB/sec. Bottleneck = source.

I've already told the engineer, I'm basically dead in the water, as the nightly incremental took more than 5 hours to run :( So for the time being, I'm unable to rely on B&R backups. If I can't get some kind of resolution here, I'm probably going to try reinstalling B&R, in the hopes that something is corrupted. Alternatively, svmotion all the guests to the JBOD, since backups from the jbod -> jbod seem to be ok.
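
To put the 9MB/sec figure in perspective, the remaining time for that migration can be estimated with integer arithmetic. This is a sketch using the numbers from this post (64GB guest, 17GB done) and treating 1GB as 1024MB; `eta_seconds` is a made-up name.

```shell
# Hypothetical helper: seconds left to copy the rest of a disk at a given rate.
# Usage: eta_seconds <total-GB> <done-GB> <rate-MB-per-sec>
eta_seconds() {
    echo $(( ($1 - $2) * 1024 / $3 ))
}

eta_seconds 64 17 9     # at the observed 9MB/sec: prints 5347
eta_seconds 64 17 400   # at the ~400MB/sec seen before the slowdown: prints 120
```

That is roughly 89 minutes for the remainder versus about 2 minutes at the old rate.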

Re: Extremely slow backup of VSAN-resident VMs

Post by PetrM »

Hi Dan,

The problem seems to be quite complex to troubleshoot, so I think our engineers just need more time. You can also ask to escalate your case if you feel that a more precise technical analysis is needed.

Thanks!

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz » 1 person likes this post

Thanks. The engineer did want to consult with some others. If nothing comes of that, I will certainly escalate.

Re: Extremely slow backup of VSAN-resident VMs

Post by danswartz » 1 person likes this post

So, it looks like at some point over the last week or so, Microsoft pushed some update that borked my NFS performance. I couldn't easily switch to a Linux repository, since my NAS was running a Solaris clone, which is not supported. I bit the bullet and installed a RH8 clone on that HW, and lo and behold: 300-500MB/sec backups. Thank you Bill Gates :)