GreenEnvy
Enthusiast
Posts: 25
Liked: 5 times
Joined: Jul 31, 2012 3:45 am
Full Name: Lee

Backup inconsistency and replication issue

Post by GreenEnvy »

Support Case # 01134679
Hi all,
We have two issues which may be related.
We have two sites that each have at least 100 Mbps fiber. This is just a WAN link over which we run a site-to-site VPN, not a direct connection (the sites are near Chicago and near Toronto). Both sites have vSphere Essentials Plus 6, with Veeam running on the same VM as vCenter. Both run vSphere 6.0 and Veeam 8 Update 3. Backup jobs are reverse incremental with CBT. The VM running vCenter/Veeam at each site has two virtual NICs: one on the server VLAN and one on the iSCSI VLAN. The iSCSI VLAN is not routable between offices (no gateway is configured on the vCenter VMs for that NIC). Both sites have a couple of NAS/SAN units connected to the ESXi hosts via iSCSI, on a VLAN with jumbo frames enabled (MTU 9000). This is where all the virtual hard drives reside.
Both sites have a nightly backup job that backs up the file server and a voicemail server. The virtual drives for the two VMs are all on the same datastore, which is on an EqualLogic PS4100.

So issue #1:
The Chicago site has some odd stuff occurring. The nightly job runs, and the voicemail server, with a single 60GB hard drive, finishes in roughly 5-10 minutes, usually with only about 2-3GB read. The file server has 3 virtual hard disks attached. All are LSI Logic SCSI drives, not independent. One is 50GB, one is 1.5TB, and one is 1.8TB.
The odd thing is that some days everything seems to work as expected: each of the file server's drives takes a few minutes to process, and only several hundred MB or a couple GB of data is read. The whole job is done in 30 minutes.
The next day the same job runs, but now the file server's drives look like they were completely re-read: instead of several hundred MB or a couple GB, we have 22GB read on the 50GB drive, 1.1TB of 1.5TB on the second, and 1.0TB of 1.8TB on the third. As far as I can tell, the log messages are no different.
The job correspondingly takes much longer, about 14 hours.
I'm not sure why this happens; there certainly wasn't that much data written to the VM in between, just normal office file shares. It seems to occur every so often: the backup runs fine with barely any data changing, and then one day it processes far more and takes much longer.
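The figures quoted for the slow night can be checked with a little arithmetic. This sketch uses only the numbers from the post and assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB). It shows that each file-server disk had roughly half to three quarters of its capacity read, far more than the few hundred MB of a normal night. It also shows that the 14-hour job averaged only a few tens of MB/s.

```python
# Back-of-the-envelope check on the figures reported for the slow night.
# Assumption: 1 TB = 1024 GB and 1 GB = 1024 MB (binary units).
DISKS_GB = [
    # (disk, capacity in GB, data read in GB)
    ("Hard disk 1", 50.0, 22.0),
    ("Hard disk 2", 1.5 * 1024, 1.1 * 1024),
    ("Hard disk 3", 1.8 * 1024, 1.0 * 1024),
]
JOB_HOURS = 14  # "about 14 hours", so treat the result as approximate


def read_fractions(disks):
    """Map each disk name to the share of its capacity that was read."""
    return {name: read / size for name, size, read in disks}


def average_read_mb_s(disks, hours):
    """Average read rate across the whole job, in MB/s."""
    total_mb = sum(read for _, _, read in disks) * 1024
    return total_mb / (hours * 3600)


if __name__ == "__main__":
    for name, share in read_fractions(DISKS_GB).items():
        print(f"{name}: {share:.0%} of capacity read")
    print(f"Average read rate: {average_read_mb_s(DISKS_GB, JOB_HOURS):.1f} MB/s")
```

This prints 44%, 73% and 56% for the three disks and an average read rate of about 44.1 MB/s over the job.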

Issue #2:
The Toronto site has a replication job that replicates those same two VMs from the Chicago site to the Toronto site. The voicemail server again works as expected and only takes about 5-10 minutes to complete. The file server, however, gets to the first virtual disk (50GB) and sits at 0% for many hours, usually over 12. Once it actually starts transferring data, it seems to go quickly. The same thing repeats for the other two drives. Looking at the traffic graph in the job history, basically nothing happened from 8pm on Nov 16th until 11:30am on Nov 18th. There was no reading or writing (apart from a little blip mid-afternoon on the 17th).
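To put the stall in perspective, the idle window from the traffic graph can be measured with Python's `datetime` module. The year is an assumption, since the post gives only month and day, and it does not affect the result.

```python
from datetime import datetime

# Assumption: the year is illustrative; the post gives only month and day.
IDLE_START = datetime(2015, 11, 16, 20, 0)   # 8:00 pm, Nov 16th
IDLE_END = datetime(2015, 11, 18, 11, 30)    # 11:30 am, Nov 18th


def idle_hours(start, end):
    """Length of the gap between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


if __name__ == "__main__":
    print(f"Idle window: {idle_hours(IDLE_START, IDLE_END)} hours")
```

That is 39.5 hours with almost no I/O, longer than the gap between two nightly runs.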

Since both issues affect the same VM, I'm guessing the cause is a configuration issue on the VM, or in how Veeam at the Chicago site has it registered, but I can't see any difference in how the VM or the jobs are set up between the offices. The logs also show Veeam complaining that it can't use hotadd mode and failing over to network mode. The voicemail server, which behaves fine, uses hotadd without issue; only the file server doesn't. There are also messages about the proxy not being on the management network of the ESXi host, and I'm not sure why those are generated: the Chicago VM's IP is 10.3.0.50, and the two ESXi host management IPs are 10.3.0.51 and 10.3.0.52. The Toronto office uses the same scheme, just 10.4.x.x.
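On the management-network warning: the post doesn't give subnet masks, but if the server VLAN is a /24 (an assumption), a quick check with Python's `ipaddress` module shows that the proxy's server-VLAN address and both ESXi management addresses fall in the same network. If that holds, the warning may be triggered by something other than that NIC's address, for example the second, iSCSI-facing NIC. That is only a guess worth checking in the logs.

```python
import ipaddress

# Assumption: the management network is a /24; the post gives only host IPs.
MGMT_NETWORK = ipaddress.ip_network("10.3.0.0/24")

PROXY_IP = ipaddress.ip_address("10.3.0.50")  # Veeam/vCenter VM, server VLAN NIC
ESXI_MGMT_IPS = [ipaddress.ip_address(a) for a in ("10.3.0.51", "10.3.0.52")]


def all_in_subnet(proxy, hosts, network):
    """True if the proxy and every host management IP fall inside `network`."""
    return proxy in network and all(h in network for h in hosts)


if __name__ == "__main__":
    print(all_in_subnet(PROXY_IP, ESXI_MGMT_IPS, MGMT_NETWORK))
```

Under the /24 assumption this prints `True`; for the Toronto site, swap in the 10.4.x.x addresses.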

If anyone has seen something similar or has any suggestions, I'm all ears.

Thanks!
foggy
Veeam Software
Posts: 21070
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Backup inconsistency and replication issue

Post by foggy »

For the first issue: does the job just need to read more, while the amount of transferred data is still comparable to when it reads less? If so, that looks like a CBT issue, so look for the [CBT] tag next to the hard drive processing line in the job session log. If the amount of transferred data is also large, I'd check for some activity inside the guest OS that might produce that many changes. This thread might give you some hints of what to look for.
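The triage described above can be written down as a small decision function. This is a hypothetical sketch: `DiskRun`, `diagnose`, `cbt_tag_present` and the 20 GB threshold are illustrative names and values, not part of Veeam. The read and transferred figures are assumed to be copied by hand from each disk's line in the job session log.

```python
from dataclasses import dataclass

# Hypothetical threshold: a disk counts as a "large read" above this many GB.
# Pick a value well above what a normal night reads (a couple of GB here).
LARGE_GB = 20.0


@dataclass
class DiskRun:
    """Figures copied by hand from one disk line of a job session log."""
    name: str
    read_gb: float         # amount read from the source disk
    transferred_gb: float  # amount actually sent to the target
    used_cbt: bool         # whether the [CBT] tag appeared on that line


def cbt_tag_present(log_line: str) -> bool:
    """True if a disk-processing log line carries the [CBT] tag."""
    return "[CBT]" in log_line


def diagnose(run: DiskRun) -> str:
    """Rough triage following the reasoning in the reply above."""
    if run.read_gb <= LARGE_GB:
        return "normal run"
    if run.transferred_gb > LARGE_GB:
        # Lots read AND lots transferred: many blocks really did change.
        return "real change: look for guest OS activity writing lots of data"
    # Lots read but little transferred: the pattern that points at CBT.
    if run.used_cbt:
        return "suspect CBT: [CBT] tag present, yet far more was read than usual"
    return "suspect CBT: no [CBT] tag, so the disk was scanned without CBT"
```

For example, `diagnose(DiskRun("Hard disk 2", 1126.4, 3.0, used_cbt=False))` returns the "no [CBT] tag" verdict. The 3.0 GB transferred figure is made up, since the post does not report transferred amounts.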

Regarding the second one, the logs should tell what is actually happening during those idle periods, so please ask your support engineer to look into that.

Also, I'd check for hotadd limitations that might apply to your setup.
