Backup inconsistency and replication issue

Post by **GreenEnvy** » Nov 20, 2015 7:45 pm

Support Case # 01134679
Hi all,
We have two issues which may be related.
We have two sites that each have at least 100 Mbps fiber. This is just a WAN which we run a site-to-site VPN over, not a direct connection (the sites are near Chicago and near Toronto). Both sides have vSphere Essentials Plus 6, with Veeam running on the same VM that vSphere is installed on. Both have vSphere 6.0 and Veeam 8 Update 3. Backup jobs are reverse incremental with CBT. The VM running vSphere/Veeam in each site has 2 virtual NICs. One is on the server VLAN, and one is on the iSCSI VLAN. The iSCSI VLAN is not routable between offices (no gateway is configured on the vCenter VMs for that NIC). Both sites have a couple of NAS/SAN units connected to the ESXi hosts via iSCSI, on a VLAN with jumbo frames enabled (MTU 9000). This is where all the virtual hard drives reside.
Both sites have a nightly backup job which backs up the file server and a voicemail server. The virtual drives for the two VMs are all on the same datastore, which is an EqualLogic PS4100.

So issue #1:
The Chicago site has some odd stuff occurring. The nightly job will run and the voicemail server, with a single 60GB hard drive, finishes in roughly 5-10 minutes with usually only about 2-3GB read. The file server has 3 virtual hard disks attached. All are LSI SCSI drives, not independent. One is 50GB, one is 1.5TB, one is 1.8TB.
The odd thing is some days it seems everything works as expected, each of these drives on the file server takes a few minutes to process and only has several hundred MB or a couple GB of data read. The whole job is done in 30 minutes.
The next day the same job runs, but now the drives on the file server look like they get completely re-read, so instead of several hundred MB or a couple GB of data read, we have 22GB of 50GB on the first drive, 1.1TB of 1.5TB on the second, and 1.0TB of 1.8TB on the third. There are no different log messages from what I can tell.
The job correspondingly takes much longer, about 14 hours.
Not sure why this is the case, there certainly wasn't that much data written to the VM in between. Just normal office file shares. This seems to occur every so often, the backup runs fine with barely any data changing and then one day it processes way more and takes much longer.
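For scale, the figures above can be turned into a rough effective read rate for the slow run. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and treats "about 14 hours" as exactly 14 hours, neither of which is confirmed by the job report.

```python
# Rough effective read rate for the slow run, using the figures from the post.
# Assumption: decimal units (1 TB = 1000 GB); the job report may use binary units.
read_gb = 22 + 1.1 * 1000 + 1.0 * 1000   # data read across the three disks, in GB
duration_s = 14 * 3600                    # "about 14 hours", taken as exactly 14 h

rate_mb_s = read_gb * 1000 / duration_s   # megabytes per second
rate_mbit_s = rate_mb_s * 8               # megabits per second

print(f"{read_gb:.0f} GB in 14 h is about {rate_mb_s:.0f} MB/s ({rate_mbit_s:.0f} Mbit/s)")
```

That works out to roughly 42 MB/s averaged over the whole job. On its own this does not say why the disks were re-read, only how fast the re-read went.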

Issue #2:
The Toronto site has a replication job that replicates those same two VMs from the Chicago site to the Toronto site. The voicemail server again works as expected and only takes about 5-10 minutes to complete. The file server, however, will get to the first virtual disk (50GB) and sit at 0% for many hours, usually over 12. Once it starts actually transferring data, it seems to go quickly. The same repeats for the other two drives. Looking at the traffic graph in the job history, basically nothing happened from 8 pm on Nov 16th until 11:30 am on Nov 18th. There is no reading or writing (apart from a little blip mid-afternoon on the 17th).

Since both issues affect the same VM, I'm guessing this comes down to a configuration problem on the VM or in how Veeam on the Chicago site has it registered, but I can't see any difference in how the VM or the jobs are set up between the offices. Also, in the logs Veeam is complaining that it can't use hotadd mode and is failing over to network mode. The voicemail server that is behaving fine does use hotadd without issue; it's just the file server again that doesn't. There are also messages about the proxy not being on the management network of the ESXi host. I'm not sure why it's generating that. The Chicago VM's IP is 10.3.0.50, and the two ESXi host management IPs are 10.3.0.51 and 10.3.0.52. Same format in the Toronto office, just 10.4.x.x.
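As a quick sanity check on the management network message, the addresses above can be tested for subnet membership. This is a minimal sketch using Python's standard `ipaddress` module; the prefix lengths (/24 and /16) are assumptions, since the post does not give the subnet mask, and `same_subnet` is a helper name made up for this example.

```python
import ipaddress

def same_subnet(proxy_ip, host_ip, prefix_len):
    """Return True if host_ip falls inside proxy_ip's network for the given prefix."""
    # strict=False lets us build the network from a host address such as 10.3.0.50/24
    network = ipaddress.ip_network(f"{proxy_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(host_ip) in network

proxy = "10.3.0.50"                  # Chicago vCenter/Veeam VM, from the post
hosts = ["10.3.0.51", "10.3.0.52"]   # ESXi management IPs, from the post

for prefix in (24, 16):              # assumed masks; the real one is not stated
    for host in hosts:
        print(f"/{prefix}: {host} shares a subnet with {proxy}: "
              f"{same_subnet(proxy, host, prefix)}")
```

Under either assumed mask the proxy address and both host addresses land in the same subnet, so these addresses alone do not explain the warning.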

If anyone has seen something similar or has any suggestions, I'm all ears.

Thanks!

Post by **foggy** » Nov 23, 2015 3:22 pm

For the first issue, does the job just need to read more while the amount of transferred data stays comparable to the days when it reads less? That looks like a CBT issue, so look for the [CBT] tag next to the hard drive processing line in the job session log. If the amount of transferred data is also large, I'd check for activity inside the guest OS that might produce that many changes. This thread might give you some hints of what to look for.
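If the session log is copied out as text, the [CBT] check can be done with a small filter. This is a sketch only: the sample lines below are made up for illustration and are not real Veeam output, the "hard disk" match is an assumption about how the processing lines are worded, and `disks_without_cbt` is a hypothetical helper.

```python
def disks_without_cbt(log_lines):
    """Return disk-processing lines that do not carry the [CBT] tag."""
    return [
        line for line in log_lines
        # Assumption: disk lines mention "hard disk"; adjust to the real wording.
        if "hard disk" in line.lower() and "[CBT]" not in line
    ]

# Hypothetical sample lines, NOT real Veeam output.
sample = [
    "Hard disk 1 (50.0 GB) 1.2 GB read [CBT]",
    "Hard disk 2 (1.5 TB) 1.1 TB read",
    "Finalizing",
]

for line in disks_without_cbt(sample):
    print("No [CBT] tag:", line)
```

Any disk line reported by the filter was processed without the tag, which would fit a full re-read of that disk.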

Regarding the second one, the logs should tell what is actually happening during those idle periods, so please ask your support engineer to look into that.

Also, I'd check for hotadd limitations that might apply to your setup.
