Veeam v9 Backup Performance Slow


Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Mon May 23, 2016 2:40 pm

Gostev,

Naturally since one user isn't having an issue with v9, that means no one else could possibly be operating under different circumstances that cause an issue to surface. Only makes sense.

If you can't tell, that was sarcasm. I may be a little frustrated at support trying to tell me nothing is wrong while I'm watching my backups fail.

And for the record, the slowest link in the network path is 1Gb. Since this has never happened before, and I have no indication of it ever happening on the source servers (10Gb), proxy (10Gb), or target (1Gb) in any log or other diagnostic source, I very much doubt the issue is caused by a sudden drop to 100Mb. Moreover, when it happens, 100Mb would be a great alternative to the kinds of speeds I'm seeing.

MSc, I am not using iSCSI on the Synology, but it is up to date. It was also most certainly responsible for a portion of my issues, but not, apparently, all of them.

Andreas Neufert, when the jobs are in this state their status is "running". They act as though they are working normally, except of course they aren't really passing any data, and for all intents and purposes they have failed.

I can't help but think that if Veeam were a little better about exposing what it is actually doing at any given time, these things might be easier to sort out without weeks of back-and-forth with support. As mentioned before, even with multiple logs of the problem, support never caught that write latency on the target was ridiculously high (now solved), so I don't find it surprising that more subtle issues go unnoticed. As another example, I've never had anyone actually tell me what "Agent port is not recognized" means. I still run into that one pretty frequently.
cbc-tgschultz
Enthusiast
 
Posts: 46
Liked: 9 times
Joined: Fri May 13, 2016 1:48 pm
Full Name: Tanner Schultz

Re: Veeam v9 Backup Performance Slow

by marcseitz » Mon May 23, 2016 2:59 pm

Hi,

We have been running B&R 9 for almost two months now, and since the update we have performance issues, too!
I'm still working with support to figure out what's going wrong in our environment (Case #01754045).

Some information about our environment:
Repositories: NetApp ~200TB, 6 physical Proxies Win2012R2 (2x6Core, 96GB Mem), B&R-Server Win2008R2, Daily VMs for Backup ~1600

What we've figured out:
- Since B&R9 the backup jobs are handled slower than before
- Example: The "Saving GuestMembers.xml" takes up to 15min (per VM!!)

You can copy the log for one VM (where each step is listed: Creating snapshot, Releasing guest, ...) into Notepad.
Then you will see the timestamps at which each particular task starts, so you can check whether you have the same problem as we do.
The log will look like this:
Code:
12.04.2016 09:19:53 :: Removing VM snapshot
12.04.2016 09:20:24 :: Saving GuestMembers.xml
==> 8 minutes 56 seconds doing nothing???
12.04.2016 09:29:20 :: Finalizing
12.04.2016 09:29:30 :: Swap file blocks skipped: 125,0 MB
12.04.2016 09:29:31 :: Busy: Source 66% > Proxy 12% > Network 38% > Target 40%
12.04.2016 09:29:31 :: Primary bottleneck: Source
12.04.2016 09:29:31 :: Network traffic verification detected no corrupted blocks
12.04.2016 09:29:31 :: Processing finished at 12.04.2016 09:29:31
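The gap Marc highlights can also be computed automatically rather than eyeballed in Notepad. A minimal sketch, assuming the log lines follow the `DD.MM.YYYY HH:MM:SS :: step` format shown above:

```python
from datetime import datetime

def find_gaps(log_text, threshold_s=300):
    """Parse 'DD.MM.YYYY HH:MM:SS :: step' lines and report pauses
    longer than threshold_s seconds between consecutive steps."""
    entries = []
    for line in log_text.strip().splitlines():
        stamp, _, step = line.partition(" :: ")
        entries.append((datetime.strptime(stamp.strip(), "%d.%m.%Y %H:%M:%S"), step))
    gaps = []
    for (t1, s1), (t2, s2) in zip(entries, entries[1:]):
        delta = (t2 - t1).total_seconds()
        if delta >= threshold_s:
            gaps.append((s1, s2, delta))
    return gaps

log = """12.04.2016 09:19:53 :: Removing VM snapshot
12.04.2016 09:20:24 :: Saving GuestMembers.xml
12.04.2016 09:29:20 :: Finalizing"""
print(find_gaps(log))  # the 536-second pause before 'Finalizing'
```

Run against a full job log, this flags every step where the console sat idle for more than five minutes.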


If I have any news about the performance issue, I'll post it here.

Regards,
Marc
marcseitz
Influencer
 
Posts: 17
Liked: 5 times
Joined: Wed Apr 04, 2012 11:17 am
Full Name: Marc Seitz

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Mon May 23, 2016 5:45 pm

I do not appear to have the same issue as you. All of my 'Veeam is doing nothing at all' time is happening during the actual disk data transfer, so the console reports nothing.

Code:
5/23/2016 8:09:30 AM :: Queued for processing at 5/23/2016 8:09:30 AM
5/23/2016 8:09:30 AM :: Required backup infrastructure resources have been assigned
5/23/2016 8:09:35 AM :: VM processing started at 5/23/2016 8:09:35 AM
5/23/2016 8:09:35 AM :: VM size: 1.1 TB (962.0 GB used)
5/23/2016 8:09:45 AM :: Getting VM info from vSphere
5/23/2016 8:09:57 AM :: Using guest interaction proxy veeam.clearybuilding.us (Same subnet)
5/23/2016 8:10:12 AM :: Inventorying guest system
5/23/2016 8:10:13 AM :: Preparing guest for hot backup
5/23/2016 8:10:16 AM :: Creating snapshot
5/23/2016 8:10:30 AM :: Releasing guest
5/23/2016 8:10:30 AM :: Getting list of guest file system local users
5/23/2016 8:10:52 AM :: Saving [vSphere-VMs] ClearyShares/ClearyShares.vmx
5/23/2016 8:10:55 AM :: Saving [vSphere-VMs] ClearyShares/ClearyShares.vmxf
5/23/2016 8:10:57 AM :: Saving [vSphere-VMs] ClearyShares/ClearyShares.nvram
5/23/2016 8:11:00 AM :: Using backup proxy VeeamVeronaProxy for disk Hard disk 1 [hotadd]
5/23/2016 8:11:39 AM :: Hard disk 1 (100.0 GB) 18.3 GB read at 57 MB/s [CBT]
5/23/2016 8:17:16 AM :: Using backup proxy VeeamVeronaProxy for disk Hard disk 2 [hotadd]
5/23/2016 8:17:49 AM :: Hard disk 2 (1.0 TB) 806.9 GB read at 54 MB/s [CBT]


I'm watching it happen right now, without the latency issues we had on the target before. That last line is a lie: it started around 60-70MB/s, but over the last hour or so it has slowed to jumping between 4-5MB/s and 15-25MB/s, and the primary bottleneck has switched from "Source", as it has been for the past week and as it usually is, to "Network", which is bull. I still can't rule out the target though.

Re: Veeam v9 Backup Performance Slow

by tsightler » Mon May 23, 2016 6:06 pm

Can you please check the memory utilization of your Ubuntu container during the periods of slow performance? Looking at the logs, it appears that this container is assigned only 2GB of RAM, which is well below the recommended minimum of 4GB per active job. Your graph is indicative of a repository that has run out of memory to store the deduplication hash, and this would potentially explain why "network" is showing as the bottleneck, as "network" indicates difficulty transferring data from the source data mover (the proxy) to the target data mover (the VeeamAgent running in the Docker instance on the Synology). I would love to see some data on what free memory looks like both when the job is running well and when it collapses. Also, do you happen to be using WAN or LAN storage optimization rather than "Local"?
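One lightweight way to capture this on a Linux repository is to sample `/proc/meminfo` on an interval while the job runs. A minimal parser sketch (the field names are the standard kernel ones; reading the live file is shown commented out):

```python
def mem_summary(meminfo_text):
    """Parse /proc/meminfo-style 'Key:   <value> kB' lines into MiB."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0]) // 1024  # kB -> MiB
    return fields

# On the repository itself you would do:
#   with open("/proc/meminfo") as f:
#       print(mem_summary(f.read()))
sample = """MemTotal:        2048000 kB
MemFree:           51200 kB
Cached:           102400 kB"""
print(mem_summary(sample))
```

Logging this once a minute during a job, both when it runs well and when it collapses, would give exactly the before/after comparison asked for above.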
tsightler
Veeam Software
 
Posts: 4768
Liked: 1737 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Mon May 23, 2016 8:03 pm

Storage optimization is "Local target" for all backup jobs, and Compression level is "Optimal".

The device only has 2GB of memory; it is a limitation of the platform. Your explanation makes sense. However, since we haven't had this issue in the past, it would suggest that some part of the process has changed and now requires more RAM than before. To be fair, I cannot be certain whether Veeam or DSM 6 is to blame there.

I will test by configuring an older server with significantly more RAM as the storage repository, using NFS to connect to the array. That should tell us whether this is part of our issue or not.

Re: Veeam v9 Backup Performance Slow

by CCastellanos » Tue May 24, 2016 12:15 am

Tanner, perhaps more as a post-mortem, it might be worth checking your job settings based on these observations:
- You mentioned you run daily incrementals. I may have missed what type of incremental, but in a scenario of REVERSE incremental to a RAID 6 array with 5400 RPM 8TB drives, your performance might be very limited from the get-go.
- Your bottleneck might not be the appliance, controller, or network pipe, even at 1Gbps, but your spinning disks' speed and the dual-parity overhead. In a REVERSE incremental the reads/writes can pound any system, more so depending on your change rate and the size of the backup file. From my view this is the closest explanation for the kind of wait times you are seeing in the job: the storage is simply busy.
- Something else: if your appliance does any sort of caching, that would help.
- Your PROXY seems more than capable of processing whatever ends up being sent to the repo. But I did miss whether your B&R/proxy VM was also sitting in the same Docker container as the repo VM?
- When you moved to v9, did you convert the job to per-VM backup files or keep it as it was? This may change the load pattern on your repo.
- I'm not sure I caught whether all of this was fine on v8?
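To put rough numbers behind the reverse-incremental concern: each changed block costs one read of the old block plus two writes (the new block into the full file, the old block into the rollback), and each random write on RAID 6 fans out to roughly six disk I/Os. A back-of-the-envelope sketch; the per-disk IOPS and penalty figures are illustrative assumptions, not measurements of this array:

```python
def reverse_inc_updates_per_sec(spindles, iops_per_disk=75, write_penalty=6):
    """Estimate block updates/s a reverse incremental can sustain on RAID 6.

    Assumed cost model: 1 read + 2 writes per changed block, with each
    write costing `write_penalty` raw I/Os due to dual parity.
    """
    raw_iops = spindles * iops_per_disk
    cost_per_update = 1 + 2 * write_penalty  # 1 read + 2 penalized writes
    return raw_iops / cost_per_update

# e.g. a hypothetical 8-spindle RAID 6 set of 5400 RPM drives:
print(round(reverse_inc_updates_per_sec(8), 1))
```

Even with generous assumptions, a handful of slow spindles behind dual parity leaves very little random-I/O headroom, which is the point being made above.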
CCastellanos
Influencer
 
Posts: 11
Liked: never
Joined: Wed Sep 05, 2012 8:44 pm
Location: Astoria, NY
Full Name: Carlos Castellanos

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Tue May 24, 2016 2:33 pm

-Daily incrementals are regular old forward incrementals from a weekly active full. I am aware that 5400RPM 8TB drives are not particularly performant, but this is irrelevant since they worked well enough before.
-See above
-Aside from the write caching on the disk, I would be very surprised if the array did any significant caching on its 2GB of ram. Also, this was working well enough before regardless.
-B&R is a VM on a vSphere cluster, the Repo WAS a docker container on a Synology array. I've taken the advice of support and installed a new repo, a 2.4Ghz 8 core 48GB Ubuntu server that accesses the array via NFS. See the included image for how that's working out.
-Nothing about the jobs was changed. Especially not the things I can't change since they're only available to enterprise customers.
-Yes, as I have said several times, everything worked well enough in v8.

As mentioned above, I installed a new Linux repository and switched the backups over to use it. It is backed by the same storage, only now accessed through NFS by a 2.4GHz 8-core 48GB RAM Ubuntu server that does nothing but act as the Veeam repo. So, since talking to support, I have added a proxy (4-core 3.4GHz, 16GB RAM) and this repo server, on top of the original B&R server, basically more than tripling the compute resources of the Veeam infrastructure. Sadly, this has not had the intended effect:

[Image: job throughput graph]

As you can see, the job started off well enough. It had a weird spike/trough pattern to the transfer, but it averaged out to 60+MB/s, so I was OK with it. I even started a second job that seemed to be running OK too. Then I went home. Around midnight one of the jobs simply stopped transferring data; around 1:30AM, so did the other one. Even a replication job stopped working. The storage device registers no activity; these jobs are simply hung. Also notice that Veeam is blaming the source this time, which is a new twist.

I'll be adding these logs and info to the ticket.

Re: Veeam v9 Backup Performance Slow

by tsightler » Tue May 24, 2016 3:16 pm

Was this a new active full or an incremental? I was looking at the logs on one of your larger servers that appeared to hang at ~1:30AM and everything is performing nicely on the source and target, but then I see this in the logs on the new Linux repository:

Code:
[24.05.2016 01:37:05] <139794424174336> stg| WARN|FIB update has been going on more than '5' minutes, recorder '0x000x7f2488141700', FIB 'Backup of the FIB Exchange1_1-flat.vmdk'.
[24.05.2016 01:37:05] <139794608813824> alg| WARN|Timed out to wait for block, block index '178959' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794466137856> alg| WARN|Timed out to wait for block, block index '178960' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794382210816> alg| WARN|Timed out to wait for block, block index '178962' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794055124736> alg| WARN|Timed out to wait for block, block index '178963' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794046732032> alg| WARN|Timed out to wait for block, block index '178964' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794440959744> alg| WARN|Timed out to wait for block, block index '178961' (Wait loop will be continued, timeout '1440' minutes ).

FIB update is a simple update operation to the XML summary data stored within the backup file, so it's hard for me to read this as anything other than a disk I/O issue. I'm trying to think of something else that would cause this, and I'm still looking at the logs (and support may have a different opinion). Do you happen to have any other storage you could try running a test backup to? The logs will probably tell me, but was this a clean full backup on the new repo, or did you map the existing backup chain to the new repo?
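When triaging a hang like this, the repository agent log itself can be mined for the stuck block indexes. A small sketch, assuming the `Timed out to wait for block` WARN line format shown above:

```python
import re

def block_wait_warnings(log_text):
    """Collect block indexes from 'Timed out to wait for block' WARN lines."""
    indexes = []
    for line in log_text.splitlines():
        m = re.search(r"Timed out to wait for block, block index '(\d+)'", line)
        if m:
            indexes.append(int(m.group(1)))
    return indexes

sample = """[24.05.2016 01:37:05] <139794608813824> alg| WARN|Timed out to wait for block, block index '178959' (Wait loop will be continued, timeout '1440' minutes ).
[24.05.2016 01:37:05] <139794466137856> alg| WARN|Timed out to wait for block, block index '178960' (Wait loop will be continued, timeout '1440' minutes )."""
print(block_wait_warnings(sample))
```

Consecutive, tightly clustered indexes like these suggest the writer stalled at one point in the file, consistent with the storage-side explanation.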

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Tue May 24, 2016 3:23 pm

Both jobs were active fulls, as both jobs had not had a chance to do that this weekend due to the issues. I did make sure to map them appropriately when I set up the new repo.

However, I don't believe this is relevant to the issue, as it seems that it was caused by an NFS failure. More specifically, the NFS module of the array crashed. Hopefully that was due to something I can control for and I can correct it without resorting to CIFS, but it seems that it wasn't a Veeam issue this time.

Well, unless you count the misreporting of the bottleneck. Why would it be source? Also, I can't yet determine why the replication job failed. I'd changed the storage it was set to use for metadata to something other than this repo.

Anyway, testing continues on the new repo.

Re: Veeam v9 Backup Performance Slow

by tsightler » Tue May 24, 2016 3:49 pm

Ah, that makes sense, hopefully you can get to the root cause of the NFS failures. I saw that the other job failed with the identical error.

It's not at all uncommon for source to be the bottleneck for a full backup; in most cases I would expect it to be so. Bottleneck is just a representation of which point in the chain Veeam is spending the most time waiting on, measured at four points: source disk read, proxy processing (mostly compression), transfer of data from the proxy to the repository (network), and target disk write speed. Since the data written to the target is compressed, the target has half as much data to deal with; the network isn't the bottleneck unless you're saturating it, and the proxy isn't likely to be the bottleneck unless it's using 100% of its CPU. So source, the device transferring the most data out of all of those, is almost certainly going to be the bottleneck.
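In other words, the reported primary bottleneck is simply the stage with the highest busy percentage. A toy illustration (not Veeam's actual code), using the busy figures from the job summary earlier in the thread:

```python
def primary_bottleneck(busy):
    """Return the stage with the highest busy percentage."""
    return max(busy, key=busy.get)

# "Busy: Source 66% > Proxy 12% > Network 38% > Target 40%"
busy = {"Source": 66, "Proxy": 12, "Network": 38, "Target": 40}
print(primary_bottleneck(busy))  # Source
```

This is also why a stalled repository doesn't show up here: the percentages only accumulate while data is actually moving through the chain.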

I'll look at the replication log.

BTW, I tried to look at the initial logs you uploaded, but they didn't include enough information. The logs from the Docker repository cut off before they got to the error; I'm not sure why, but I wondered if it had to do with the storage device keeping time in a different timezone. It didn't even look like UTC, because it was off by too many hours.

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Tue May 24, 2016 5:10 pm

Your explanation is how I would expect it to work, but in practice it never seems to be correct.

For instance, here it was quite obviously the target that was responsible: all data was getting to the repository, it just was never getting written to disk. Instead it reported Source. In the original issue, it reported Network for a similar problem (write latency at the target). Currently it reports Target, which I can believe.

I had to abandon the new Linux repository and go to CIFS. There were too many issues with NFS. I'm willing to believe that is a result of the synology implementation for now. Historically I've been very reluctant to use CIFS with this setup as when we originally set it up CIFS drastically underperformed. So far it seems tolerable. If it can chug along without dying horribly the moment a second job starts, or trailing off into fairy land with the throughput, it might be the end of my difficulties.

Re: Veeam v9 Backup Performance Slow

by tsightler » Tue May 24, 2016 5:16 pm

cbc-tgschultz wrote:Your explanation is how I would expect it to work, but in practice it never seems to be correct.

Bottleneck statistics are accumulated during normal operation; they're not going to update once the repository has stopped working, because data is no longer being transferred. That's not a bottleneck, that's a failure.

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Tue May 24, 2016 6:33 pm

If that's the case, then why doesn't the job fail instead of sitting there indefinitely not transferring data?

Re: Veeam v9 Backup Performance Slow

by tsightler » Tue May 24, 2016 7:03 pm

cbc-tgschultz wrote:If that's the case, then why doesn't the job fail instead of sitting there indefinitely not transferring data?

I'm sure the job would eventually fail. You can see in the logs that the agents continue to retry every 30 minutes, but I'm not sure how many times they would retry before finally giving up. Regardless, when no data is transferring, bottleneck statistics are not being updated.

In your earlier issue, when you were seeing slow performance going to the Docker repo, bottleneck statistics were still being updated because data transfer was still happening, just very slowly, which is why I was suspecting memory starvation.

Re: Veeam v9 Backup Performance Slow

by cbc-tgschultz » Wed May 25, 2016 3:00 pm

Which, after a night of testing, does seem to have been the case. Things ran exactly as expected with the repository configured for CIFS instead of a Linux repo (either via Docker or the external server). Considering that it ran fine with the Docker container prior to v9, I expect changes were made that cause it to behave differently than before with the limited amount of RAM.

Unfortunately for me, the array continues to have issues with disk failure, causing a long running job to fail last night due to "Shared memory connection was closed" at the same time a redundant disk failed. At least I hope that's what caused that error. I have no idea why such a thing should cause a problem with the CIFS connection, but I'm willing to blame Synology for that one.

It does make me wish Backup jobs had the ability to resume where they left off so I wouldn't lose 8 hours of transfer though.

Anyway, this thing is going to be down at least a week while we get new disks and repair the now zero-redundancy RAID array, so I won't be able to confirm this is a long term solution for some time, but I'm optimistic given the results so far.
