All flash array, how can bottleneck be Source?

by HendersonD » Wed Aug 31, 2016 9:10 pm

We have two new Nimble arrays. An all flash array in production and one of their hybrid arrays in our DR site
The two sites are across campus from one another and connected with 10 gig fiber
The all flash array is the source for backups, the array in DR is the target
Single physical proxy server using DirectSAN for both connections with 10 gig connections to both sites. This proxy has mounted on it an E: drive from the DR array which is where the backups are stored
We are doing forever forward incremental
My backup for the past few days runs at 146MB/s. I click on each VM backup to look at the details and it says the Source was the pinch point, most of the time Source is shown at 99%

How can the Source be the bottleneck when the read speed and throughput on an all flash array is huge?
Is there some mis-configuration on my Nimble Array or within Veeam?
HendersonD
Enthusiast
 
Posts: 57
Liked: 3 times
Joined: Sat Jul 23, 2011 12:35 am

Re: All flash array, how can bottleneck be Source?

by foggy » Thu Sep 01, 2016 4:48 pm

HendersonD wrote:My backup for the past few days runs at 146MB/s.

Did it run faster previously? Basically, if source is the bottleneck, it means that data cannot be retrieved from the storage any faster. I'd check the firmware and make sure you're using the latest storage drivers. Other community members can probably chime in to report their performance values.

Feel free to open a support case so our team can take a closer look at your environment.
foggy
Veeam Software
 
Posts: 14752
Liked: 1083 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: All flash array, how can bottleneck be Source?

by tsightler » Thu Sep 01, 2016 5:34 pm

The storage can easily still be the bottleneck, based on your setup. Veeam measures bottlenecks at 4 points: source, proxy, network, and target. In your setup there's no network traffic (both proxy and source are on the same server so it uses shared memory, which is still way faster than flash), and the target is likely writing 50% less data than is being read from the source since the data is compressed, so that's unlikely to be the bottleneck. Proxy is the measure of CPU time spent on the proxy, which is unlikely to be very high if your total throughput is only 146MB/s.

You still have to read the uncompressed data from the array over whatever type of interconnect you are using, and it's that interconnect that's the most likely candidate. I would agree that 146MB/s seems somewhat slow for an all flash array.

Unfortunately, I don't have enough information to really make even an educated guess, so for now I'll just ask a lot of questions:

What type of initiators are you using?
Do you have specific links dedicated to ingest (reads from the source array) vs egress (writes to the target array)?
Is your proxy tuned to minimize response time by disabling things like interrupt mitigation on the HBA/network adapters?
Are you running many VMs in parallel and is 146MB/s the aggregate speed?
Are full backups just as slow? (Incremental backups are notoriously difficult to judge for speed because sometimes they read so little data that it's hard to calculate a true throughput.)
How much total change was in the job that reported 146MB/s, i.e. read vs transferred?
tsightler
Veeam Software
 
Posts: 4772
Liked: 1740 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: All flash array, how can bottleneck be Source?

by HendersonD » Thu Sep 01, 2016 6:06 pm

The two Nimble arrays were new 3 weeks ago and we got Veeam running for the first time 2 weeks ago, so we have no historical data to compare against

On the proxy box we have Nimble Connection Manager installed, which is a nice front end to Microsoft iSCSI. So on the proxy box we are using Microsoft's iSCSI initiators
On the ESXi side we are using their software initiators
We do not have specific links dedicated to reading/writing. The physical proxy server has two 10 gig Ethernet connections in my storage VLAN, which of course is not routable, as is best practice. Not sure how Veeam decides which interface to use at any one point in time
Not familiar with what types of tuning I should use on the proxy server. If there is some type of KB you can point me to, that would be great
146MB/s is the aggregate speed. There are 15 VMs being backed up. Since we are using forever forward incremental, changed block tracking kicks in. Last night's run:
Processed: 5.3TB
Read: 237GB
Transferred: 77.5GB
The one full backup we ran two weeks ago, when we first installed Veeam, screamed. So yes, our incrementals are much slower, but you are correct that it has to slog through a lot of data to eventually just write what has changed. Perhaps I do not have a clear understanding of what 99% Source means in relation to an incremental backup
HendersonD
Enthusiast
 
Posts: 57
Liked: 3 times
Joined: Sat Jul 23, 2011 12:35 am

Re: All flash array, how can bottleneck be Source?

by tsightler » Thu Sep 01, 2016 6:54 pm

Thanks for the data, and sorry for a couple more questions: how long did the total backup (the one you provided stats for) actually take? Oh, and are all these VMs on a single datastore? I'm thinking you don't really have an infrastructure problem if your full backups screamed.
tsightler
Veeam Software
 
Posts: 4772
Liked: 1740 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: All flash array, how can bottleneck be Source?

by HendersonD » Thu Sep 01, 2016 8:43 pm

The full backup ran at about 350MB/s, so more than twice the speed of the incrementals. The backup from last night took 43 minutes to complete. These VMs live on three datastores. In ESXi I have the three datastores in a single datastore cluster. I do know that Veeam spends a lot of time just reading through all this data trying to parse out what has changed, and that takes time. You are correct that the other 3 possible bottlenecks may pale in comparison to reading all of this data: even though it is read fast, there is a lot of it. Perhaps what I am seeing is normal behaviour
HendersonD
Enthusiast
 
Posts: 57
Liked: 3 times
Joined: Sat Jul 23, 2011 12:35 am

Re: All flash array, how can bottleneck be Source?

by tsightler » Thu Sep 01, 2016 9:15 pm

Something will always be listed as the bottleneck, that something is the part of the chain that we spent the most time waiting for, but that doesn't mean there is a problem. In your case your target storage only had to write 77GB of data, or 3x less than what we had to read, because that was what was saved by data reduction (compression/dedupe), and as mentioned above, unless you're using 100% of the CPU on the proxy, or 100% of the network bandwidth, those are not going to be listed as the bottleneck.

Also, the read of incremental data is random rather than sequential, which has significantly less impact on a flash device than on spinning disks, but still has some impact. I think you're seeing pretty decent speed overall. I think the only way you could get faster is by increasing the number of parallel tasks; I'm assuming you're seeing at least some queuing when VMs are waiting for resources.
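
The "3x less" figure above can be checked directly from the job statistics posted earlier in the thread. This is only a rough sketch of the arithmetic, not anything Veeam itself runs; the function name `reduction_ratio` is made up for the example.

```python
# Figures taken from the job statistics posted in this thread.
READ_GB = 237.0         # data read from the source array
TRANSFERRED_GB = 77.5   # data left to write after compression/dedupe


def reduction_ratio(read_gb: float, transferred_gb: float) -> float:
    """Return how many times smaller the written data is than the read data."""
    if transferred_gb <= 0:
        raise ValueError("transferred_gb must be positive")
    return read_gb / transferred_gb


ratio = reduction_ratio(READ_GB, TRANSFERRED_GB)
print(f"Data reduction: about {ratio:.1f}x")
```

With these numbers the target writes roughly a third of what the source has to deliver, which is one reason the target is unlikely to show up as the busiest stage.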
tsightler
Veeam Software
 
Posts: 4772
Liked: 1740 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: All flash array, how can bottleneck be Source?

by HendersonD » Thu Sep 01, 2016 9:28 pm

I just started my backup job to take a look
I have 4 VMs processing right now with all the rest in a pending state with this message
9/1/2016 5:22:49 PM :: Resource not ready: Backup repository

My proxy box is fairly beefy so it is set to have 12 concurrent tasks
When it says that "Resource not ready: Backup repository" what exactly is happening? What exactly is it seeing in my backup repository that has it only processing 4 or 5 VMs at a time during backup?

You are correct, if there was some way to have it process more VMs at once, it would be great
HendersonD
Enthusiast
 
Posts: 57
Liked: 3 times
Joined: Sat Jul 23, 2011 12:35 am

Re: All flash array, how can bottleneck be Source?

by Gostev » Thu Sep 01, 2016 9:49 pm

It means that the backup repository has no more task slots available because it is processing other VMs (the max concurrent tasks value has been reached).
Gostev
Veeam Software
 
Posts: 21396
Liked: 2350 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: All flash array, how can bottleneck be Source?

by tsightler » Thu Sep 01, 2016 9:53 pm

Sounds like you have your repository limited to 4 tasks, which is the default, designed to keep from overloading the repository. Since your repository is quite fast, I'm sure you can increase this limit, at least if your proxy/repository has enough cores and memory. The task limit can be changed on the "Repository" tab of the repository properties window, and is listed under "Load control". I suspect if you go to 8 or 12 tasks you'll significantly increase the overall performance.
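
To make the slot arithmetic concrete, here is a deliberately simplified sketch. It assumes one task per VM and that the number of VMs processed at once is capped by the smaller of the proxy and repository task limits; the function names are hypothetical and this is not Veeam's actual scheduler.

```python
def running_vms(total_vms: int, proxy_tasks: int, repository_tasks: int) -> int:
    """VMs that can be processed at once under the simplified model:
    the tightest task limit wins (assumes one task per VM)."""
    return min(total_vms, proxy_tasks, repository_tasks)


def pending_vms(total_vms: int, proxy_tasks: int, repository_tasks: int) -> int:
    """VMs left waiting for a free task slot."""
    return total_vms - running_vms(total_vms, proxy_tasks, repository_tasks)


# Numbers from this thread: 15 VMs, proxy set to 12 tasks, repository at 4.
print(running_vms(15, 12, 4), pending_vms(15, 12, 4))
# Same job after raising the repository limit to 8.
print(running_vms(15, 12, 8), pending_vms(15, 12, 8))
```

Under this model, raising the proxy limit alone changes nothing while the repository stays at 4, which matches the "Resource not ready: Backup repository" message.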
tsightler
Veeam Software
 
Posts: 4772
Liked: 1740 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: All flash array, how can bottleneck be Source?

by HendersonD » Fri Sep 02, 2016 1:34 pm

Jacking up the backup repository limit from 4 to 8 helped. I am now getting 166 MB/s and my backup went from 45 minutes to 39 minutes. The bottleneck analysis for the entire job is interesting
9/2/2016 8:52:31 AM :: Load: Source 94% > Proxy 23% > Network 26% > Target 0%

I think the source at 94% is a bit of a red herring. Yes, it is probably the bottleneck in relation to Proxy, Network, and Target, but I am guessing I am getting as much out of my source as possible. I have two good-sized file servers and I have indexing turned on for this backup job. Looking at these two file servers, one takes 6 minutes to index and the other a whopping 27 minutes. The one that takes 27 minutes actually has less total data on it but a lot of files/folders to work through. It is the file server that holds the home directories for the 4,300 students on my campus. Since we are K-12 there are probably 2,000 of these home directories (upper class students) that contain a lot of files/folders.

What controls the speed of indexing and is there any way to speed it up? I am guessing it is this indexing that is showing the source as the bottleneck
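
For anyone who wants to pull these numbers out of job logs, the Load line quoted above can be parsed with a few lines of Python. This is a sketch that assumes the line always has the "Name NN%" shape seen in this thread; `parse_load_line` is a name invented for the example.

```python
import re

# Load line copied from the job log quoted in this post.
LINE = "9/2/2016 8:52:31 AM :: Load: Source 94% > Proxy 23% > Network 26% > Target 0%"


def parse_load_line(line: str) -> dict[str, int]:
    """Extract {stage: percent} pairs from a 'Load: ...' log line."""
    _, sep, tail = line.partition("Load:")
    if not sep:
        raise ValueError("no 'Load:' section found")
    return {name: int(pct) for name, pct in re.findall(r"(\w+)\s+(\d+)%", tail)}


loads = parse_load_line(LINE)
busiest = max(loads, key=loads.get)
print(loads)
print(f"Busiest stage: {busiest} ({loads[busiest]}%)")
```

Picking the maximum always returns some stage, however fast the job ran, which is the point made earlier in the thread: one stage is always listed as the busiest.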
HendersonD
Enthusiast
 
Posts: 57
Liked: 3 times
Joined: Sat Jul 23, 2011 12:35 am

Re: All flash array, how can bottleneck be Source?

by Gostev » Sun Sep 04, 2016 7:50 pm

No, indexing is not part of the bottleneck numbers, as they are reported by the data movers.
Gostev
Veeam Software
 
Posts: 21396
Liked: 2350 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: All flash array, how can bottleneck be Source?

by Delo123 » Mon Sep 05, 2016 7:39 am

Processed: 5.3TB
Read: 237GB
Transferred: 77.5GB


Your backup job is hardly reading any data. I'm not sure how skipping data due to CBT is measured, but this is what we also see on our all flash arrays. Full backups run at between 1GB/s and 2GB/s, but incrementals of static data only run at about 200-300MB/s...
Delo123
Expert
 
Posts: 351
Liked: 97 times
Joined: Fri Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: All flash array, how can bottleneck be Source?

by robbys » Mon Sep 05, 2016 9:00 am

I also have an all flash storage solution, but with HPE VSA.
If it falls back to normal backup I get 70MB/s. If I do storage snapshots on VSA, I get around 400-550MB/s. With the storage snapshots, I'm running into the limitation of my backup server (that doubles as proxy too), so the proxy or the network is the bottleneck. With the normal backup, the source is always the bottleneck.

So it depends heavily on how you access the source. At a speed of 200-300MB/s you cannot saturate the backup server itself if it is a new one; it is still held back at the source (not the disk, but the way you access the disk and how that impacts the rest of the server).
robbys
Novice
 
Posts: 5
Liked: never
Joined: Mon Nov 17, 2014 11:01 am
Full Name: Robby Swartenbroekx

Re: All flash array, how can bottleneck be Source?

by stevenrodenburg1 » Mon Sep 05, 2016 12:08 pm

I think the way that the "bottleneck" information is displayed to us humans needs to be re-evaluated.

I run an 8-node VSAN environment. It's insanely fast, backup speeds easily reach 400 to 500 megs a second, a backup job of close to 3 TB simply flies, all is good, and still, every day, our "source" (the super duper speedy VSAN) is one hell of a bottleneck at over 90%. Say what?!? "What the heck am I doing wrong" were my thoughts initially. Just like most of us.

Then it dawned on me: I'm not doing anything wrong. It's all fine.

The computer calculates and displays the info in a certain way. It's percentages. It's "things in relation to other things". But us mere humans cannot deal with the fact that there is a bloody bottleneck every time, despite our efforts and investments. Our brains are wired to react like this.

We are presented with "negative information". In this case, the presence of a bottleneck
Reaction: oh no !!

Sure, if the backup speed is low it can help (the bottleneck is then interpreted as helpful information), but if your environment runs like a monkey with its ass on fire, and you still see that dreaded bottleneck every day, you start to doubt yourself.

Our brains are wired that way. It's basic psychology.

Hence my suggestion of presenting the information differently. As it is now, it's always negative. As if there is something wrong.
stevenrodenburg1
Expert
 
Posts: 115
Liked: 18 times
Joined: Tue May 31, 2011 9:11 am
Location: Switzerland
Full Name: Steven Rodenburg
