All flash array, how can bottleneck be Source?

HendersonD · Post by **HendersonD** » Aug 31, 2016 9:10 pm this post

We have two new Nimble arrays. An all flash array in production and one of their hybrid arrays in our DR site
The two sites are across campus from one another and connected with 10 gig fiber
The all flash array is the source for backups, the array in DR is the target
Single physical proxy server using DirectSAN for both connections with 10 gig connections to both sites. This proxy has mounted on it an E: drive from the DR array which is where the backups are stored
We are doing forever forward incremental
My backup for the past few days runs at 146MB/s. I click on each VM backup to look at the details and it says the Source was the pinch point, most of the time Source is shown at 99%

How can the Source be the bottleneck when the read speed and throughput on an all flash array is huge?
Is there some mis-configuration on my Nimble Array or within Veeam?

Post by **foggy** » Sep 01, 2016 4:48 pm this post

HendersonD wrote:My backup for the past few days runs at 146MB/s.

Did it run faster previously? Basically, if source is a bottleneck, it means that data cannot be retrieved from the storage any faster. I'd check the firmware and make sure you're using the latest storage drivers. Other community members can probably chime in to report their performance values.

Feel free to open a support case in order to let our team take a closer look at your environment.

Post by **tsightler** » Sep 01, 2016 5:34 pm this post

The storage can easily still be the bottleneck, based on your setup. Veeam measures bottlenecks at 4 points: source, proxy, network, and target. In your setup there's no network traffic (both proxy and source are on the same server so it uses shared memory, which is still way faster than flash), and the target is likely writing 50% less data than is being read from the source since the data is compressed, so that's unlikely to be the bottleneck. Proxy is the measure of CPU time spent on the proxy, which is unlikely to be very high if your total throughput is only 146MB/s.

You still have to read the uncompressed data from the array over whatever type of interconnect you are using, and it's that interconnect that's the most likely candidate. I would agree that 146MB/s seems somewhat slow for an all flash array.

Unfortunately, I don't have enough information to really make even an educated guess, so for now I'll just ask a lot of questions:

What type of initiators are you using?
Do you have specific links dedicated for both ingest (reads from the source array), vs egress (writes to the target array)?
Is your proxy tuned to minimize response time by disabling things like interrupt mitigation on the HBA/network adapters?
Are you running many VMs in parallel and is 146MB/s the aggregate speed?
Are full backups just as slow (incremental backups are notoriously difficult to judge speed because sometimes they read so little data it's hard to calculate a true throughput)?
How much total change was in the job that reported 146MB/s, i.e. read vs transferred?

HendersonD · Post by **HendersonD** » Sep 01, 2016 6:06 pm this post

The two Nimble arrays are new 3 weeks ago and we got Veeam running for the first time 2 weeks ago so we have no historical data to compare against

On the proxy box we have Nimble Connection Manager installed, which is a nice front end to Microsoft iSCSI. So on the proxy box we are using Microsoft's iSCSI initiators
On the ESXi side we are using their software initiators
We do not have specific links dedicated to reading/writing. The physical proxy server has two 10gig ethernet connections in my storage vlan which of course is not routable which is best practice. Not sure how Veeam decides which interface to use at any one point in time
Not familiar with what types of tuning I should use on the proxy server. If there is some type of KB you can point me to that would be great
146MB/s is the aggregate speed. There are 15VMs being backed up. Since we are using forever forward incremental, change block tracking kicks in. Last night's run:
Processed: 5.3TB
Read: 237GB
Transferred: 77.5GB
Our one full backup we ran two weeks ago when we first installed Veeam screamed. So yes, our incrementals are much slower but you are correct it has to slog through a lot of data to eventually just write what has changed. Perhaps I do not have a clear understanding of what 99% Source means in relation to an incremental backup

Post by **tsightler** » Sep 01, 2016 6:54 pm this post

Thanks for the data, sorry for a couple more questions, how long did the total backup (the one you provided stats for) actually take? Oh, and are all these VMs only on a single datastore? I'm thinking you don't really have an infrastructure problem if your full backups screamed.

HendersonD · Post by **HendersonD** » Sep 01, 2016 8:43 pm this post

The full backup ran about 350MB/s so more than twice the speed of the incrementals. The backup from last night took 43 minutes to complete. These VMs live on three datastores. In ESXi I have the three datastores in a single datastore cluster. I do know that Veeam spends a lot of time just reading through all this data trying to parse out what has changed and that takes time. You are correct that the other 3 possible bottlenecks may pale in comparison to reading all of this data even though it is read fast there is a lot of it. Perhaps what I am seeing is normal behaviour

Sep 01, 2016 9:15 pm

Something will always be listed as the bottleneck, that something is the part of the chain that we spent the most time waiting for, but that doesn't mean there is a problem. In your case your target storage only had to write 77GB of data, or 3x less than what we had to read, because that was what was saved by data reduction (compression/dedupe), and as mentioned above, unless you're using 100% of the CPU on the proxy, or 100% of the network bandwidth, those are not going to be listed as the bottleneck.
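For reference, here is the arithmetic behind the "3x" figure, using the numbers posted earlier in the thread (237 GB read, 77.5 GB transferred, 43 minutes for the job). This is only a rough sketch: it assumes 1 GB = 1024 MB and that the 43 minutes covers the whole job, including time when no data was moving, so the average it prints is a wall-clock figure and not necessarily how the reported processing rate is derived.

```python
# Figures quoted in the thread for the incremental run.
READ_GB = 237.0
TRANSFERRED_GB = 77.5
JOB_MINUTES = 43


def reduction_ratio(read_gb: float, transferred_gb: float) -> float:
    """How many times smaller the transferred data is than the data read."""
    return read_gb / transferred_gb


def average_read_rate_mb_s(read_gb: float, minutes: float) -> float:
    """Average read rate over the whole job window, assuming 1 GB = 1024 MB."""
    return read_gb * 1024 / (minutes * 60)


ratio = reduction_ratio(READ_GB, TRANSFERRED_GB)
rate = average_read_rate_mb_s(READ_GB, JOB_MINUTES)
print(f"data reduction: {ratio:.2f}x")
print(f"average read rate over the job: {rate:.0f} MB/s")
```

The target therefore writes roughly a third of what the source has to deliver, which is consistent with the target showing almost no load.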

Also, the read of incremental data is random, rather than sequential, which has significantly less impact on a flash device than on spinning disks, but still has some impact. I think you're seeing pretty decent speed overall. I think the only way you could get faster is by increasing the number of parallel tasks; I'm assuming you're seeing at least some queuing when VMs are waiting for resources.

HendersonD · Post by **HendersonD** » Sep 01, 2016 9:28 pm this post

I just started my backup job to take a look
I have 4 VMs processing right now with all the rest in a pending state with this message
9/1/2016 5:22:49 PM :: Resource not ready: Backup repository

My proxy box is fairly beefy so it is set to have 12 concurrent tasks
When it says that "Resource not ready: Backup repository" what exactly is happening? What exactly is it seeing in my backup repository that has it only processing 4 or 5 VMs at a time during backup?

You are correct, if there was some way to have it process more VMs at once, it would be great

Post by **Gostev** » Sep 01, 2016 9:49 pm this post

It means that backup repository has no more task slots available due to processing other VMs (max concurrent tasks value reached).

Post by **tsightler** » Sep 01, 2016 9:53 pm this post

Sounds like you have your repository limited to 4 tasks, which is the default, designed to keep from overloading the repository. Since your repository is quite fast, I'm sure you can increase this limit, at least if your proxy/repository has enough cores and memory. The task limit can be changed on the "Repository" tab of the repository properties window, and is listed under "Load control". I suspect if you go to 8 or 12 tasks you'll significantly increase the overall performance.
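A quick way to picture why only 4 VMs run at once: a task needs a free slot on both the proxy and the repository, so the smaller limit wins. The sketch below is a simplified model, not Veeam's scheduler; the function name `schedule` is made up for illustration, and it counts one task per VM, which is an assumption (real task accounting may be finer grained, for example per disk).

```python
def schedule(vm_count: int, proxy_slots: int, repository_slots: int) -> tuple[int, int]:
    """Return (running, pending) when every VM needs one proxy slot
    and one repository slot at the same time."""
    running = min(vm_count, proxy_slots, repository_slots)
    return running, vm_count - running


# 15 VMs and 12 proxy tasks as described in the thread; the repository limit varies.
for repo_slots in (4, 8, 12):
    running, pending = schedule(vm_count=15, proxy_slots=12, repository_slots=repo_slots)
    print(f"repository limit {repo_slots:>2}: {running} running, {pending} pending")
```

With the repository at 4, the 12 proxy tasks never come into play; raising the repository limit to 12 is the point where the proxy limit becomes the cap instead.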

HendersonD · Post by **HendersonD** » Sep 02, 2016 1:34 pm this post

Jacking up the backup repository limit from 4 to 8 helped. I am now getting 166 MB/s and my backup went from 45 minutes to 39 minutes. The bottleneck analysis is interesting on the entire job
9/2/2016 8:52:31 AM :: Load: Source 94% > Proxy 23% > Network 26% > Target 0%
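
If you want to compare these figures across runs, a line like the one above is easy to pull apart. A minimal sketch, assuming the format is always `Load: Name NN% > Name NN% ...` as in this one log line (other versions may format it differently); `parse_load` is a hypothetical helper, not part of any Veeam tooling.

```python
import re

LOAD_LINE = "9/2/2016 8:52:31 AM :: Load: Source 94% > Proxy 23% > Network 26% > Target 0%"


def parse_load(line: str) -> dict[str, int]:
    """Extract {component: busy percent} from a 'Load:' summary line."""
    _, _, stats = line.partition("Load:")
    return {name: int(pct) for name, pct in re.findall(r"(\w+)\s+(\d+)%", stats)}


load = parse_load(LOAD_LINE)
busiest = max(load, key=load.get)
print(load)
print("busiest component:", busiest)
print("sum of percentages:", sum(load.values()))
```

Note that the four figures add up to 143, so they cannot be shares of a single 100% total; each one is reported on its own scale.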

I think the source at 94% is a bit of a red herring. Yes, it is probably the bottleneck in relation to Proxy, Network, and Target but I am guessing I am getting as much out of my source as possible. I have two good-sized file servers and I have indexing turned on for this backup job. In looking at these two file servers, one takes 6 minutes to index and the other one a whopping 27 min. The one that takes 27 min actually has less total data on it but a lot of files/folders to work through. It is the file server that holds the home directories for the 4,300 students on my campus. Since we are K-12 there are probably 2,000 of these home directories (upper class students) that contain a lot of files/folders.

What controls the speed of indexing and is there any way to speed it up? I am guessing it is this indexing that is showing the source as the bottleneck

Post by **Gostev** » Sep 04, 2016 7:50 pm this post

No, indexing is not a part of bottleneck numbers as they are reported by data movers.

Delo123 · Post by **Delo123** » Sep 05, 2016 7:39 am this post

Processed: 5.3TB
Read: 237GB
Transferred: 77.5GB

Your backup job is hardly reading any data, not sure how skipping data due to CBT is being measured but this is what we also see on our all flash arrays. Full Backups are done between 1GB/s and 2GB/s but incrementals of static data only run at about 200-300MB/s...

robbys · Post by **robbys** » Sep 05, 2016 9:00 am this post

I have also an all flash storage solution, but with HPE VSA.
If it falls back to normal backup I get 70MB/s. If I do storage snapshots on VSA, I get around 400-550MB/s. With the storage snapshots, I'm running into the limitation of my backup server (that doubles as proxy too), so then the proxy or the network is the bottleneck. With the normal backup, the source is always the bottleneck.

So it depends heavily on how you contact the source, and with a speed of 200-300MB/s you cannot saturate the backup server itself if it is a new one; it is still held back at the source (not the disk, but the way you access the disk and how it impacts the rest of the server).

stevenrodenburg1 · Sep 05, 2016 12:08 pm

I think the way that the "bottleneck" information is displayed to us humans needs to be re-evaluated.

I run an 8-node VSAN environment. It's insanely fast, backup speeds easily reach 400 to 500 megs a second, a backup job of close to 3 TB simply flies, all is good and still, every day, our "source" (the super duper speedy VSAN) is one hell of a bottleneck at over 90%. Say what?!? "What the heck am I doing wrong" were my thoughts initially. Just like most of us.

Then it dawned on me: i'm not doing anything wrong. It's all fine.

The computer calculates and displays the info in a certain way. It's percentages. It's "things in relation to other things". But us mere humans cannot deal with the fact that there is a bloody bottleneck every time, despite our efforts and investments. Our brains are wired to react like this.

We are presented with a "negative information". In this case the presence of a bottleneck
Reaction: oh no !!

Sure, if the backup speed is low it can help (the bottleneck is then interpreted as helpful information), but if your environment runs like a monkey with its ass on fire, and you still see that dreaded bottleneck every day, you start to doubt yourself.

Our brains are wired that way. It's basic psychology.

Hence my suggestion of presenting the information differently. As it is now, it's always negative. As if there is something wrong.

Delo123 · Post by **Delo123** » Sep 05, 2016 12:11 pm this post

But your Source is still the bottleneck right? 400/500 Megs /s could be 1GB/s or 2 or 10 so it still is a bottleneck...

stevenrodenburg1 · Sep 05, 2016 12:18 pm

Sure. According to the way Veeam calculates it yes. But the VSAN is not really pushed that hard during backups. We have two proxies, processing 4 VMDK's in parallel ( 8 in all ).

My point is: there will always be "the slowest component dragging things down".
I'm perfectly satisfied with our system and in my mind, ignore that "negative information" I get fed every day.
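
To put numbers on that, here is a toy model (an assumption for illustration, not Veeam's actual calculation): the pipeline runs at the speed of its slowest stage, and each stage is busy for throughput divided by its own capacity. The capacities are hypothetical, not measurements from anyone's environment.

```python
def busy_percentages(capacity_mb_s: dict[str, float]) -> dict[str, int]:
    """Toy model: the pipeline runs at the speed of its slowest stage,
    and each stage is busy for throughput / capacity of the time."""
    throughput = min(capacity_mb_s.values())
    return {stage: round(100 * throughput / cap) for stage, cap in capacity_mb_s.items()}


# Hypothetical capacities in MB/s, not measurements from the thread.
slow = {"Source": 150, "Proxy": 600, "Network": 1000, "Target": 800}
fast = {stage: cap * 10 for stage, cap in slow.items()}  # everything 10x faster

print(busy_percentages(slow))
print(busy_percentages(fast))
```

Both lines print the same percentages: making every component ten times faster changes the backup speed but not which stage is listed as the busiest.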

This topic is as old as the day when Veeam introduced the "bottleneck display feature". People have been "worrying over nothing" ever since (well, not always, folks with performance issues have valid things to worry about).

This feature, though meant with the best intentions, has a "dark side" too

Sep 05, 2016 6:51 pm

I feel that at some point it would just be enough to replace the term "bottleneck" with something less negative, and magically all these "issues" would disappear

chrisdearden · Post by **chrisdearden** » Sep 05, 2016 6:55 pm this post

I like "rate limiting/determining step" from my Chemistry days, but it's probably harder to explain.

Sep 05, 2016 6:58 pm

Let's just flip the numbers in the percentage and call it "best performing components"

stevenrodenburg1 · Sep 05, 2016 7:01 pm

Good suggestions. Keep em coming

Post by **tsightler** » Sep 05, 2016 8:18 pm this post

I thought about suggesting we just call it "Utilization". The step with the highest utilization percentage would still technically be the bottleneck, but it wouldn't have such a negative connotation, similar to the way network, CPU, and memory are reported in percent of utilization. However, the problem with that is, if a component is performing poorly, and we're spending a lot of time waiting on it, it would still show high utilization, which wouldn't really be the right term in that case.

daniel.farrelly · Sep 06, 2016 3:40 pm

Our source "bottleneck" is NVMe flash. We routinely hit ~2TB/s during nightly incrementals.

Delo123 · Sep 06, 2016 6:09 pm

2TB/s sounds like quite a lot, what array are you using to achieve that? We have an NVMe source but "only" get up to 6GB/s...

HendersonD · Post by **HendersonD** » Sep 06, 2016 7:58 pm this post

I agree that the term bottleneck to an IT professional means it is time to hunt for a solution. I am the original poster and am coming to the realization that Source in my case is not really a bottleneck even though Veeam reports it this way. My new Nimble all flash array is barely breaking a sweat during backups. It is capable of churning out a lot more IOPS and throughput than Veeam is asking it to.

daniel.farrelly · Sep 06, 2016 9:31 pm

daniel.farrelly wrote:Our source "bottleneck" is NVMe flash. We routinely hit ~2TB/s during nightly incrementals.

Typo. Meant 2 gigs a sec.

nmdange · Post by **nmdange** » Sep 06, 2016 10:45 pm this post

I think the problem with the stats is that it doesn't take into account time spent doing things other than IO. It would be nice if things like time it takes to build the VM list, take a snapshot, mount a snapshot on a proxy, do a merge, etc. were included to get a better idea of how much the source storage system is really the bottleneck.

Post by **Gostev** » Sep 06, 2016 11:47 pm this post

You can already see in the job log how much time all of the above-mentioned operations take comparing to actual data movement. Bottleneck analysis is for data movement only, exactly for the reason to be able to show you "how much the source storage system is really the bottleneck" in actual data processing, excluding preparations - length of which makes no difference anyway considering that usually, multiple VMs are being processed in parallel and data movement through backup infrastructure is continuous.

nmdange · Post by **nmdange** » Sep 07, 2016 12:33 am this post

For certain stats like time spent taking a snapshot or mounting it on a proxy, you can only see it by looking at each VM separately. Thus if some VMs take a lot longer than normal to take a snapshot, there's no obvious way to see if the backup is taking longer because of this without checking every single VM one at a time.

What I've noticed is when there is very little change data per VM, the time spent doing things like taking snapshots actually becomes significant enough that the job as a whole takes longer. If I look at the throughput graph of incrementals, I often see a lot of gaps where no data movement is happening. On the rare occasion I've had to do full backups (e.g. after a CBT reset), the throughput graph shows a constant and very high data rate (500MB/s-1GB/s).
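
A rough way to see the effect of that fixed per-VM time is to divide the changed data by the total time, overhead included. This is only a sketch with made-up numbers (500 MB/s raw read rate, 60 seconds of snapshot and setup work per VM); neither value comes from a real job.

```python
def effective_rate_mb_s(changed_mb: float, raw_rate_mb_s: float, overhead_s: float) -> float:
    """Changed data divided by total time (fixed overhead plus transfer time)."""
    transfer_s = changed_mb / raw_rate_mb_s
    return changed_mb / (overhead_s + transfer_s)


# Hypothetical values: 500 MB/s raw read rate, 60 s of snapshot and setup work per VM.
for changed_mb in (1_000, 10_000, 100_000):
    rate = effective_rate_mb_s(changed_mb, raw_rate_mb_s=500, overhead_s=60)
    print(f"{changed_mb:>7} MB changed -> {rate:6.1f} MB/s effective")
```

The less data a VM has changed, the more the fixed overhead dominates, so a small incremental can look far slower than a full backup on exactly the same storage.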

If I watch a backup as it runs, I can see when VMs are waiting or in the middle of taking a snapshot as opposed to actually backing up data. However it is harder to figure out after the fact because each VM just lists how much time it spent and it's not easy to match that up to gaps in the throughput graph. Watching live, I can see VMs spend time in "Resource not ready" state. For me it is always "Resource not ready: snapshot" because with Hyper-V you can only get 4 VMs at a time per CSV. This could also be the backup proxy or the repository depending on the max number of tasks configured there. Having some way to correlate the gaps in the throughput graph to what is causing the wait is what would be helpful.

HendersonD · Post by **HendersonD** » Sep 07, 2016 12:45 am this post

So this brings us full circle: how can Source be the bottleneck with an all flash array? In my case my backup is about 40 minutes but 27 minutes is spent indexing a large file server. If this indexing operation is not counted in the bottleneck analysis, how is Source the bottleneck? Are the percentages relative to each other? In other words, more is happening on the data read side than the network, proxy, and target side, therefore the bottleneck analysis says that Source is the bottleneck?
