NetApp backup performance

by daniel.negru » Wed Mar 04, 2015 5:02 pm

Hi everyone,

I was wondering if there is anyone with a setup as similar as possible to mine.
We bought a NetApp FAS2552 with SAS and SATA drives; it was supposed to replace our aging EMC CX3-20. We also have a Dell MD3620i in our datacenter.

I cannot wrap my head around the performance issues I am seeing: the NetApp is consistently slower in backups compared with the MD, and even way slower compared with the EMC. It looks to me like our upgrade is actually a big downgrade. What kind of speeds do you get with such a setup?

My setup details: 10 Gb iSCSI with jumbo frames, Round Robin, software iSCSI initiators on ESXi 5.5, Veeam B&R 8 running hotadd and/or storage snapshots against the NetApp.
Full backup jobs run at ~100 MB/s on the NetApp and at 2-3 times that speed against the MD. I am comparing both on 10k RPM SAS: the NetApp has 36 disks in 2 x RAID-DP aggregates, the MD has 24 disks in RAID 10. For both, the bottleneck is always the source, followed far behind by the proxy. Network and destination are pretty much at 0%.

Please, if anyone has a NetApp in their care, can you share what kind of speeds you see in your environment? I wonder whether my expectations are too high or there are other issues in my environment.
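
As a rough back-of-the-envelope check (the per-disk throughput figure below is an assumption, not a measurement from either array), neither spindle count should be anywhere near the limit at ~100 MB/s:

```python
# Back-of-the-envelope sequential-read estimate for both arrays.
# Per-disk throughput for a 10k RPM SAS spindle is assumed at ~75 MB/s
# sequential; real numbers vary with block size, RAID layout, and
# controller limits.

PER_DISK_MBPS = 75  # assumed sustained sequential read per 10k SAS spindle

def raw_read_estimate(data_disks: int, per_disk_mbps: int = PER_DISK_MBPS) -> int:
    """Upper-bound aggregate sequential read in MB/s, ignoring controller limits."""
    return data_disks * per_disk_mbps

# NetApp: 36 disks in 2 x RAID-DP aggregates -> assume 2 parity disks per group
netapp_data_disks = 36 - 2 * 2
# Dell MD: 24 disks in RAID 10 -> reads can be served from all spindles
md_data_disks = 24

print(raw_read_estimate(netapp_data_disks))  # 2400
print(raw_read_estimate(md_data_disks))      # 1800
```

Even with generous deductions for RAID and controller overhead, both arrays should sustain far more than the ~100 MB/s observed, which suggests the bottleneck is not raw spindle bandwidth.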

Thank you,
Daniel.
daniel.negru
Influencer
 
Posts: 17
Liked: 1 time
Joined: Wed Feb 13, 2013 5:36 pm
Full Name: Daniel Negru

Re: NetApp backup performance

by Vitaliy S. » Fri Mar 06, 2015 5:21 pm

Hi Daniel,

What backup transport mode were you using with the EMC CX3-20 and Dell MD3620i? What were the bottleneck stats for those jobs?

daniel.negru wrote:Veeam B&R 8 running hotadd and/or storage snapshots against the NetApp.

Have you tried using direct SAN mode with your NetApp storage, so that we could compare hotadd/storage snapshots and direct SAN mode performance?

Thank you!
Vitaliy S.
Veeam Software
 
Posts: 19541
Liked: 1098 times
Joined: Mon Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: NetApp backup performance

by daniel.negru » Mon Mar 09, 2015 12:00 am

Hi Vitaliy,

My quest here is to find other users with setups as close to mine as possible and see whether my expectations are somehow too optimistic.

I have tried NetApp storage snapshots; I even bought the Veeam Enterprise Plus upgrade to accommodate it (pricey!). Throughput versus hotadd is about the same, no real difference.
On a side note: storage snapshots are marketed by Veeam as having little to no impact on the virtual environment, which is not true at all. The storage will be hammered, and it will be sluggish during snapshot removal for hours. Yes, ESXi is not directly affected by Veeam, but it will be adversely affected by storage slowdowns. At least with this blazingly slow NetApp, I have seen snapshots still being removed many hours later, with latency spiking during such events.

The CX3-20 is not in the vCenter cluster anymore, so I cannot compare it with the NetApp. It was on 4 Gb FC.

I was always using hotadd (CX3, MD, or NetApp) with one proxy per host to distribute the load.

In every test I could conceive, the NetApp is at least 60% slower, sometimes up to 200% slower. The source is always reported as the bottleneck, followed (far behind) by the proxy. Network and destination are usually fine.

So I am curious whether there are any NetApp users out there, preferably on 10 Gb iSCSI with the FAS2500 series, and what kind of speeds they are getting.

Thank you,
Daniel.

Re: NetApp backup performance

by daniel.negru » Mon Mar 09, 2015 1:40 pm

Hi Vitaliy,

I did not fully realize you were asking about direct SAN access, which is different from storage snapshots. In your view, is it really much faster?
No, I have never used it. I have always stayed away from it because I feel uncomfortable having a VMFS volume attached to a Windows machine that can, in some scenarios, auto-mount it, write a signature, and trash the volume. No performance gain can outweigh my fear of that. I know, I am a chicken.

Daniel.

Re: NetApp backup performance

by Vitaliy S. » Mon Mar 09, 2015 1:43 pm

It is not much faster, but it would be helpful to know the job performance for the troubleshooting process. I just want to understand whether the NetApp storage is slower than your previous configuration or there is something else we're missing. As to the possibility of corrupting VMFS, please check out my response in this thread > Veeam Proxy not capable of doing Storage Snapshot

Re: NetApp backup performance

by jb1095 » Mon Mar 09, 2015 8:45 pm

daniel.negru wrote:I cannot wrap my head around the performance issues I am seeing: the NetApp is consistently slower in backups compared with the MD, and even way slower compared with the EMC. [...] What kind of speeds do you get with such a setup?

We too have the same results with all of our backups. We are running a 10 Gb iSCSI NetApp FAS3240 with nearly the same setup as yours. Welcome to the world of NetApp.
jb1095
Enthusiast
 
Posts: 35
Liked: 10 times
Joined: Tue Mar 03, 2015 9:32 pm
Full Name: Jon Brite

Re: NetApp backup performance

by daniel.negru » Mon Mar 09, 2015 9:37 pm

Hi,

Thank you, jb1095, for your input in this thread.
So it seems I am not alone in this...

My big problem is not the backup speeds; it is that, based on what I see, we cannot possibly use the device for what we intended, and we wasted our much-needed budget on a toy.

Re: NetApp backup performance

by kte » Tue Mar 10, 2015 5:48 am

Try NFS shares instead of iSCSI, disable jumbo frames everywhere, and enable flow control on your dedicated NAS network switches, which should have enough buffers not to drop packets when flow control is enabled.

Don't use more than 50% of the system capacity, disable dedup, ...
kte
Expert
 
Posts: 172
Liked: 7 times
Joined: Tue Jul 02, 2013 7:48 pm
Full Name: Koen Teugels

Re: NetApp backup performance

by jb1095 » Tue Mar 10, 2015 12:55 pm

kte wrote:Try NFS shares instead of iSCSI, disable jumbo frames everywhere, and enable flow control on your dedicated NAS network switches, which should have enough buffers not to drop packets when flow control is enabled.

Don't use more than 50% of the system capacity, disable dedup, ...



The problem with NFS for us is that it is not technically supported for Exchange. While that may be a political thing, the fact is Microsoft themselves say they don't support it because all kinds of weird things can happen (I have seen it run fine on NFS too, for the record). In our environment we are running Cisco UCS blades, Nexus switches, all 10 Gb, and we followed best practice every step of the way... the bottleneck is the SAN, plain and simple. We had to create a passive Exchange server just for backup purposes (also best practice), just so we could snap Exchange. We ran into the 20-second stun problem that many users face, and we also ran into an issue where our passive Exchange VM would go offline completely when removing the snap. Our solution was to move it to a SAS aggregate.

I do not mean to hijack this thread at all, kte, but I am curious why you are suggesting that he disable jumbo frames. Everything I have read says our environment follows best practice, and I assure you, jumbo frames are big for us, as is dedupe. I am just not sure I would want to disable some of the most useful features (that we also paid heavily for). As a FlexPod shop with all best-practice methods followed, I would never have thought we would see the issues we have seen over the last couple of years. I do agree, however, that he should not be using more than 50% of his system capacity, which brings me to my next point.

I think our biggest issue is that we probably undersized our storage when we first built it a couple of years ago. Now we are in an environment where we need more storage, which by normal rules will be faster regardless, so I cannot blame the NetApp entirely. I will also say that even when we first got our NetApp in house, the performance was iffy. The speeds I used to see out of our EMC VNX5300, and before that our CLARiiON CX3-10, were both much faster than our NetApp ever was.

Daniel, if possible, can you check vCenter and look at your disk read/write latency? Also, check your CPU co-stop. I am guessing you will find high latency on many of your busiest VMs, and if you run a continuous ping to the machines being backed up, you will probably notice that they lose packets while the snapshot is being removed. Try to move anything deemed critical to your SAS aggregate, and like kte said, make sure you are not exceeding 50% of your resources on the NetApp. Also, what ONTAP version are you on?
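
The continuous-ping check can be automated. A minimal sketch (the helper below is hypothetical; the samples would come from any once-per-second ping loop) that groups lost replies into time windows, e.g. to line them up against snapshot-removal times:

```python
from typing import List, Tuple

def loss_windows(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Group consecutive failed pings into (start, end) time windows.

    `samples` is a list of (timestamp_seconds, reply_received) pairs,
    e.g. collected once per second while a snapshot is being removed.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # a loss window begins
        elif ok and start is not None:
            windows.append((start, prev))   # window ended at the last failed sample
            start = None
        prev = ts
    if start is not None:
        windows.append((start, prev))       # trailing loss window
    return windows

# One ping per second; replies drop from t=3..5 (e.g. snapshot-removal stun)
samples = [(t, t not in (3, 4, 5)) for t in range(10)]
print(loss_windows(samples))  # [(3, 5)]
```

Correlating these windows with the job log's snapshot-removal timestamps makes the stun impact easy to demonstrate to support.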

Re: NetApp backup performance

by daniel.negru » Tue Mar 10, 2015 7:36 pm

Thank you kte and jb1095 for helping me here.

I do not like disabling dedupe; it is a great feature and was the main purchase decision. Too bad that while it saves us 20-30 TB, when it is running (for days in a row) it makes the SATA aggregate pretty much useless. I have seen 100-200 ms latency on it during those runs.
NetApp's advice? Do it outside business hours.
Unfortunately, not every business has a 3-4 day maintenance window.

Jumbo frames were enabled after the initial install, and they improved the throughput.

We are working with the company that did the install to troubleshoot this; they want to try NFS as well.
I am working with NetApp support directly too.

I do not endorse the idea of NFS because I already have an iSCSI infrastructure, also used by the Dell MD; NFS would be an addition to it, with increased setup and complexity and so on and so forth. The iSCSI switches are simple, not stacked, and fully lit (no spare ports). NFS would require stacked switches and cross-stack LACP for HA.

We are a little above 50% usage, and hearing this 50% rule makes me mad. So in the future I will need to buy more drives even if I have the capacity, all for the sake of performance.

RE: I am guessing you will find high latency on many of your busiest VMs
It happens at times to see tens of ms on SAS, and on SATA even blips to 100 ms+. Veeam ONE is complaining about SATA aggregate latency quite often.

RE: and if you run a continuous ping to the machines being backed up, you will probably notice that they lose packets while the snapshot is being removed.
That is always true, especially on SANs with slow writes, when dealing with ESXi snapshot removal. I am backing up the active Exchange 2013 DAG server; only a handful of times was it evicted and failed over to DR. We may employ (testing now) NetApp storage snapshots to mitigate this.

RE: Also, check your CPU co-stop
I have no idea what that one is; I must look it up.

RE: what ONTAP version are you on?
It is a very recent install, 8.2.2 7-mode.

Based on the experience jb1095 shared, I would say it seems the NetApp was a huge mistake to buy. The CX3-20 was blazingly fast compared with it, and even the entry-level Dell MD runs circles around it.

Re: NetApp backup performance

by jb1095 » Wed Mar 11, 2015 2:58 pm

I will respond more later today as I am involved with another project at the moment, but I wanted to clarify a point I made in response to kte. As a general rule, you really do not want to bring any SAN over 50-60% utilization; however, of all the SANs available, NetApp lets you go up to 80-85% before suffering a serious performance drop-off. Some would even argue that you can go over 90% because of WAFL. I was just throwing the 50% rule out there as it is fairly common in the industry, and the biggest mistake most IT shops make is to undersize their storage environment.

If you want immediate performance gains, move everything critical to your SAS aggregates and maybe inquire about getting a SAS shelf added on. You will not regret it. We are about to go into a POC with a couple of new arrays and I will keep you posted here, but I am willing to bet our "snapshot" issue will be completely eliminated once we hit any of the new arrays.

Re: NetApp backup performance

by jb1095 » Wed Mar 11, 2015 3:25 pm

RE: That is always true, especially on SANs with slow writes, when dealing with ESXi snapshot removal. I am backing up the active Exchange 2013 DAG server; only a handful of times was it evicted and failed over to DR.

This is why we only snap our passive copy of Exchange; however, we were seeing the passive VM drop for minutes upon removing the snap, and sometimes it would not even get the snap at all as the hard-coded 20-second stun bit us. It has always been the case that you will lose a few pings, but in my opinion it is not acceptable to lose a VM for minutes at a time just because it is removing a snapshot. (Moving it to our SAS aggregate reduced this problem, so it now only drops for a few seconds.) Our snapshot create and removal times have decreased drastically too: on SATA, our snap removal would sometimes take 5 hours or so (usually about 90 minutes); on SAS, it is between 5-7 minutes. On create, it used to be 15-23 seconds; now it is between 7-13.

One other thing: for us anyway, we weren't really seeing high write latency on our unit, we were seeing high read latency. In fact, we still are for anything on our SATA aggregate, so nothing critical is on that aggregate.

RE: We may employ (testing now) NetApp storage snapshots to mitigate this.

In order to do this, you would have to build a new Exchange VM and use SnapDrive with in-guest iSCSI presentation of the Exchange volumes. Every database and every log file needs its own individual volume. Basically, you would have to redesign your entire Exchange environment, and trust me, you will experience issues. The way you are doing it now is what we found to be the easiest, most reliable, and cleanest method: Veeam B&R and SRM.
The performance would be fine in SME because you are not snapping the guest, as you are using volume-level snapshots, but retention sucks; it is based on how much retention you allocate on the volume. Oh, and also, have you dealt with NetApp support yet? Around SME it is basically non-existent: very hit or miss, and even when you do hit a good tech, there will be weeks or months of redundant troubleshooting. If this is a new device, you may want to talk to your sales rep and get a loaner unit to migrate to... At the end of the day, we wish we had never bought any SATA drives with our NetApp. Had we gone all SAS shelves, we would have been much better off.

Also, I do not feel that NetApp themselves deserve a lot of the blame here. We didn't realize what our actual workloads would be with everything we wanted on our SAN when all was said and done, and we "thought" that Exchange was supposed to be much more storage-latency friendly (touted by Microsoft themselves). Our NetApp has been good to us for the most part, but I think we just undersized it from day one. We did not follow the rule of sizing:

Do not size off of space requirements; size off IOPS first, space second.
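
That rule of thumb can be made concrete. A minimal sketch with assumed per-disk IOPS figures and textbook RAID write penalties (RAID-DP is deliberately omitted, since WAFL's write coalescing doesn't map cleanly onto a fixed penalty; real sizing should come from the vendor's own tools):

```python
# IOPS-first sizing sketch using commonly cited rules of thumb.
# All constants below are illustrative assumptions, not vendor figures.

DISK_IOPS = {"sata_7k": 75, "sas_10k": 140, "sas_15k": 180}   # per-spindle random IOPS
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}         # back-end IOs per host write

def spindles_needed(host_iops: int, read_fraction: float,
                    raid: str, disk: str) -> int:
    """Minimum spindle count: translate front-end IOPS into back-end IOPS."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    backend = reads + writes * WRITE_PENALTY[raid]
    return int(-(-backend // DISK_IOPS[disk]))  # ceiling division

# Example: 5000 host IOPS at 70% reads on 10k SAS
print(spindles_needed(5000, 0.7, "raid10", "sas_10k"))  # 47
print(spindles_needed(5000, 0.7, "raid6", "sas_10k"))   # 90
```

The point of the exercise: the same workload can need roughly twice the spindles under a parity scheme as under RAID 10, which is why sizing from capacity alone so often ends in an underperforming array.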

Re: NetApp backup performance

by jb1095 » Wed Mar 11, 2015 3:36 pm

RE: I have no idea what that one is; I must look it up. (in regards to CPU co-stop)

Let me save you the trouble: do not even bother. You will end up going down the road of oversized VMs, where you will be advised to reduce the number of vCPUs in your environment and told to move VMs around to balance out vCPUs per host, and at the end of the day the performance difference you see will be negligible. Move some high co-stop VMs from your SATA aggregate to your SAS aggregate and you will see CPU co-stop flatline at zero. We saw upwards of 4000 ms CSTP times on SATA for months suddenly drop to zero once we moved the VM to SAS. The rabbit hole of CSTP was a long one, but ultimately the root cause was storage performance.

RE: It is a very recent install, 8.2.2 7-mode. (ONTAP question)
8.2.2 P2? 7-mode.

Re: NetApp backup performance

by daniel.negru » Wed Mar 11, 2015 5:36 pm

Hi Jon,

RE: In order to do this, you would have to build a new Exchange VM and use SnapDrive with in-guest iSCSI presentation of the Exchange volumes. Every database and every log file needs its own individual volume.
I don't think I agree with this. In my view, the way Veeam + NetApp snapshots work is something along these lines: it takes a guest snapshot as it normally would (a few seconds, or more for app-aware processing), then a storage snapshot (5 seconds tops), then it removes the VMware snapshot (quick and painless, as the snapshot is young), and then it mounts the LUN on the proxy and grabs the data from the VMFS inside the storage snapshot. Since the VMware snapshot is captured in the LUN snapshot in a consistent state, all is well. The storage will eventually be instructed to remove the snapshot at the end, and will eventually actually remove it (hours later in my tests).
Can anyone from Veeam confirm this?
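
For clarity, the ordering described in that paragraph can be laid out as explicit steps. This is only a sketch of the sequence as described in the thread, not confirmation of Veeam's actual internals:

```python
# Sketch of the snapshot ordering described above. The step names are
# illustrative placeholders, not Veeam's actual API or internal design.

def backup_from_storage_snapshot() -> list:
    """Return the sequence of steps in the order described in the thread."""
    return [
        "create VMware snapshot (app-aware; a few seconds)",
        "create storage snapshot of the LUN (~5 seconds)",
        "remove VMware snapshot (quick: it is only seconds old)",
        "mount LUN snapshot on the backup proxy",
        "read VM data from the frozen VMFS copy",
        "schedule storage snapshot removal (array completes it later)",
    ]

for i, step in enumerate(backup_from_storage_snapshot(), 1):
    print(i, step)
```

The key property of this ordering is that the VMware snapshot lives only for the seconds between steps 1 and 3, so the long-running part (the array removing its own snapshot) never stuns the guest.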

The SATA is for test and dev; I have tried to keep anything remotely production-important away from SATA.

RE: Oh, and also, have you dealt with NetApp support yet? Around SME it is basically non-existent: very hit or miss, and even when you do hit a good tech, there will be weeks or months of redundant troubleshooting.
Hmm... this pretty much sounds like my experience with them so far, though I am not yet months past opening the ticket. Getting there...

RE: 8.2.2 P2? 7-mode
I see no P2 in the description, so it seems to be just 8.2.2.

Regarding the sizing of our NetApp: we provided the vendor with reports on IOPS, read/write ratio, and so on and so forth. They knew what it was supposed to replace (the EMC CX3 series), and this is the solution they proposed. While I should be a better storage admin and do research and study, I am not really a storage admin and never really have been. Small shop, you know: the usual responsibilities spread across storage/Exchange/network/virtualization, and the list goes on. In my view the guilt lies with me and with them at the same time.

I personally doubt another shelf of SAS will solve anything. The controllers themselves hit 80-90% CPU usage on a single, though massive, Storage vMotion to SAS I did a few days ago. Maybe migrating to the 8000 series (perhaps plus an extra SAS shelf) is the solution, but there is no budget. I am getting reassurance from the NetApp rep that they are looking at the issue and will find a solution for us.

RE: We are about to go into a POC with a couple of new arrays and I will keep you posted here, but I am willing to bet our "snapshot" issue will be completely eliminated once we hit any of the new arrays.
Are you trying some of the new hybrid things? Tegile/Nimble or such? When Exchange was on the Dell MD 3620 or the EMC CX3, I never had any issues with snapshot removal. Not once in a year.

Thank you.

Anyone else with NetApp + 10 Gb iSCSI + Veeam on this forum?

Re: NetApp backup performance

by jb1095 » Wed Mar 11, 2015 6:06 pm

RE: I don't think I agree with this. In my view, the way Veeam + NetApp snapshots work is something along these lines: it takes a guest snapshot as it normally would... [...] Can anyone from Veeam confirm this?

When it comes to SME, there are certain things you will see once you attempt to use it; that is what I was referring to. It was definitely not a friendly or easy experience for us, and we tried it the way NetApp wanted. We had to rebuild our Exchange environment multiple times just to iron out the kinks and establish best practice for our environment. I suspect that once you attempt it, you will see what I am talking about. Either way, I wish you the best of luck in using the NetApp without redesigning your current Exchange while trying to keep decent retention.

RE: I personally doubt another shelf of SAS will solve anything. The controllers themselves hit 80-90% CPU usage on a single, though massive, Storage vMotion to SAS I did a few days ago. [...]

How are you getting that CPU metric? There are a couple of ways to get it from your controllers. The one listed on the pre-work checklist for upgrading your ONTAP controller software only tells you the highest usage of one of the CPUs, not the average across them. We ran into this a few months ago when bumping to 8.2.2 P2 7-mode. We panicked because our CPU usage was well above 50% and the guide told us we could not do the upgrade with CPU utilization over 50%. After much digging, we dropped to the shell to get more accurate info. I will be happy to share that with you if you like.

RE: Regarding the sizing of our NetApp: we provided the vendor with reports on IOPS, read/write ratio, and so on and so forth. They knew what it was supposed to replace (the EMC CX3 series), and this is the solution they proposed. [...]

I hate that we send all the info requested to our vendors and they come back with something they claim is perfect for us, but then we run into these situations, where we are up the creek without a paddle and with little to no recourse. I don't know of any company that has a dedicated storage admin. This is why we can never blame only ourselves when we get an undersized unit: we send all the information they request, and they have a team of experts who are supposed to size the environment properly. Most of the time, we either never get approval to buy the recommended unit, or the recommended unit is sized incorrectly. We don't find out until it is too late.

RE: Are you trying some of the new hybrid things? Tegile/Nimble or such? When Exchange was on the Dell MD 3620 or the EMC CX3, I never had any issues with snapshot removal. Not once in a year.

We are looking at Nimble, Pure, and the hyper-converged solution Nutanix. We looked at many others along the way, but these are the ones we will probably POC. In fact, we still have to weed one of these out, as we really do not want to POC three units. Each has qualities we love. Pure is probably the fastest but also the most expensive (we are nervous they will be acquired by someone soon, though). Nimble has great speeds (not as fast as Pure) but is more cost-effective and great bang for the buck. Nutanix forces us to buy compute/memory and storage together and basically walk away from our UCS environment eventually, but that may be the way things are going, plus we can just add nodes when we need more power, so it is not a total forklift every 3 years like most other options. All three options have stellar support and will assist you with anything in your environment. I personally am leaning towards Nimble because of the great speeds and reviews, but others in my group are leaning towards Nutanix. Once we get whatever we choose into POC, we will surely see the difference, as we are going to present the new array to vCenter and vMotion / Storage vMotion our Exchange environment off the NetApp, so we will have an apples-to-apples comparison and a full test for at least a month or two before we make any purchase decision.
