Host-based backup of VMware vSphere VMs.
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

NetAPP backup performance

Post by daniel.negru »

Hi everyone,

I was wondering whether anyone here has a setup as close to mine as possible.
We bought a NetApp FAS 2552 with SAS and SATA drives; it was supposed to replace our aging EMC CX3-20. We also have a Dell MD 3620i in our datacenter.

I cannot wrap my head around the performance issues I am seeing: the NetApp is consistently slower in backups compared with the MD, and far slower compared with the EMC. It looks to me like our upgrade is actually a big downgrade. What kind of speeds do you get with such a setup?

My setup details: 10 Gb iSCSI with jumbo frames, Round Robin, software iSCSI initiators on ESXi 5.5, Veeam B&R 8 running on hot-add and/or storage snapshots against the NetApp.
Full backup jobs run at ~100 MB/s on the NetApp and at 2-3 times that speed against the MD. I am comparing both on 10k RPM SAS: the NetApp has 36 disks in 2 RAID-DP aggregates, the MD has 24 disks in RAID 10. For both, the bottleneck is always the source, followed distantly by the proxy. Network and destination are pretty much at 0%.
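To put that gap in wall-clock terms, here is a quick back-of-the-envelope sketch (Python; the 10 TB dataset size is a made-up example, not our actual footprint):

```python
# Rough full-backup window at the throughputs above; the 10 TB dataset
# size is an invented example, not our actual footprint.
def backup_hours(dataset_tb: float, throughput_mb_s: float) -> float:
    """Hours to read dataset_tb terabytes at throughput_mb_s MB/s."""
    total_mb = dataset_tb * 1024 * 1024  # TB -> MB, binary units
    return total_mb / throughput_mb_s / 3600

print(f"NetApp @100 MB/s: {backup_hours(10, 100):.1f} h")  # ~29.1 h
print(f"MD @250 MB/s:     {backup_hours(10, 250):.1f} h")  # ~11.7 h
```

At these rates, the difference between the two arrays is the difference between a backup that fits in a night and one that does not.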

Please, if anyone has a NetApp in their care, can you share what kind of speeds you get in your environment? I wonder whether my expectations are too high, or whether there is some other issue in my environment.

Thank you,
Daniel.
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: NetAPP backup performance

Post by Vitaliy S. »

Hi Daniel,

What backup transport mode were you using with the EMC CX3-20 and Dell MD 3620i? What were the bottleneck stats for those jobs?
daniel.negru wrote:Veeam B&R 8 running on hot-add and/or storage snapshots against the NetApp.
Have you tried using direct SAN mode with your NetApp storage, so that we can compare hot-add/storage snapshot performance with direct SAN mode performance?

Thank you!
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi Vitaliy,

My quest here is to find other users with setups as close to mine as possible and see whether my expectations are somehow too optimistic.

I have tried NetApp storage snapshots; I even bought the Veeam Enterprise Plus upgrade to accommodate them (pricey!). Throughput is about the same as hot-add; no real difference.
On a side note: Veeam markets storage snapshots as having little to no impact on the virtual environment, which is not true at all. The storage gets hammered and stays sluggish for hours during snapshot removal. Yes, ESXi is not directly affected by Veeam, but it is adversely affected by storage slowdowns; at least on this blazingly slow NetApp, I have seen snapshots still being removed many hours later and latency spiking during such events.

The CX3-20 is not in the vCenter cluster anymore, so I can no longer compare it with the NetApp. It was on 4 Gb FC.

I have always used hot-add (CX3, MD, or NetApp) with one proxy per host to distribute the load.

In every test I could conceive, the NetApp is at least 60% slower, sometimes up to 200% slower. The source is always reported as the bottleneck, followed (distantly) by the proxy. Network and destination are usually fine.

So I am curious whether there are any NetApp users here, preferably on 10 Gb iSCSI with the FAS 2500 series, and what kind of speeds they are getting.

Thank you,
Daniel.
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi Vitaliy,

I did not fully realize you were asking about direct SAN access, as opposed to storage snapshots. In your view, is it really much faster?
No, I have never used it. I have always stayed away from it, as I feel uncomfortable having a VMFS attached to a Windows machine that can, in some scenarios, auto-mount it, write a signature, and trash the volume. No performance gain can outweigh my fear of that. I know, I am a chicken.

Daniel.
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: NetAPP backup performance

Post by Vitaliy S. »

It is not much faster, but it would be helpful to know the job's performance in that mode for the troubleshooting process. I just want to understand whether the NetApp storage is slower than your previous configuration, or whether there is something else we're missing. As to the possibility of corrupting VMFS, please check out my response in this thread > Veeam Proxy not capable of doing Storage Snapshot
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

daniel.negru wrote:Hi everyone,

I was wondering whether anyone here has a setup as close to mine as possible.
We bought a NetApp FAS 2552 with SAS and SATA drives; it was supposed to replace our aging EMC CX3-20. We also have a Dell MD 3620i in our datacenter.

I cannot wrap my head around the performance issues I am seeing: the NetApp is consistently slower in backups compared with the MD, and far slower compared with the EMC. It looks to me like our upgrade is actually a big downgrade. What kind of speeds do you get with such a setup?

My setup details: 10 Gb iSCSI with jumbo frames, Round Robin, software iSCSI initiators on ESXi 5.5, Veeam B&R 8 running on hot-add and/or storage snapshots against the NetApp.
Full backup jobs run at ~100 MB/s on the NetApp and at 2-3 times that speed against the MD. I am comparing both on 10k RPM SAS: the NetApp has 36 disks in 2 RAID-DP aggregates, the MD has 24 disks in RAID 10. For both, the bottleneck is always the source, followed distantly by the proxy. Network and destination are pretty much at 0%.

Please, if anyone has a NetApp in their care, can you share what kind of speeds you get in your environment? I wonder whether my expectations are too high, or whether there is some other issue in my environment.

Thank you,
Daniel.

We too have the same results with all of our backups. We are running a 10 Gb iSCSI NetApp FAS3240 with nearly the same setup as you. Welcome to the world of NetApp.
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi,

Thank you jb1095 for your input in this thread.
So it seems I am not alone in this...

My big problem with this is not the backup speeds; based on what I see, we cannot possibly use the device for what we intended, and we wasted our much-needed budget on a toy.
kte
Expert
Posts: 179
Liked: 8 times
Joined: Jul 02, 2013 7:48 pm
Full Name: Koen Teugels
Contact:

Re: NetAPP backup performance

Post by kte »

Try NFS shares instead of iSCSI, disable jumbo frames everywhere, and enable flow control on your dedicated NAS switches, which should have enough buffers not to drop packets once flow control is enabled.

Don't use more than 50% of the system capacity, disable dedupe, ...
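One caveat before disabling jumbo frames everywhere: verify the MTU end to end first, since a mismatched jumbo path often shows up as "slow storage". The usual ESXi-side checks look something like this (vmk1 and 192.0.2.10 are placeholders for your iSCSI vmkernel port and array target):

```shell
# Placeholders: vmk1 = the iSCSI vmkernel port, 192.0.2.10 = the array target.
# Check that the vmkernel NIC and vSwitch MTUs actually match end to end:
esxcli network ip interface list          # per-vmk MTU
esxcli network vswitch standard list      # per-vSwitch MTU

# Test a jumbo path without fragmentation: -d sets don't-fragment,
# 8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header.
vmkping -d -s 8972 -I vmk1 192.0.2.10
```

If the vmkping fails at 8972 but works at a small size, something in the path (vSwitch, physical switch, or array port) is not actually at 9000.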
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

kte wrote:try nfs shares in stead if iscsi, disable jumbo frames everywhere, enable flow control on you're network nas dedicated switches, which should have enough buffers to not drop the packets when enabling flow control

don't use more the 50% of the system capacity, disable dedup,....

The problem with NFS for us is that it is not technically supported for Exchange. While that may be a political thing, the fact is Microsoft themselves say they don't support it because all kinds of weird things can happen (I have seen it run fine on NFS too, for the record). In our environment we are utilizing Cisco UCS blades and Nexus switches, all 10G, and we followed best practice every step of the way... the bottleneck is the SAN, plain and simple. We had to create a passive Exchange server just for backup purposes (also best practice), just so we could snap Exchange... We ran into the 20-second stun problem that many users face, and we also ran into an issue where our passive Exchange VM would go offline completely when removing the snap. Our solution to that was to move it to a SAS aggregate.

I do not mean to hijack this thread at all, kte, but I am curious as to why you are suggesting that he disable jumbo frames. Everything I have read regarding our environment says they are best practice, and I assure you, jumbo frames are big for us, as is dedupe. I am just not sure I would want to disable some of the most useful features (that we also paid heavily for). As we are a FlexPod shop, with all best-practice methods followed, I would never have thought we would see the issues we have seen over the last couple of years. I do, however, agree that he should not be using more than 50% of his system capacity, which brings me to my next point.

I think our biggest issue is that we probably undersized our storage when we first built it a couple of years ago, so now we are in an environment where we need more storage, which just by normal rules will be faster regardless, so I cannot blame the NetApp entirely. I will also say that even when we first got our NetApp in house, the performance was iffy. The speeds I used to see out of our EMC VNX5300, and before that our CLARiiON CX3-10, were both much faster than our NetApp ever was.


Daniel, if possible, can you check your vCenter and look at your disk read/write latency? Also, check your CPU co-stop. I am guessing you will find high latency on many of your busiest VMs, and if you do a continuous ping to the machines being backed up, you will probably notice that they lose packets while the snapshot is being removed. Try to move anything deemed critical to your SAS aggregate, and like kte said, make sure you are not exceeding 50% of your resources on the NetApp. Also, what ONTAP version are you on?
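To put a number on that ping test, here is a minimal sketch (the RTT samples and the removal window are made up for illustration) that scores packet loss over the snapshot-removal window from a continuous ping log:

```python
# Feed it RTT samples from a continuous 1-second ping (None = timed out);
# the sample values and the removal window below are made up.
def loss_pct(samples, start, end):
    """Percent of pings lost in the sample window [start, end)."""
    window = samples[start:end]
    lost = sum(1 for rtt in window if rtt is None)
    return 100.0 * lost / len(window)

# 60 s of quiet, 120 s of snapshot removal (1 in 3 pings lost), 60 s quiet
rtts = [1.0] * 60 + [None, 4.0, 9.0] * 40 + [1.0] * 60
print(f"loss during removal: {loss_pct(rtts, 60, 180):.1f}%")  # 33.3%
```

Anything meaningfully above zero during the removal window, while the quiet windows stay clean, points the finger at the snapshot commit rather than the network.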
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Thank you kte and jb1095 for helping me here.

I do not like disabling dedupe; it is a great feature and was the main purchase driver. Too bad that while it saves us 20-30 TB, when it is running (for days in a row) it makes the SATA aggregate pretty much useless. I have seen 100-200 ms latency on it during those runs.
NetApp's advice? Run it outside business hours.
Unfortunately, not every business has a 3-4 day maintenance window.

Jumbo frames were enabled after the initial install and they improved the throughput.

We are working with the company that did the install to troubleshoot this; they want to try NFS as well.
I am working with NetApp support directly too.

I do not endorse the NFS idea because I already have an iSCSI infrastructure, also used by the Dell MD; NFS would be an addition to it, with increased setup and complexity and so on and so forth. The iSCSI switches are simple, not stacked, and fully lit (no spare ports). NFS would require stacked switches and cross-stack LACP for HA.

We are a little above 50% usage, and hearing this 50% rule makes me mad. So in the future I will need to buy more drives even when I have the capacity, all for the sake of performance.

RE: I am guessing you will find high latency on many of your busiest VMs
At times I see tens of ms on SAS; on SATA, there are even blips to 100 ms+. Veeam ONE complains about SATA aggregate latency quite often.

RE: and if you do a continuous ping to your machines being backed up, you will probably notice that they lose packets while the snapshot is being removed.
That is always true, especially on SANs with slow writes, when dealing with ESXi snapshot removal. I am backing up the active Exchange 2013 DAG server; only a handful of times has it been evicted and failed over to DR. We may employ NetApp storage snapshots (testing now) to mitigate this.

RE: Also, check out your CPU CoStop
I have no idea what that one is; I will have to look it up.

RE: what OnTap version are you on?
It is a very recent install: 8.2.2, 7-Mode.

Based on the experience jb1095 shared, I would say that NetApp seems like a huge mistake to buy. The CX3-20 was blazing fast compared with it, and even the entry-level Dell MD runs circles around it.
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

I will respond more later today as I am involved with another project at the moment, but I wanted to clarify a point I made in response to kte. As a general rule, you really do not want to bring any SAN over 50-60% utilization; however, of all the SANs available, NetApp allows you to go up to 80-85% before suffering a serious performance drop-off. Some would even argue that you can go over 90% because of WAFL. I was just throwing the 50% rule out there as it is fairly common in the industry, and the biggest mistake most IT shops make is to undersize their storage environment.
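As a rough sketch of what those rules of thumb mean in practice (the capacity figures below are made-up examples):

```python
# Made-up capacity figures; the thresholds are the rules of thumb above
# (50-60% for a generic SAN, ~85% for NetApp before a serious drop-off).
def headroom_tb(used_tb: float, total_tb: float, ceiling: float = 0.85) -> float:
    """Capacity you can still consume before hitting the utilization ceiling."""
    return total_tb * ceiling - used_tb

def utilization_pct(used_tb: float, total_tb: float) -> float:
    return 100.0 * used_tb / total_tb

used, total = 30.0, 50.0  # example: 30 TB used of 50 TB
print(f"utilization: {utilization_pct(used, total):.0f}%")            # 60%
print(f"room to the 85% ceiling: {headroom_tb(used, total):.1f} TB")  # 12.5 TB
```

A negative headroom number is the signal that you are already shopping for shelves, whatever the free-space figure says.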

If you want immediate performance gains, move everything critical to your SAS aggregates, and maybe inquire about adding a SAS shelf. You will not regret it. We are about to go into a POC with a couple of new arrays and I will keep you posted here, but I am willing to bet our "snapshot" issue will be completely eliminated once we hit any of the new arrays.
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

RE: That is always true, especially on SANs with slow writes, when dealing with ESXi snapshot removal. I am backing up the active Exchange 2013 DAG server; only a handful of times has it been evicted and failed over to DR.

This is why we only snap our passive copy of Exchange; however, we were seeing the passive VM drop for minutes upon removing the snap, and sometimes it would not even get the snap at all, as the hard-coded 20-second stun bit us. I mean, it has always been the case that you will lose a few pings, but in my opinion it is not acceptable to lose a VM for minutes at a time just because it is removing a snapshot. (Moving it to our SAS aggregate reduced this problem, so it now only drops for a few seconds.) Our snapshot create and snapshot removal times have decreased drastically too. On SATA, our snap removal would sometimes take 5 hours or so (usually about 90 minutes). On SAS, it's between 5-7 minutes. On create, it used to take 15-23 seconds; now it's 7-13.

One other thing: with us, anyway, we weren't really seeing high write latency on our unit, we were seeing high read latency. In fact, we still are for anything on our SATA aggregate. So nothing critical is on that aggregate.

RE: We may employ (testing now) NetAPP storage snapshots for mitigate this.

In order to do this, you would have to build a new Exchange VM and use SnapDrive with in-guest iSCSI presentation of the Exchange volumes. Every database and every log file needs its own individual volume. Basically, you would have to redesign your entire Exchange environment, and trust me, you will experience issues. The way you are doing it now is what we found to be the easiest, most reliable, and cleanest method: Veeam B&R and SRM.
The performance would be fine in SME because you are not snapping the guest (you are using volume-level snapshots), but retention sucks; it is based on how much retention you allocate on the volume. Oh, and also, have you dealt with NetApp support yet? Around SME it is basically non-existent: very hit or miss, and even when you do hit a good tech there will be weeks or months of redundant troubleshooting. If this is a new device, you may want to talk to your sales rep and get a loaner unit to migrate to... At the end of the day, we wish we had never bought any SATA drives with our NetApp. Had we gone all SAS shelves, we would have been much better off.

Also, I do not feel that NetApp themselves deserve a lot of blame here. We didn't realize what our actual workloads would be with everything we wanted on our SAN when all was said and done, and we "thought" that Exchange was supposed to be much more storage-latency friendly (touted by Microsoft themselves). Our NetApp has been good to us for the most part, but I think we just undersized it from day one. We did not follow the rule of sizing...

Do not size off of space requirements, size off IOPS first, space second.
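For anyone wanting to apply that rule, here is the classic back-of-the-envelope spindle estimate. A rough sketch only: the workload numbers are invented, the per-disk figure is the usual 10k SAS ballpark, and WAFL's write coalescing makes the nominal RAID-DP penalty pessimistic in practice:

```python
import math

# "Size off IOPS first" as arithmetic: the classic spindle-count estimate.
# Write penalties are the textbook RAID figures (RAID 10 = 2, RAID 6 /
# RAID-DP nominally 6); WAFL coalesces writes, so treat the DP figure
# as pessimistic. All workload numbers below are invented.
def spindles_needed(read_iops: float, write_iops: float,
                    write_penalty: int, disk_iops: float) -> int:
    """Disks needed to serve the workload at the given per-disk IOPS."""
    backend_iops = read_iops + write_iops * write_penalty
    return math.ceil(backend_iops / disk_iops)

# Example: 4000 read / 1000 write IOPS on 10k SAS (~140 IOPS per disk)
print(spindles_needed(4000, 1000, 2, 140))  # RAID 10 penalty -> 43 disks
print(spindles_needed(4000, 1000, 6, 140))  # RAID 6 penalty  -> 72 disks
```

Run that before looking at capacity: if the spindle count the workload demands already exceeds what the budget buys, the array is undersized no matter how much free space it shows.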
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

RE: I have no idea what that one is, I must look it up. (in regards to CPU CoStop)

Let me save you the trouble: do not even bother. You will end up going down the road of oversized VMs, where you will be advised to reduce the number of vCPUs in your environment and told to move VMs around to balance vCPUs per host, and at the end of the day the performance difference you see will be negligible. Move some high co-stop VMs from your SATA aggregate to your SAS aggregate and you will see the CPU co-stop flatline at zero. We saw upwards of 4000 ms CSTP times on SATA for months suddenly drop to zero once we moved the VM to SAS. The rabbit hole of CSTP was a long one, but ultimately the root cause was storage performance.
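For anyone reading along, a note on interpreting that 4000 ms figure: vCenter's real-time co-stop counter is milliseconds accumulated per 20-second sampling interval, so converting it to a percentage looks like this (commonly cited guidance is that even a few percent per vCPU deserves a look):

```python
# vCenter's real-time co-stop counter (cpu.costop.summation) is milliseconds
# accumulated over a 20-second sampling interval, not a percentage.
def costop_pct(cstp_ms: float, interval_s: int = 20) -> float:
    """Co-stop time as a percentage of the sampling interval."""
    return 100.0 * cstp_ms / (interval_s * 1000)

print(f"{costop_pct(4000):.0f}%")  # the 4000 ms we saw on SATA = 20%
```

20% co-stop means the VM's vCPUs spent a fifth of every interval waiting to be co-scheduled, which is why the guest felt as bad as it did.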


RE: It is a very recent install, 8.2.2 7-mode. (ONTAP question)
Is that 8.2.2 P2? 7-Mode?
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi Jon,

RE: In order to do this, you would have to build a new Exchange VM and use SnapDrive with in-guest iSCSI presentation of the Exchange volumes. Every database and every log file needs its own individual volume.
I don't think I agree with this. In my view, the way Veeam + NetApp snapshots work is something along these lines: it takes a guest snapshot as it normally would (a few seconds, or more for application-aware processing), then a storage snapshot (5 seconds tops), then removes the VMware snapshot (quick and painless while the snapshot is young), and then mounts the LUN on the proxy and grabs the data from the VMFS inside the storage snapshot. Since the VMware snapshot is captured inside the LUN snapshot in a consistent state, all is well. The storage is eventually instructed to remove its snapshot at the end, and eventually actually removes it (hours later, in my tests).
Anyone from Veeam can confirm this?!
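To make my understanding concrete, here is the sequence sketched as steps. This is emphatically not Veeam's actual code; the function and the names are made up purely for illustration:

```python
# My mental model of the job flow, as a list of steps. This is emphatically
# NOT Veeam's code; the function and the names are made up for illustration.
def backup_from_storage_snapshot(vm: str, lun: str) -> list[str]:
    return [
        f"1. vSphere snapshot of {vm} (seconds; longer if app-aware)",
        f"2. storage snapshot of {lun} (5 seconds tops)",
        f"3. remove the vSphere snapshot of {vm} (young, so quick)",
        f"4. mount the {lun} snapshot on the proxy, read from its VMFS",
        f"5. tell the array to delete the {lun} snapshot (can lag for hours)",
    ]

for step in backup_from_storage_snapshot("exch01", "lun_vmfs1"):
    print(step)
```

The point of the ordering is that the guest only ever carries a very young VMware snapshot, while the slow part (step 5) happens entirely on the array.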

The SATA is for test and dev; I have tried to keep anything remotely production-important away from SATA.

RE: Oh and also, have you dealt with NetApp support yet? Around SME, it is basically non-existent...very hit or miss and even when you do hit a good tech, there will be weeks/months worth of redundant troubleshooting.
Hmm... this pretty much matches my experience with them so far, though I am not months into the ticket yet. Getting there...

RE: 8.2.2 P2? 7-mode
I see no P2 in the description, so it seems to be plain 8.2.2.

Regarding the sizing of our NetApp: we provided the vendor with reports on IOPS, read/write ratio, and so on and so forth. They knew what it was supposed to replace (the EMC CX3 series), and this is the solution they proposed. While I should be a better storage admin and do the research and study, I am not really a storage admin and never really have been. Small shop, you know: the usual responsibilities spread across storage/Exchange/network/virtualization, and the list goes on. In my view, the guilt lies with me and with them at the same time.

I personally doubt another shelf of SAS will solve anything. The controllers themselves hit 80-90% CPU usage on a single (but massive) Storage vMotion to SAS that I did a few days ago. Maybe migrating to the 8000 series (perhaps plus an extra SAS shelf) is the solution, but there is no budget. I am getting reassurance from our NetApp rep that they are looking at the issue and will find a solution for us.

RE: We are about to go into a POC with a couple of new arrays and I will keep you posted here, but I am willing to bet our "snapshot" issue will be completely eliminated once we hit any of the new arrays.
Are you trying some of the new hybrid arrays? Tegile/Nimble or such? When Exchange was on the Dell MD 3620 or the EMC CX3, I never had any issues with snapshot removal. Not once in a year.

Thank you.

Anyone else with NetApp + 10 Gb iSCSI + Veeam on this forum?
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

RE: I don't think I agree with this. In my view, the way Veeam + NetApp snapshots work is something along these lines: it takes a guest snapshot as it normally would (a few seconds, or more for application-aware processing), then a storage snapshot (5 seconds tops), then removes the VMware snapshot (quick and painless while the snapshot is young), and then mounts the LUN on the proxy and grabs the data from the VMFS inside the storage snapshot. Since the VMware snapshot is captured inside the LUN snapshot in a consistent state, all is well. The storage is eventually instructed to remove its snapshot at the end, and eventually actually removes it (hours later, in my tests).
Anyone from Veeam can confirm this?!

When it comes to SME, there are certain things you will see once you attempt to use it; that is what I was referring to. It was definitely not a friendly or easy experience for us, and we tried it the way NetApp wanted. We had to rebuild our Exchange environment multiple times just to iron out the kinks and establish best practice for our environment. I suspect that once you attempt it, you will see what I am talking about. Either way, I wish you the best of luck in using the NetApp without redesigning your current Exchange, while still getting decent retention.

RE: I personally doubt another shelf of SAS will solve anything. The controllers themselves hit 80-90% CPU usage on a single (but massive) Storage vMotion to SAS that I did a few days ago. Maybe migrating to the 8000 series (perhaps plus an extra SAS shelf) is the solution, but there is no budget. I am getting reassurance from our NetApp rep that they are looking at the issue and will find a solution for us.

How are you getting that CPU metric? There are a couple of ways to get it from your controllers. The one listed on the pre-work checklist when upgrading your ONTAP software only tells you the highest usage of one of the CPUs, not the average across them. We ran into this a few months ago when bumping to 8.2.2 P2 7-Mode. We panicked because our CPU usage was well above 50% and the guide told us we could not do the upgrade with CPU utilization over 50%. After much digging, we dropped to the shell to get more accurate info. I will be happy to share that with you if you like.

RE: Regarding the sizing of our NetApp: we provided the vendor with reports on IOPS, read/write ratio, and so on and so forth. They knew what it was supposed to replace (the EMC CX3 series), and this is the solution they proposed. While I should be a better storage admin and do the research and study, I am not really a storage admin and never really have been. Small shop, you know: the usual responsibilities spread across storage/Exchange/network/virtualization, and the list goes on. In my view, the guilt lies with me and with them at the same time.

I hate that we send all the info our vendors request, they come back with something they claim is perfect for us, and then we run into these situations, up the creek without a paddle with little to no recourse. I don't know of any company that has a dedicated storage admin. This is why we can never blame ourselves when we get an undersized unit. We send all the information they request, and they have a team of experts who are supposed to size the environment properly. Most of the time, we either never get approval to buy the recommended unit, or the recommended unit is sized incorrectly. We don't find out until it is too late.

RE: Are you trying some of the new hybrid arrays? Tegile/Nimble or such? When Exchange was on the Dell MD 3620 or the EMC CX3, I never had any issues with snapshot removal. Not once in a year.

We are looking at Nimble, Pure, and the hyper-converged solution Nutanix. We looked at many others along the way, but these are the ones we will probably POC. In fact, we still have to weed one of these out, as we really do not want to POC 3 units. Each has its own qualities we love. Pure is probably the fastest but also the most expensive (we are nervous they will be acquired by someone soon, though). Nimble has great speeds (not as fast as Pure) but is more cost-effective: great bang for the buck. Nutanix forces us to buy compute/memory and storage together and basically walk away from our UCS environment eventually, but that may be the way things are going; plus we can just add nodes when we need more power, so it's not a total forklift every 3 years like most other options. All 3 options have stellar support and will assist you with anything in your environment. I personally am leaning towards Nimble because of the great speeds and reviews, but others in my group are leaning towards Nutanix. Once we get whatever we choose into POC, we will surely see the difference between them, as we are going to present the new array to vCenter and vMotion / Storage vMotion our Exchange environment off the NetApp, so we will have an apples-to-apples comparison and a full test for at least a month or two before we make any purchase decision.
NightBird
Expert
Posts: 242
Liked: 57 times
Joined: Apr 28, 2009 8:33 am
Location: Strasbourg, FRANCE
Contact:

Re: NetAPP backup performance

Post by NightBird »

Just a little question: if you were happy with your CX3-20, why did you choose NetApp instead of an EMC VNX box?
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Why?
1. My genuine stupidity, I believe.
2. Actually, I asked the solution provider for a VNX quote. Twice. Somehow he forgot even to acknowledge my request, and I let it go, being under the impression it might not fit our budget. I thought EMC was generally more expensive than others. Plus, I was aiming for dedupe, and NetApp seemed to be better at it, or so some sites claim.

In the end: point 1 is probably 80% of the reason.

Daniel.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: NetAPP backup performance

Post by Delo123 »

Oh boy....
I would seriously get that Netapp sales guy over.... :(
If they are not able to do dedupe the right way (inline), they shouldn't offer it at all for primary storage...

Alternatively, get EMC in, let them buy your NetApp gear, and have them sell you something else at a good discount...
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Again, I am looking for anyone with similar stories.
I am trying to convince myself it must be something else in the environment at hand. So far only Jon has come forward with a similar story, though not an identical one, as his problems were SATA-performance related.

For now I would not bash NetApp just yet. They have come forward, committing some of their resources to troubleshoot what is happening. They assured me that what I am experiencing is not typical for their users, and in the following days or weeks I will be troubleshooting this with their support. It may be something in the ESXi environment or who knows what else; so far no one has been able to find the smoking gun, and I have exhausted all my 'Google engineer' knowledge.

If this ends up as a 'take it as it is' attitude / there is nothing that can be done, I will be super mad, but for the time being I am still hopeful.

My general feeling is that I should have gone for one extra SAS shelf and the FAS 8000 series... but that ship has sailed; no budget now.

Thank you,
Daniel.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: NetAPP backup performance

Post by Delo123 »

Hi Daniel,

Please do keep us updated...
Hope you will find a solution
Joshue
Lurker
Posts: 1
Liked: never
Joined: Dec 18, 2012 5:47 pm
Full Name: Joshue Martin
Contact:

Re: NetAPP backup performance

Post by Joshue »

Hi.

We are in the process of configuring a FAS2552 (20x 900 GB SAS, 4x 200 GB SSD) with 10 Gb NFS (we were told the performance would be better than with iSCSI), clustered Data ONTAP 8.3, and dedupe, for our VMware 5.1.

I'm a little scared... Previously we were using Nexenta with iSCSI + SATA + SSD and the performance (VMware and Veeam) was sufficient for our needs.
jveerd1
Service Provider
Posts: 52
Liked: 10 times
Joined: Mar 12, 2013 9:12 am
Full Name: Joeri van Eerd
Contact:

Re: NetAPP backup performance

Post by jveerd1 »

We are troubleshooting a backup performance issue on a NetApp 3250 (8.2 cDOT) with 10G. We have Nexus 5000 switches and some C7000 enclosures. Storage integration is enabled on the backup jobs. Veeam throttling kicks in as soon as the jobs start, because latency increases on the NetApp.
Expected backup throughput is 200+ MB/s, but we rarely see half of that.
Please keep this thread updated with your findings.
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

Can you please check vSphere for your disk read and write latency while the backup jobs are running, specifically during snapshot creation and removal? From what I understand, anything over 15 ms on either is very bad.
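If it helps, here is a trivial sketch of scoring latency samples against that ~15 ms rule of thumb (the sample values are invented):

```python
# Invented sample values; the ~15 ms threshold is the rule of thumb above.
def score_latency(samples_ms, threshold_ms=15.0):
    """Return (fraction of samples over threshold, worst sample)."""
    over = [s for s in samples_ms if s > threshold_ms]
    return len(over) / len(samples_ms), max(samples_ms)

read_latency = [4, 6, 22, 31, 9, 18, 5, 7]  # ms, captured during a job
frac, worst = score_latency(read_latency)
print(f"{frac:.0%} of samples over 15 ms, worst {worst} ms")
```

If a meaningful fraction of samples breach the threshold only while the job runs, the array, not the network, is the suspect.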
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi everyone,

Sorry for disappearing from the thread; I'm on vacation.
No real updates so far; we are waiting for a meeting to figure out the next course of action.

Joshue: I have seen no real difference between NFS and iSCSI; we tested it to see whether the bottleneck might be the iSCSI initiator. Yes, iSCSI is another layer over the file system while NFS is more direct access to it, so NFS should be at least as fast, if not marginally faster. Depending on your needs and how you configure your aggregates, it may be enough for you. I don't think C-Mode is any better than 7-Mode performance-wise.
I would be curious about your experience with it. Your setup seems smaller than mine but fairly similar. Please come back with your findings.
What IOPS do you require, and what is your read/write ratio? Any idea?

I will check the latency next time the backups kick in. From memory, latency easily goes to double digits during intensive I/O, and the CPU on the FAS goes to 90-100%.
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 »

jveerd1 wrote:We are troubleshooting a backup performance issue on a NetApp 3250 (8.2 cDOT) with 10G. We have Nexus 5000 switches and some C7000 enclosures. Storage integration is enabled on the backup jobs. Veeam throttling kicks in as soon as the jobs start, because latency increases on the NetApp.
Expected backup throughput is 200+ MB/s, but we rarely see half of that.
Please keep this thread updated with your findings.
The only time we see over 100 MB/s on our NetApp is on a few small incrementals. The fulls usually range between 35 MB/s and 70 MB/s, and we are full 10 GbE everywhere... We are just waiting to get POC units from Nimble and Pure at this point and will see how they stack up in terms of backup speed and overall Exchange and SQL performance (which I would bet a year's salary will be at least twice as fast). Good luck with your meeting today, Daniel; hopefully they find something misconfigured and you won't have to dive down a 58756325679-mile-long rabbit hole just to find out your NetApp was undersized from the get-go and your only option is a total forklift.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: NetAPP backup performance

Post by Delo123 » 1 person likes this post

JB, please keep us updated on Nimble / Pure. I guess everybody here is interested in backup/restore performance on those babies... :)
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 » 1 person likes this post

Delo123 wrote:JB, please keep us updated on Nimble / Pure. I guess everybody here is interested in backup/restore performance on those babies... :)
Oh, I definitely will. I can't wait to get our POC devices in house to test them out. The key will be to test them under the normal load we see every day. I will be sure to post the results, as we have a full history of our servers' performance over the past year.
nbctcp
Lurker
Posts: 2
Liked: never
Joined: Jul 30, 2010 3:49 am
Full Name: Nawir
Contact:

Re: NetAPP backup performance

Post by nbctcp »

INFO:
-NetApp FAS3240 SAS 72 disks
-UCS B 96GB RAM
-Nexus 5K+2K
-Citrix XenServer 6.2
-Citrix XenDesktop 7.1

PROBLEM: slow performance of my XenDesktop+XenApp
App icons become generic after some time, so I can't launch apps in RDS

VM boot time improved from 20s to 12s after changing the VM format on the NetApp from XenServer to Linux
Because I use SAN, I didn't enable jumbo frames in Citrix PVS and MS RDS

After reading this thread, I started to suspect the NetApp was the source of the problem.
But I have not worked at my previous company since May last year.
I don't know the status now

My tech lead and boss followed the FlexPod design 100%.
After testing it myself, IMO FlexPod is only accurate in the Cisco part, not the rest

QUESTIONS:
1. How do you pinpoint the NetApp as the culprit?
You mention CPU Co-Stop. Is that a NetApp command?
Do you run iperf on a Windows server? If yes, what I/O speed do you expect?

tq
jb1095
Enthusiast
Posts: 35
Liked: 10 times
Joined: Mar 03, 2015 9:32 pm
Full Name: Jon Brite
Contact:

Re: NetAPP backup performance

Post by jb1095 » 1 person likes this post

nbctcp wrote:INFO:
-NetApp FAS3240 SAS 72 disks
-UCS B 96GB RAM
-Nexus 5K+2K
-Citrix XenServer 6.2
-Citrix XenDesktop 7.1

PROBLEM: slow performance of my XenDesktop+XenApp
App icons become generic after some time, so I can't launch apps in RDS

VM boot time improved from 20s to 12s after changing the VM format on the NetApp from XenServer to Linux
Because I use SAN, I didn't enable jumbo frames in Citrix PVS and MS RDS

After reading this thread, I started to suspect the NetApp was the source of the problem.
But I have not worked at my previous company since May last year.
I don't know the status now

My tech lead and boss followed the FlexPod design 100%.
After testing it myself, IMO FlexPod is only accurate in the Cisco part, not the rest

QUESTIONS:
1. How do you pinpoint the NetApp as the culprit?
You mention CPU Co-Stop. Is that a NetApp command?
Do you run iperf on a Windows server? If yes, what I/O speed do you expect?

tq

I am so sorry, but I did not see an email notifying me there was a reply to this post. CPU Co-Stop is checked in your VMware environment, not on the NetApp. You can do it via the command line (esxtop shows it as %CSTP) or in the vSphere client. You can also check disk read/write latency from the same page (just click advanced and change data sets).
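For nbctcp's benefit: the vSphere client reports co-stop as a summation counter in milliseconds per sample interval, and it can be converted to a percentage the same way people usually convert CPU ready time. A sketch of that arithmetic (the 20-second interval is the vSphere real-time sample period; the thresholds you apply to the result are your own call):

```python
def costop_percent(costop_ms, sample_interval_s, num_vcpus):
    """Convert a vSphere co-stop summation value (ms accumulated over one
    sample interval) into a percentage of the interval, per the usual
    ready-time conversion: ms / (interval * 1000 * vCPUs) * 100."""
    available_ms = sample_interval_s * 1000 * num_vcpus
    return 100.0 * costop_ms / available_ms

# A 4-vCPU VM accumulating 4000 ms of co-stop in a 20 s real-time sample:
print(round(costop_percent(4000, 20, 4), 1))  # 5.0
```

A sustained value of even a few percent usually means the VM's vCPUs are waiting on each other and is worth chasing down.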

Update on the SAN shopping situation:

After getting all the quotes in from NetApp, Pure, and Nimble, we were so impressed with Nimble's quote that we decided to go with a bigger controller. Instead of the CS300 (30k IOPS), we decided to go with the CS500 (90k IOPS). This unit has 2.4TB of flash and 36TB of raw storage.

We were also able to reach out to a few existing Nimble customers (not ones recommended by our Nimble sales team). We found them via various forums and reached out through private messages to set up calls. We got some very good information. I will share one of their stories.

An IT head at a company in Boston, MA had been a NetApp shop for 6-7 years and started SAN shopping last year. He ended up doing a POC on two devices: the first was an all-flash NetApp array, the second a Nimble CS400. He stated that the Nimble CS400 was literally four times faster than the all-flash NetApp array, and the decision to go with Nimble over NetApp became a no-brainer... especially as the cost was significantly lower. He ultimately purchased the CS500, which has been in production with zero issues since late last summer.

We also asked about having multiple plugins in vSphere, since we are adding to our environment and not removing the NetApp at this time, and he was doing the same thing without issue. This is something the Nimble team said "should not" be a problem, but they were not sure and would get back to us. We will be starting our POC the first week of May. We will also be doing a local replication to our second CS500 and, once done, will ship it to our London office.

I will keep you all posted on our progress as soon as the POC starts. My goal is to put so much info on here that anyone looking for it will get their fill. I will also be providing real benchmark data from our production environment for everything (VM boot time, SQL, Exchange, disk read/write latency, CPU Co-Stop, etc.) from both our current NetApp and our new Nimble CS500. If any of you want more info, please feel free to PM me.
daniel.negru
Influencer
Posts: 17
Liked: 1 time
Joined: Feb 13, 2013 5:36 pm
Full Name: Daniel Negru
Contact:

Re: NetAPP backup performance

Post by daniel.negru »

Hi everyone,

So far the news for me: it is obvious the CPUs on the 25xx controllers are the bottleneck. We are being asked to fork out some more $ for 8000-series controllers, so we are in the process of begging for budget and/or cutting into other budget items to make this one fly; it will take about a month, I believe.

The solution was supposed to deliver 10k IOPS easily, and it seems that, depending on the nature of the I/O, it crumbles at only 2-3k IOPS per controller with high latency.
Since no one can explain why that is, I am cautiously optimistic about the upgrade outcome.
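For what it's worth, those numbers are consistent with the backup speeds earlier in the thread: back-of-envelope, throughput = IOPS x I/O size, so a controller topping out at 2-3k IOPS explains ~100 MB/s fulls if the reads are around 64 KB. A quick sketch of the arithmetic (the 64 KB read size is my assumption, not something measured):

```python
def iops_for_throughput(mb_per_s, io_size_kb):
    """Back-of-envelope IOPS needed to sustain a given throughput,
    assuming a fixed I/O size: IOPS = (MB/s * 1024) / KB-per-I/O."""
    return mb_per_s * 1024 / io_size_kb

# ~100 MB/s of backup reads at an assumed 64 KB per read:
print(int(iops_for_throughput(100, 64)))  # 1600
```

So ~1,600 IOPS of 64 KB reads already eats most of a 2-3k IOPS ceiling, before any production workload on the same controller.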