kte wrote: Try NFS shares instead of iSCSI, disable jumbo frames everywhere, and enable flow control on your dedicated NAS network switches, which should have enough buffers not to drop packets once flow control is enabled.
Don't use more than 50% of the system capacity, disable dedup, ...
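One way to read the 50% rule above is that each controller's sustained load should stay low enough that, in an HA pair, one controller could absorb its partner's workload after a takeover. That reading is an assumption, not something kte states. A minimal Python sketch under that assumption, using hypothetical controller names and utilization samples (fractions of 1.0) rather than output from any NetApp tool:

```python
# Hypothetical headroom check for the "stay under 50%" rule of thumb.
# Controller names and sample values are illustrative only; they are not
# produced by any real NetApp command or API.
from statistics import mean

HEADROOM_LIMIT = 0.50  # rule of thumb quoted in the thread


def over_headroom(samples: dict[str, list[float]],
                  limit: float = HEADROOM_LIMIT) -> dict[str, float]:
    """Return {controller: mean utilization} for controllers above `limit`.

    `samples` maps a controller name to utilization readings in [0, 1].
    Controllers with no readings are skipped.
    """
    averages = {name: mean(vals) for name, vals in samples.items() if vals}
    return {name: avg for name, avg in averages.items() if avg > limit}


def pair_survives_takeover(a: list[float], b: list[float]) -> bool:
    """Assumption: in an HA pair the surviving controller carries both
    workloads, so their combined mean load should stay at or below 100%."""
    return mean(a) + mean(b) <= 1.0
```

For example, with `{"ctrl-a": [0.42, 0.48, 0.55], "ctrl-b": [0.61, 0.70, 0.66]}` only `ctrl-b` is flagged (mean about 0.66), and the pair as a whole would be overloaded after a takeover.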
The problem with NFS for us is that Exchange does not technically support it. While that may be a political decision, Microsoft itself says it doesn't support it because all kinds of weird things can happen (for the record, I have also seen Exchange run fine on NFS). In our environment we use Cisco UCS blades and Nexus switches, all 10G, and we followed best practice every step of the way... the bottleneck is the SAN, plain and simple. We had to create a passive Exchange server just for backups (also best practice) so that we could snapshot Exchange... We ran into the 20-second stun problem that many users face, and we also hit an issue where our passive Exchange VM would go offline completely when the snapshot was removed. Our solution was to move it to a SAS aggregate.
I don't mean to hijack this thread at all, kte, but I am curious why you suggest that he disable jumbo frames. Everything I have read indicates our environment follows best practice, and I assure you jumbo frames are a big deal for us, as is dedupe. I'm just not sure I would want to disable some of the most useful features (which we also paid heavily for). As a FlexPod shop that has followed best practice throughout, I never would have expected the issues we have seen over the last couple of years. I do, however, agree that he should not be using more than 50% of his system capacity, which brings me to my next point.
I think our biggest issue is that we probably undersized our storage when we first built it a couple of years ago, so now we need more storage, which by normal rules will be faster regardless since more disks generally means more IOPS, so I cannot blame NetApp entirely. I will also say that even when we first got our NetApp in house, performance was iffy. The speeds I used to see from our EMC VNX5300, and before that our CLARiiON CX3-10, were both much faster than anything our NetApp ever delivered.
Daniel, if possible, can you check vCenter and look at your disk read/write latency? Also check your CPU Co-Stop. I'm guessing you will find high latency on many of your busiest VMs, and if you run a continuous ping to the machines being backed up, you will probably notice that they drop packets while the snapshot is being removed. Try to move anything deemed critical to your SAS aggregate, and, as kte said, make sure you are not exceeding 50% of your resources on the NetApp. Also, what ONTAP version are you on?
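The continuous-ping test suggested above is easier to interpret once the results are reduced to windows of consecutive missed replies, which can then be lined up against the time the snapshot was being removed. A minimal Python sketch, assuming the ping output has already been parsed into hypothetical `(timestamp_seconds, replied)` pairs; the parsing step, the `min_lost` threshold, and all names here are illustrative only:

```python
# Hypothetical analysis of continuous-ping results: find runs of consecutive
# missed replies and check whether they overlap a snapshot-removal window.
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class LossWindow:
    start: float  # timestamp of the first missed reply (seconds)
    end: float    # timestamp of the last missed reply (seconds)
    lost: int     # number of consecutive missed replies


def loss_windows(samples: Iterable[tuple[float, bool]],
                 min_lost: int = 2) -> list[LossWindow]:
    """Group consecutive missed replies; ignore runs shorter than min_lost."""
    windows: list[LossWindow] = []
    run: list[float] = []  # timestamps in the current run of misses
    for ts, replied in samples:
        if not replied:
            run.append(ts)
            continue
        if len(run) >= min_lost:
            windows.append(LossWindow(run[0], run[-1], len(run)))
        run = []
    # Close a run that lasts until the end of the capture.
    if len(run) >= min_lost:
        windows.append(LossWindow(run[0], run[-1], len(run)))
    return windows


def overlaps(window: LossWindow, start: float, end: float) -> bool:
    """True if the loss window intersects [start, end], e.g. the period
    during which the snapshot was being removed."""
    return window.start <= end and window.end >= start
```

A single dropped reply is ignored by default because isolated losses are common on busy networks; a run of several consecutive misses that coincides with the snapshot removal is the pattern worth looking for.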