Discussions specific to the VMware vSphere hypervisor
bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

We currently have a NetApp SnapCenter server set up and configured to take application-consistent snapshots of many of our SQL servers throughout the day. However, we'd like to transition these jobs into Veeam to run periodic application-consistent storage snapshots. This means a job that runs periodically (some every hour, some every 4 hours) and only triggers a storage snapshot, not any Veeam backups. This was suggested by quite a few Veeam reps and even a NetApp rep: if Veeam creates the storage snapshot itself, it does not have to go through the mounting process to scan snapshots, as it currently does every time SnapCenter creates one.

However, one issue we're running into is that when Veeam does the same thing, the VSS freeze on the SQL server is so long that we can't run the jobs during production hours like we can with the SnapCenter jobs. Currently the SnapCenter jobs trigger the VSS freeze, create the storage snapshot, and then unfreeze, and that takes anywhere from 1-3 seconds. However, when the same thing is done in Veeam with a job using the "ONTAP Snapshot (Primary Storage Snapshot Only)" setting, we are looking at freeze times between 15-40 seconds.

Is this expected behavior? Does anyone else also have a NetApp with Veeam storage snapshot jobs that run during production hours without this long of a freeze? Right now the freeze is too long and many of our applications time out or throw errors due to this and we've had to disable the jobs.

A colleague has a case open (04716753) but so far we haven't been able to figure out what is causing the long freeze or if it is expected.

foggy
Veeam Software
Posts: 20034
Liked: 1869 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by foggy »

Hi Randall, in the general case, a snapshot-only job creates a VMware snapshot after quiescing the VM. However, if the volume doesn't contain disks of any other VMs from the same job, the VM can be processed without a VMware snapshot. I suspect it takes longer due to VMware snapshot processing. Could you please check if this is the case?

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

So we did some testing yesterday and confirmed that the job was creating VMware snapshots.

We went ahead and tried relocating a VM to its own datastores and were able to obtain freeze-only mode without it doing the VMware snapshot. However, the freeze on the system still lasted 20+ seconds even without the VMware snapshot being processed.

orb
Service Provider
Posts: 115
Liked: 20 times
Joined: Apr 01, 2016 5:36 pm
Full Name: Olivier
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by orb »

Hi,

We had something similar years ago with a customer and a very busy MS-SQL where NetApp Snapshot was involved with DirectNFS. We had some major stuns, timeouts and client disconnections during our backup. The VSS was forcing a memory flush to disk, which created massive I/O.

We never got to the bottom of this. The system was running 24/7 intensely and classical dumps were enough for our customers.

Did you use the SQL Agent from NetApp as well with SnapCenter? It is not very clear to me.

Oli

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

So here's what the support rep said are the times in regard to our freeze-only jobs:

It takes 12 seconds to freeze the vm, 18 seconds to take the storage snapshots then 5 seconds to complete the unfreeze.

So the long time to freeze is fine, but 18 seconds for the storage snapshots and 5 seconds for the unfreeze doesn't sound right, especially when SnapCenter is able to do those same tasks in 1-3 seconds. So far the support rep just gave us the times so still waiting on why it is taking so long to do those steps.

orb
Service Provider
Posts: 115
Liked: 20 times
Joined: Apr 01, 2016 5:36 pm
Full Name: Olivier
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by orb »

bg

You can find a file with all steps/timing on the SQL server in %ProgramData%\Veeam for
What model do you have? How many and what type of disks do you have in your aggregate which supports your SQL volumes? Your NetApp may be very busy also.

Oli

Andreas Neufert
VP, Product Management
Posts: 5061
Liked: 1032 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by Andreas Neufert »

Hi Randall,

We are processing many additional things, like metadata collection and setting restore awareness settings, which SnapCenter does not.

During normal VSS writer processing the application should not go down. Can you please describe what issues you are facing?
Our support can give you a VSS snapshot tool with which you can run native VSS processing (without our software in the mix) to verify that you do not have an issue with the native Microsoft VSS commands.

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

Well the problem is that the freeze is so long that applications start to time out. Most of our applications have a 15 second timeout window. Some of the freezes we're seeing on some SQL servers are in the 45+ second range.

And I know there are other things that Veeam does that SnapCenter does not, but I would assume that once you get to the point where the server is frozen, the only thing that needs to be done at that point is the storage snapshots and the unfreeze. Support has not been able to tell why it takes Veeam so long to trigger the snapshots against the NetApp.

We can try the VSS snapshot tools, but at this point, looking at the timings that support has given us, it doesn't seem to be a VSS issue and is more of a communications issue between Veeam and NetApp. But it appears we've escalated the case as high as we can go, and so far support has simply given us the times from the logs and has not provided us with any direction or possible solutions at all. If you look at the case notes, the support reps have many times just sent back the Veeam logs to us to review.

foggy
Veeam Software
Posts: 20034
Liked: 1869 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by foggy »

According to the logs, connecting to the storage takes 15 of those 18 seconds, while the actual snapshots are quite fast. I think this is the issue that should be investigated. Could you please also elaborate on the timeout value? 15 seconds looks quite short; I believe the default VSS writer timeout is 60 sec (20 sec for Exchange).
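
As a rough check of the numbers quoted in this thread, the sketch below adds up the reported phase timings and compares them with the 15-second application timeout. It assumes (this is an assumption, not something the logs confirm) that writes stay blocked from the moment the freeze completes until the unfreeze finishes, so the stall is the storage snapshot step plus the unfreeze. All constant names are illustrative.

```python
# Timings reported by support earlier in the thread, in seconds.
STORAGE_SNAPSHOT_S = 18  # storage snapshot step, including the connection
UNFREEZE_S = 5           # time to complete the unfreeze
CONNECT_S = 15           # part of the snapshot step spent connecting to storage

APP_TIMEOUT_S = 15       # application timeout mentioned in the thread


def stall_seconds(snapshot_s: float, unfreeze_s: float) -> float:
    """Assumed write stall: everything between 'frozen' and 'unfrozen'."""
    return snapshot_s + unfreeze_s


observed = stall_seconds(STORAGE_SNAPSHOT_S, UNFREEZE_S)
without_connect = stall_seconds(STORAGE_SNAPSHOT_S - CONNECT_S, UNFREEZE_S)

for label, stall in (("observed", observed),
                     ("without connection delay", without_connect)):
    verdict = "exceeds" if stall > APP_TIMEOUT_S else "fits within"
    print(f"{label}: {stall} s stall {verdict} the {APP_TIMEOUT_S} s timeout")
```

Under that assumption the observed stall is 23 seconds, which matches the 20+ second freezes reported above, while removing the connection delay would leave about 8 seconds.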

Andreas Neufert
VP, Product Management
Posts: 5061
Liked: 1032 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by Andreas Neufert »

Overall, the application should not be affected by the VSS processing itself or by the storage snapshot processing.
VSS writers can slow down an application, but that should not lead to any issues. SnapCenter consistency processing should take the same time when it comes to VSS creation.

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

foggy wrote: Apr 09, 2021 9:58 am According to the logs, connecting to the storage takes 15 of those 18 seconds, while the actual snapshots are quite fast. I think this is the issue that should be investigated. Could you please also elaborate on the timeout value? 15 seconds looks quite short; I believe the default VSS writer timeout is 60 sec (20 sec for Exchange).

So that 15 second delay is definitely odd and we're going to start looking into it, thank you for pointing that out.

Not sure if it's related or not, but I tested editing the NetApp SVM under the storage integration, going to credentials, and pressing next, and there was almost exactly a 15 second delay where it said "Checking connection..." before it flashed away and started "saving to storage configuration...". But interestingly enough, if I cancel out of the window and do the same thing again, it only takes a second or two to do the "Checking connection..." part. But if I leave it for a few hours and come back, it takes 15 seconds again. We'll see if we can get support to review the connection to see if something is causing the delay.

Regarding the timeout issue, here's one of the errors generated from one of our applications, but we've gotten similar errors from other applications when the freeze took too long:

Commit failed with SQL exception
Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
The wait operation timed out

Andreas Neufert
VP, Product Management
Posts: 5061
Liked: 1032 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by Andreas Neufert »

Strange. Looks like you ran into the 30 sec default timeout for SQL operations (incl. SQL VSS Writer release processing).

Do you have something like an NLB cluster in use that uses the same IP address on multiple servers?

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

I do not believe so. So far all of our testing has been with single node SQL servers; no SQL always-on or failover clustering involved as of yet.

mcz
Veeam Legend
Posts: 468
Liked: 74 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by mcz »

Not sure if it's related or not but I tested editing the NetApp SVM under the storage integration, going to credentials, and pressing next, and there was almost exactly a 15 second delay where it said "Checking connection..." before it flashed away and started "saving to storage configuration...". But interestingly enough if I cancel out the window and do the same thing again, it only takes a second or two now to do the "Checking connection..." part. But if I leave it for a few hours and come back it takes 15 seconds again.

Guys, have you ever done a trace during that operation to see if there's maybe a delay on the storage side, packet loss or something similar?
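
Short of a full packet capture, a rough first check is to time each phase of a fresh HTTPS connection to the storage separately, once after a long idle period and once again right away. The sketch below is only an illustration using the Python standard library: it does not reproduce how Veeam connects to the NetApp, the `probe` helper is a made-up name, and the host you pass in is whatever management address your SVM uses.

```python
import socket
import ssl
import time


def probe(host, port=443, use_tls=True, timeout=30.0):
    """Time DNS lookup, TCP connect and (optionally) TLS handshake separately."""
    timings = {}

    start = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    timings["dns"] = time.perf_counter() - start

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        start = time.perf_counter()
        sock.connect(addr)
        timings["tcp_connect"] = time.perf_counter() - start

        if use_tls:
            # Certificate checks are disabled because this only measures
            # timing; do not reuse this context for real API calls.
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            start = time.perf_counter()
            sock = ctx.wrap_socket(sock, server_hostname=host)
            timings["tls_handshake"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings

# Example (placeholder host name, replace with your SVM address):
#   for name, seconds in probe("svm.example.com").items():
#       print(f"{name}: {seconds:.3f} s")
```

If one phase accounts for most of the 15 seconds on the first run and is fast on an immediate second run, that narrows down where a trace should look (name resolution, TCP setup, or the TLS handshake).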

bg.ranken
Enthusiast
Posts: 75
Liked: 12 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender
Contact:

Re: NetApp Application Consistent Storage Snapshots During Production Hours

Post by bg.ranken »

We actually just got off the phone with the support reps.

We have not done a trace yet. We actually brought it up to support but they said they do not need it yet.

They had us enable extra logging for the NetApp integration via reg keys from this KB: https://www.veeam.com/kb2409

After adding the key we tried to see if we could reproduce the delay in the console but were unable to produce the issue enough for it to stand out in the logs. We ended up running the job again and the delay in contacting the NetApp after the freeze was still there with no additional logging (at least at the job log level). The actual full snapshot process only takes 4-5 seconds once Veeam is finished connecting to the NetApp. We are sending them the logs again and they are going to be looking into it further to see if there's more logging for what's going on during the delay.

[13.04.2021 12:07:28] <01> Info         [CAutoSnapshot] Finished VSS Freeze, freezed: 'True'
[13.04.2021 12:07:28] <01> Info         [NetApp] Connecting to NetApp server 'svm***'. SVM: 'svm***' API version '1.15'. User: '***'. Port: '443'. Protocol: 'HTTPS'.
[13.04.2021 12:07:43] <01> Info         [NetApp] Getting ONTAPI version.
[13.04.2021 12:07:43] <01> Info     Invoke:
[13.04.2021 12:07:43] <01> Info         <system-get-ontapi-version/>
[13.04.2021 12:07:43] <01> Info     Response:

They did recommend us some alternatives such as using native SQL backups to avoid the freezes, but for obvious reasons we would prefer not to go down that path. Worst case, we would change the storage snapshots that Veeam is doing to crash consistent or continue using SnapCenter for SQL, which is still working.
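
For longer logs, the gap can be found mechanically rather than by eye. The following is a minimal sketch that parses the timestamp format seen in the excerpt above (day.month.year) and reports consecutive entries that are at least a few seconds apart. The `find_gaps` helper and the 5-second threshold are illustrative choices, not part of any Veeam tooling.

```python
import re
from datetime import datetime

# Lines copied from the job log excerpt above.
LOG = """\
[13.04.2021 12:07:28] <01> Info         [CAutoSnapshot] Finished VSS Freeze, freezed: 'True'
[13.04.2021 12:07:28] <01> Info         [NetApp] Connecting to NetApp server 'svm***'. SVM: 'svm***' API version '1.15'. User: '***'. Port: '443'. Protocol: 'HTTPS'.
[13.04.2021 12:07:43] <01> Info         [NetApp] Getting ONTAPI version.
[13.04.2021 12:07:43] <01> Info     Invoke:
"""

STAMP = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]\s*(.*)$")


def find_gaps(text, threshold_s=5.0):
    """Return (gap_seconds, previous_entry, next_entry) for each gap >= threshold."""
    gaps = []
    prev_time = prev_line = None
    for line in text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue  # skip continuation lines without a timestamp
        ts = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
        if prev_time is not None:
            delta = (ts - prev_time).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, prev_line, m.group(2)))
        prev_time, prev_line = ts, m.group(2)
    return gaps


for delta, before, after in find_gaps(LOG):
    print(f"{delta:.0f} s between:\n  {before}\n  {after}")
```

On the excerpt above this reports a single 15-second gap, between the "Connecting to NetApp server" entry and the "Getting ONTAPI version" entry.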
