Nimble Integration Issues

by paul.hugill » Thu Jul 20, 2017 5:42 pm

Hi Everyone,
I am interested to hear whether anyone has had any issues with their Nimble integration.
The issue I am having is that once a week (or so) one of my Veeam servers will lose its storage.

In general it seems to work really well: I love how quickly the VM snapshots get removed, and the speed is much better.

We are using a bunch of Dell physical Windows servers (R720xd and NX3200) which provide both proxy and repository functions for Veeam.
At some point the server will lose access to its internal storage, with a bunch of PERCSAS2 errors every 30 seconds in the Windows System Log:
Source: percsas2
EventID: 129
Description: Reset to device, \Device\RaidPort0, was issued.
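
To make the "every 30 seconds" pattern easier to spot, here is a minimal sketch that groups these resets into bursts. It assumes the System log has already been exported into (timestamp, source, event ID) tuples; the function name, the 45-second gap and the minimum count of 3 are illustrative assumptions, not anything taken from Dell, Veeam or Nimble tooling.

```python
from datetime import timedelta


def find_reset_bursts(events, source="percsas2", event_id=129,
                      max_gap=timedelta(seconds=45), min_count=3):
    """Group matching events into bursts.

    events: iterable of (timestamp, source, event_id) tuples (assumed format).
    Returns a list of (first_timestamp, last_timestamp, count) per burst.
    """
    # Keep only the timestamps of the events we care about, oldest first.
    matching = sorted(ts for ts, src, eid in events
                      if src.lower() == source and eid == event_id)

    bursts, current = [], []
    for ts in matching:
        # A gap longer than max_gap closes the current burst.
        if current and ts - current[-1] > max_gap:
            if len(current) >= min_count:
                bursts.append((current[0], current[-1], len(current)))
            current = []
        current.append(ts)

    # Close the final burst, if it is long enough to count.
    if len(current) >= min_count:
        bursts.append((current[0], current[-1], len(current)))
    return bursts
```

Lining up the start of each burst with the backup job schedule is one way to see whether the hangs coincide with jobs that use storage snapshots.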

Essentially this seems to hang all the storage on the server (DAS and iSCSI LUNs) and all the jobs that are using that server as a proxy will just get stuck.
The only way to recover is to reboot the affected server, which has the effect of terminating the running tasks (they won't gracefully terminate though).

I started by looking at the server firmware, given the RaidPort0 errors in the logs. We have updated all of it, but the issue still happened on a server with the updated FW.
I then disabled 'Backup from Storage Snapshots' 10 days ago on the jobs that had it enabled, and have not had the issue since.
Although I can't claim this has definitely fixed it, it does point me towards the Nimble integration somehow.

My current thinking is that it could be related to Compellent LUNs also being presented to the same servers for DirectSAN backups, so I am looking into that.
I have a case open with Nimble but have not opened one with Veeam yet, although I may do.

Anyone else seen anything similar?

Thanks
Paul
paul.hugill
Novice
 
Posts: 5
Liked: never
Joined: Sat Jun 20, 2015 6:30 pm
Full Name: Paul Hugill

Re: Nimble Integration Issues

by EugeneK » Thu Jul 20, 2017 6:03 pm

Hi Paul,

I haven't experienced it, not yet anyways.
Considering the whole system gets halted, I'm not sure it is due to any particular storage integration, but PERCSAS2 errors generally imply driver issues with the storage. That said, I'd check that the Nimble integration package is the latest available for your OS and compatible with the firmware version used on the Nimble.
The mass storage hang may also be the result of a malfunctioning network connection, where all outstanding I/O has to be queued up, which leads to additional resource consumption and the eventual crash of the services. It might be just another area to check.
Eugene K
Product Architect @ SingleHop - Veeam Platinum Service Provider
http://www.singlehop.com
VCAP-DCD, VCAP-DCA, VCP-NV
Veeam Certified Architect
EugeneK
Veeam Vanguard
 
Posts: 102
Liked: 23 times
Joined: Sat Mar 19, 2016 10:57 pm
Location: Chicago, IL
Full Name: Eugene Kashperovetskyi

Re: Nimble Integration Issues

by garrett.stevens » Fri Sep 01, 2017 1:27 pm

Hey Paul, I'm seeing this in my environment. Did you happen to find a resolution?
garrett.stevens
Lurker
 
Posts: 2
Liked: never
Joined: Mon Mar 27, 2017 9:20 pm
Full Name: Garrett S.

Re: Nimble Integration Issues

by foggy » Mon Sep 04, 2017 4:50 pm

Hi Garrett, have you contacted Veeam and/or Nimble support already?
foggy
Veeam Software
 
Posts: 15087
Liked: 1110 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Nimble Integration Issues

by paul.hugill » Mon Sep 11, 2017 10:52 am

Sorry Eugene, I had meant to reply earlier. I think you are correct about the outstanding I/O.

Garrett, whilst we don't have a confirmed solution, I think we know what is causing it for us: it seems to be due to how the Nimble handles MPIO.
Since disabling the 'Backup from Storage Snapshots' option for the jobs we have not had the issue, but I can't put the proposed fix in place just yet to test it.

In our environment, the proxy servers have two NICs in a team. The Compellent SAN handles that fine and it is a supported configuration, but the Nimble uses Least Queue Depth for the path policy, which is apparently the cause, because responses are not received on the expected paths.
The solution, therefore, is to break the team so that MPIO will act on the independent ports.
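
For anyone unfamiliar with Least Queue Depth, here is a toy sketch of the idea: each new I/O goes to the path with the fewest outstanding requests. This is only an illustration under simplifying assumptions, not how the Windows or Nimble MPIO stack is actually implemented. The names `pick_path`, `simulate` and `stalled` are made up for the example, and a "stalled" path simply stands in for one whose responses never arrive where they are expected.

```python
def pick_path(outstanding):
    """Return the path with the fewest outstanding I/Os (ties broken by name)."""
    return min(sorted(outstanding), key=outstanding.get)


def simulate(num_ios, paths, stalled=()):
    """Issue num_ios requests and return the outstanding count left per path.

    paths:   iterable of path names (hypothetical labels such as "nic1").
    stalled: paths whose completions are never seen by the initiator.
    """
    outstanding = {p: 0 for p in paths}
    for _ in range(num_ios):
        path = pick_path(outstanding)
        outstanding[path] += 1          # request is sent down the chosen path
        if path not in stalled:
            outstanding[path] -= 1      # completion arrives, queue drains
    return outstanding
```

In this model, `simulate(10, ["nic1", "nic2"], stalled={"nic1"})` leaves one request outstanding on `nic1` forever while everything else flows over `nic2`. That stuck request is the kind of thing that would eventually surface as timeouts and resets on a real server, which would fit the symptoms, although the model says nothing about why teaming would cause it.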

Because this team is also currently used for another VLAN, I can't split it at the moment, but I was already in the process of spec'ing up some new proxy servers with enough NICs to have independent iSCSI ports anyway.
So for now I am in a holding pattern until I can get the new infrastructure in place and start testing again; then I should be able to confirm.

If you have teamed NICs and can split the team, I would be interested to hear whether that is actually the fix.

Paul

Re: Nimble Integration Issues

by Robvil » Wed Sep 13, 2017 6:03 am

Is NIC teaming even supported on Windows with MPIO?
Robvil
Enthusiast
 
Posts: 59
Liked: 5 times
Joined: Mon Oct 03, 2016 12:41 pm
Full Name: Robert

Re: Nimble Integration Issues

by paul.hugill » Wed Sep 13, 2017 7:54 am

According to this, it is supported in a shared NIC scenario using MS Teaming, which we have.
https://blogs.technet.microsoft.com/ask ... -question/

So in theory yes, but it is dependent on the storage vendor's implementation.
For example, with a Compellent it is fine, but the pathing policy for Nimble causes a problem.

I'm not saying Nimble are at fault for not supporting this. It is not an ideal setup, but with our current servers it was the only way we could do it.

For now it just means I can't use the Nimble integration until our new infrastructure is in place.

To be honest, until I can even test the dedicated NIC configuration, I can't even say for sure that that is the problem, it could be something different.
I just know that something to do with the integration is causing server storage to lock up and that was the conclusion Nimble and I have come to so far.

Paul

