Nimble Integration Issues

Availability for the Always-On Enterprise


by paul.hugill » Thu Jul 20, 2017 5:42 pm

Hi Everyone,
I am interested to hear if anyone has had any issues with their Nimble Integration at all.
The issue that I am having is that once a week (or so) one of my Veeam servers will lose its storage.

In general it seems to work really well and I love how quick the VM snapshots get removed and the speed is much better.

We are using a bunch of Dell physical Windows servers (R720xd and NX3200) which provide both proxy and repository functions for Veeam.
At some point the server will lose access to its internal storage, with a bunch of PERCSAS2 errors every 30 seconds in the Windows System Log:
Source: percsas2
EventID: 129
Description: Reset to device, \Device\RaidPort0, was issued.
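In case it helps anyone compare, here is a rough sketch (a hypothetical helper, not anything Veeam- or Nimble-specific) of how you could scan exported System-log records for the repeated percsas2 Event 129 storm described above:

```python
from datetime import datetime, timedelta

# Hypothetical helper: given System-log records exported as
# (timestamp, source, event_id) tuples, flag the repeated
# percsas2 Event 129 "Reset to device" storm described above.
def find_reset_storm(records, min_count=3, max_gap_seconds=60):
    """Return True if `min_count` or more percsas2/129 events
    occur with no more than `max_gap_seconds` between them."""
    resets = sorted(ts for ts, source, event_id in records
                    if source.lower() == "percsas2" and event_id == 129)
    run = 1
    for prev, cur in zip(resets, resets[1:]):
        if (cur - prev) <= timedelta(seconds=max_gap_seconds):
            run += 1
            if run >= min_count:
                return True
        else:
            run = 1
    return False
```

With resets landing every 30 seconds, as in our logs, even a short export trips the detector; isolated one-off resets do not.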

Essentially this seems to hang all the storage on the server (DAS and iSCSI LUNs), and all the jobs that are using that server as a proxy just get stuck.
The only way to recover is to reboot the affected server, which has the effect of terminating the running tasks (they won't terminate gracefully though).

I started off by checking the server firmware, given the RaidPort0 errors in the logs. We have updated it everywhere, but the issue still happened on a server with the updated FW.
I then disabled the 'Backup from Storage Snapshots' option 10 days ago on the jobs that had it enabled, and have not had the issue since.
Although I can't claim this has definitely fixed it, it does point me towards the Nimble integration somehow.

My current thinking is that it could be something to do with also having Compellent LUNs presented to the same servers for DirectSAN backups, so I am looking into that.
I have a case open with Nimble but have not opened one with Veeam yet; however, I may do.

Anyone else seen anything similar?

Posts: 5
Liked: never
Joined: Sat Jun 20, 2015 6:30 pm
Full Name: Paul Hugill

Re: Nimble Integration Issues

by EugeneK » Thu Jul 20, 2017 6:03 pm

Hi Paul,

I haven't experienced it, not yet anyway.
Considering the whole system gets halted, I'm not sure it is down to any particular storage integration, but PERCSAS2 events generally imply driver issues with the storage. That said, I'd check that the Nimble integration package is the latest available for your OS and compatible with the firmware version used on the Nimble.
The mass hang on storage may also be the result of a malfunctioning network connection, where all outstanding I/O has to be queued up, which leads to additional resource consumption and an eventual crash of the services. Might be just another area to check.
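To illustrate the point with a toy model (an assumption on my part, not anything measured on Paul's servers): if the device stops completing I/O while submissions continue, the outstanding queue, and the resources tied to it, grows without bound.

```python
# Toy model of an I/O backlog: if completions stop keeping up with
# submissions, every new request stays outstanding and the queue
# (plus the memory/handles pinned to it) grows linearly over time.
def outstanding_after(seconds, submit_per_sec, complete_per_sec):
    """Outstanding I/O count after `seconds`, assuming constant rates."""
    backlog_rate = max(0, submit_per_sec - complete_per_sec)
    return backlog_rate * seconds

# Healthy path: completions keep up, nothing accumulates.
healthy = outstanding_after(60, 200, 200)   # -> 0
# Hung path: completions drop to zero and the backlog climbs
# until the services exhaust resources and stall.
hung = outstanding_after(60, 200, 0)        # -> 12000
```

Obviously real queues are bursty rather than constant-rate, but it shows why a stalled path turns into a whole-server hang rather than a slowdown.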
Eugene K
Product Architect @ SingleHop - Veeam Platinum Service Provider
Veeam Certified Architect
Veeam Vanguard
Posts: 102
Liked: 23 times
Joined: Sat Mar 19, 2016 10:57 pm
Location: Chicago, IL
Full Name: Eugene Kashperovetskyi

Re: Nimble Integration Issues

by garrett.stevens » Fri Sep 01, 2017 1:27 pm

Hey Paul, I'm seeing this in my environment. Did you happen to find a resolution?
Posts: 2
Liked: never
Joined: Mon Mar 27, 2017 9:20 pm
Full Name: Garrett S.

Re: Nimble Integration Issues

by foggy » Mon Sep 04, 2017 4:50 pm

Hi Garrett, have you contacted Veeam and/or Nimble support already?
Veeam Software
Posts: 15087
Liked: 1110 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Nimble Integration Issues

by paul.hugill » Mon Sep 11, 2017 10:52 am

Sorry Eugene, I had meant to reply earlier; I think you are correct about outstanding I/O.

Garrett, whilst we don't have a confirmed solution, I think we know what is causing it for us: it seems to be down to how the Nimble handles MPIO.
Since disabling the 'Backup from Storage Snapshots' option for the jobs we have not had the issue, but I can't get the solution in place just yet to test it.

In our environment the proxy servers have two NICs in a team. Whilst the Compellent SAN handles that fine and it is a supported configuration there, the Nimble uses Least Queue Depth as its path policy, which is apparently the cause: responses are not received on the expected paths.
The solution, therefore, is to break the team so that MPIO acts on the independent ports.
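For anyone unfamiliar with the policy, here is a simplified sketch of Least Queue Depth path selection (my own illustration, not Nimble's implementation): each new I/O is sent down the path with the fewest outstanding requests.

```python
# Simplified sketch of a Least Queue Depth MPIO policy: route each
# new I/O to the path currently holding the fewest outstanding
# requests. (Illustration only, not the vendor's actual code.)
def pick_path(queue_depths):
    """Return the index of the path with the smallest queue depth."""
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])

# With independent iSCSI ports, each path's depth maps cleanly to
# one NIC. With a NIC team underneath, the switch may hash replies
# onto a different team member than the policy expects, so the
# depths the policy tracks no longer match where responses arrive.
```

For example, `pick_path([4, 1, 3])` selects path 1; the policy only balances correctly when the depth it tracks per path reflects reality, which is exactly what a team underneath can break.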

Because this team is also currently used for another VLAN, I can't split it at the moment, but I was already in the process of spec'ing up some new proxy servers with enough NICs to have the iSCSI ports independent anyway.
So for now I am in a holding pattern until I can get the new infrastructure in place and start testing again; then I should be able to confirm.

If you have teamed NICs and can split them, I would be interested to know whether that is actually the fix.


Re: Nimble Integration Issues

by Robvil » Wed Sep 13, 2017 6:03 am

Is NIC teaming even supported on Windows with MPIO?
Posts: 59
Liked: 5 times
Joined: Mon Oct 03, 2016 12:41 pm
Full Name: Robert

Re: Nimble Integration Issues

by paul.hugill » Wed Sep 13, 2017 7:54 am

According to this, it is supported in a shared NIC scenario using MS Teaming, which we have. ... -question/

So in theory yes, but it is dependent on the storage vendor's implementation.
For example, with a Compellent it is fine, but the pathing policy for Nimble causes a problem.

I'm not saying Nimble are at fault for not supporting this; it is not an ideal setup, but at the moment, with our current servers, it was the only way we could do it.

For now it just means I can't use the Nimble integration until our new infrastructure is in place.

To be honest, until I can test the dedicated NIC configuration, I can't say for sure that this is the problem; it could be something different.
I just know that something to do with the integration is causing server storage to lock up, and that is the conclusion Nimble and I have come to so far.

