labsy
Influencer
Posts: 14
Liked: never
Joined: Mar 18, 2013 7:30 pm
Full Name: Andrej Pirman
Contact:

Disk latency alerts and SCSI aborts during Backup

Post by labsy »

Hi,

I have a small setup:
- ESXi 6.7 with a dozen VMs at one location
- My home PC and a NAS disk array with Veeam B&R and Veeam ONE at another location
- In between, there's a 500/500 Mbps internet connection

This has been happening for more than a year. Every weekly full backup generates dozens of alerts via e-mail:

Code: Select all

Alarm - Host disk SCSI aborts (state: Error)
Alarm - Host disk SCSI aborts (state: Reset/resolved)
Alarm - Datastore write latency (state: Warning)
Alarm - Datastore write latency (state: Reset/resolved)
Alarm - VM total disk latency (state: Warning)
Alarm - VM total disk latency (state: Reset/resolved)
I never found out exactly what's wrong. I can see some errors on the ESXi host, but my diagnostic options are limited. It looks as if all RAID arrays have problems at that time.
From vmkwarning.log:

Code: Select all

WARNING: SVM: 5761: scsi0:1 VMX took 2283 msecs to send copy bitmap for offset 1260572901376. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: SVM: 5761: scsi0:1 VMX took 1352 msecs to send copy bitmap for offset 1282047737856. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: SVM: 5761: scsi0:1 VMX took 1009 msecs to send copy bitmap for offset 1288490188800. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: SVM: 5761: scsi0:1 VMX took 1882 msecs to send copy bitmap for offset 1297080123392. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: SVM: 5761: scsi0:1 VMX took 1234 msecs to send copy bitmap for offset 1301375090688. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: SVM: 5761: scsi0:1 VMX took 1439 msecs to send copy bitmap for offset 1324997410816. This is greater than expected latency. If this is a vvol disk, check with array latency.
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf02094118fc22c20b0" state in doubt; requested fast path state update...
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf020941096b3552071" state in doubt; requested fast path state update...
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf020941246cd1b61f9" state in doubt; requested fast path state update...
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf0209412edd702b94e" state in doubt; requested fast path state update...
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf0209412a6d2c6875b" state in doubt; requested fast path state update...
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60030057027cabf020941246cd1b61f9" state in doubt; requested fast path state update...
...and in vmkernel.log, from time to time, although none of these devices is the mapped backup NAS drive:

Code: Select all

ScsiDeviceIO: 3435: Cmd(0x459b4efbf7c0) 0x85, CmdSN 0x87238 from world 2099828 to dev "naa.60030057027cabf020941246cd1b61f9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
ScsiDeviceIO: 3435: Cmd(0x459b4efbf7c0) 0x1a, CmdSN 0xb35ed3 from world 0 to dev "naa.60030057027cabf0209412a6d2c6875b" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
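As a reading aid for lines like these: the value after `Cmd(...)` is the SCSI opcode, `H:`/`D:`/`P:` are the host, device and plugin status, and the three sense values are sense key, ASC and ASCQ. A minimal Python sketch that decodes them is shown here. The function name `decode_scsi_line` and the lookup tables are illustrative only, the tables cover just a few standard SCSI values, and the regular expression assumes the exact line layout quoted above.

```python
import re

# Small lookup tables with standard SCSI values only.
# Anything not listed is reported as a raw hex string.
OPCODE = {0x1A: "MODE SENSE(6)", 0x85: "ATA PASS-THROUGH(16)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEY = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
             0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
             0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
             0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}
ADDITIONAL_SENSE = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
                    (0x24, 0x00): "INVALID FIELD IN CDB"}

# Assumes the vmkernel.log layout shown in the post above.
LINE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+),.*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) '
    r'(?P<ascq>0x[0-9a-f]+)',
    re.IGNORECASE,
)


def _name(table, value):
    """Return the symbolic name, or the raw hex value if unknown."""
    return table.get(value, hex(value))


def decode_scsi_line(line):
    """Decode one ScsiDeviceIO failure line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    f = {k: int(v, 16) for k, v in m.groupdict().items() if k != "dev"}
    return {
        "device": m["dev"],
        "opcode": _name(OPCODE, f["op"]),
        "host_status": f["h"],        # left numeric on purpose
        "device_status": _name(DEVICE_STATUS, f["d"]),
        "plugin_status": f["p"],      # left numeric on purpose
        "sense_key": _name(SENSE_KEY, f["key"]),
        "additional_sense": ADDITIONAL_SENSE.get(
            (f["asc"], f["ascq"]),
            f"ASC {f['asc']:#04x} / ASCQ {f['ascq']:#04x}",
        ),
    }


if __name__ == "__main__":
    import sys
    # Usage: python decode_scsi.py < vmkernel.log
    for raw in sys.stdin:
        decoded = decode_scsi_line(raw)
        if decoded:
            print(decoded)
```

Fed the two lines above, this reports CHECK CONDITION with sense key ILLEGAL REQUEST for both: the first for an ATA PASS-THROUGH(16) command with "invalid command operation code", the second for a MODE SENSE(6) command with "invalid field in CDB". That pattern usually means the volume rejected a command it does not implement, which on its own would not explain the latency alerts; the SVM and NMP warnings are probably the more relevant ones.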
Since this is an enthusiast setup, I cannot afford a paid service engineer, so I'm asking for a clue. Could something be wrong with the Veeam B&R configuration on my home PC for this "over-the-WAN" setup? I would suspect a single disk, but the errors and warnings do not point to one disk; they point to all of them. So is the RAID controller perhaps faulty? I would expect at least some error on the RAID controller, but I can't find any.
Could someone kick me in the right direction? Thanks!
karsten123
Service Provider
Posts: 381
Liked: 88 times
Joined: Apr 03, 2019 6:53 am
Full Name: Karsten Meja
Contact:

Re: Disk latency alerts and SCSI aborts during Backup

Post by karsten123 » 1 person likes this post

If the iSCSI connection runs over the WAN, please change that.
Mildur
Product Manager
Posts: 8856
Liked: 2337 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Disk latency alerts and SCSI aborts during Backup

Post by Mildur » 1 person likes this post

Hi Andrej

That sounds like a technical issue. We cannot investigate such issues through a forum topic.
Please provide a support case ID for this issue, as requested when you click New Topic. Without a case number, the topic will eventually be deleted by moderators.

Unfortunately, we cannot investigate log files over a forum post. But one thing I would like to ask: did you deploy a Veeam VMware proxy on the ESXi host? The proxy should be on the same side as the ESXi infrastructure.


Best regards,
Fabian

PS: Support can only help if you upload logs: https://www.veeam.com/kb1832
Product Management Analyst @ Veeam Software
Andreas Neufert
VP, Product Management
Posts: 6774
Liked: 1419 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Disk latency alerts and SCSI aborts during Backup

Post by Andreas Neufert » 1 person likes this post

I think this is just a consequence of how the infrastructure is designed and which components were selected.

In general, when you add I/O or throughput load to a disk controller, you eventually hit a point where the controller or the disks cannot keep up with the demand, and latency rises significantly.
Because backups transfer a lot of data, this can happen if they overload the controller.
You can adjust the Veeam task slots on the proxy to reduce the number of parallel reads the backup needs. This might help to avoid the situation.
labsy

Re: Disk latency alerts and SCSI aborts during Backup

Post by labsy »

Hi all!

Thank you very much for your responses!

@karsten123: No, iSCSI is not over the WAN. The NAS is connected to my home PC on the same LAN and mapped to the PC as an SMB network share.

@Mildur: Understood! But before I raise a (probably paid) ticket, I will try the hint you provided. I did NOT deploy a Veeam VMware proxy on the ESXi host, as I did not know about it. I will look into that; maybe this is what sends all communication over the WAN and slows things down.

@Andreas Neufert: Yes, that's logical. Besides, I did not deploy a proxy on the host side, and 4 parallel tasks over the WAN are possibly too much. Thanks for the hint!
Origin 2000
Service Provider
Posts: 86
Liked: 22 times
Joined: Sep 24, 2020 2:14 pm
Contact:

Re: Disk latency alerts and SCSI aborts during Backup

Post by Origin 2000 » 1 person likes this post

To be honest, if you run ESXi 6.7, your hardware might be 10 years old or so, and the question is why it starts stalling when a VM runs on a snapshot or when snapshots are deleted. It looks like your disks aren't fast enough. What's the status of the hardware (bad blocks and error counters on the disks, battery of the RAID controller)? Is write-through active?

Please post esxtop output where we can see the device latency during backup.
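
One rough way to capture and condense that, assuming SSH access to the host: run esxtop in batch mode during the backup window (for example `esxtop -b -d 5 -n 720 > esxtop.csv`, which samples every 5 seconds for an hour) and summarise the latency columns with a small Python script. The sketch below is only an illustration: the function name `summarize_latency` is made up, and it assumes the latency counters carry "MilliSec/Command" in their column names, so check the header row of your own capture first.

```python
import csv
import io
from statistics import mean


def summarize_latency(csv_file, needle="MilliSec/Command"):
    """Return {column name: {"avg": ..., "max": ...}} for latency columns.

    csv_file: an open text file (or file-like object) with a header row.
    needle:   substring used to pick latency columns (an assumption about
              how the counters are named in the capture).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = [i for i, name in enumerate(header) if needle in name]
    samples = {header[i]: [] for i in cols}
    for row in reader:
        for i in cols:
            try:
                samples[header[i]].append(float(row[i]))
            except (ValueError, IndexError):
                continue  # skip empty or malformed cells
    return {
        name: {"avg": mean(values), "max": max(values)}
        for name, values in samples.items()
        if values
    }


if __name__ == "__main__":
    # Synthetic two-sample capture, only to show the call; not real data.
    demo = io.StringIO(
        '"timestamp","disk0 Average Driver MilliSec/Command"\n'
        '"t0","2.0"\n'
        '"t1","40.0"\n'
    )
    for column, stats in summarize_latency(demo).items():
        print(column, stats)
```

For a real capture, open the file with `open("esxtop.csv", newline="")` and pass it in. Comparing the device-level and kernel-level latency columns per device during the backup should show whether the array itself is slow or whether commands are queuing in the host.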

Regards,
Joerg