I/O hangs during Veeam Snapshot and NFS/openvpn fails

Backup agent for Linux servers and workstations on-premises or in the public cloud

I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Thu Aug 04, 2016 10:33 pm

I have a small private VPS in the cloud with low specs (1 CPU, 1GB RAM, 30GB disk) running CentOS 7 minimal and webserver packages.
I have setup openvpn server at home and added a disk to the server meant for backup storage and exported via nfs.
I have setup openvpn client on the VPS and configured a tunnel.

I have installed veeam agent for linux and the veeamsnap module. I have configured a job to use an NFS share located on the homeserver using the IP address of the openvpn interface tap0.
When I start the job after about 2 minutes I can no longer ping the cloudserver over the tunnel.
Even the logfile of openVPN set with a high verbosity freezes completely.
On a running SSH session I execute df -h and it hangs.
New SSH sessions hang waiting for username prompt.
/var/log/messages mentions NFS timeout to servers

Excerpt from openvpn log that shows it hanging for about 4 minutes and then I start to stop the job and kill veaam processes.
Code: Select all
Thu Aug  4 23:56:13 2016 us=247452 TLS: tls_pre_encrypt: key_id=0
Thu Aug  4 23:56:13 2016 us=247484 TCPv4_CLIENT WRITE [1429] to [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=1428
Thu Aug  4 23:56:13 2016 us=247513 TUN READ [1388]
Thu Aug  4 23:56:13 2016 us=247526 TLS: tls_pre_encrypt: key_id=0
Thu Aug  4 23:56:13 2016 us=247560 TCPv4_CLIENT WRITE [1429] to [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=1428
Thu Aug  4 23:56:13 2016 us=247598 TUN READ [1388]
Thu Aug  4 23:56:13 2016 us=247612 TLS: tls_pre_encrypt: key_id=0
Thu Aug  4 23:56:13 2016 us=247646 TCPv4_CLIENT WRITE [1429] to [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=1428
Thu Aug  4 23:56:13 2016 us=247676 TUN READ [1388]
Thu Aug  4 23:56:13 2016 us=247688 TLS: tls_pre_encrypt: key_id=0
Thu Aug  4 23:56:13 2016 us=247722 TCPv4_CLIENT WRITE [1429] to [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=1428
Thu Aug  4 23:56:13 2016 us=247752 TUN READ [1388]
Thu Aug  4 23:56:13 2016 us=247766 TLS: tls_pre_encrypt: key_id=0
Thu Aug  4 23:56:13 2016 us=247817 TCPv4_CLIENT WRITE [1429] to [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=1428
Fri Aug  5 00:00:25 2016 us=646563 TCPv4_CLIENT READ [93] from [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=92
Fri Aug  5 00:00:25 2016 us=646672 TLS: tls_pre_decrypt, key_id=0, IP=[AF_INET]x.x.x.x:1701
Fri Aug  5 00:00:25 2016 us=646759 PID_TEST [0] [SSL-0] [] 0:71264 0:71265 t=1470348025[0] r=[0,0,0,0,1]
Fri Aug  5 00:00:25 2016 us=646912 TUN WRITE [54]
Fri Aug  5 00:00:25 2016 us=647004 TCPv4_CLIENT READ [93] from [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=92
Fri Aug  5 00:00:25 2016 us=647022 TLS: tls_pre_decrypt, key_id=0, IP=[AF_INET]x.x.x.x:1701
Fri Aug  5 00:00:25 2016 us=647038 PID_TEST [0] [SSL-0] [] 0:71265 0:71266 t=1470348025[0] r=[0,0,0,0,1]
Fri Aug  5 00:00:25 2016 us=647051 TUN WRITE [54]
Fri Aug  5 00:00:25 2016 us=647094 TCPv4_CLIENT READ [93] from [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=92
Fri Aug  5 00:00:25 2016 us=647109 TLS: tls_pre_decrypt, key_id=0, IP=[AF_INET]x.x.x.x:1701
Fri Aug  5 00:00:25 2016 us=647124 PID_TEST [0] [SSL-0] [] 0:71266 0:71267 t=1470348025[0] r=[0,0,0,0,1]
Fri Aug  5 00:00:25 2016 us=647159 TUN WRITE [54]
Fri Aug  5 00:00:25 2016 us=647192 TCPv4_CLIENT READ [93] from [AF_INET]x.x.x.x:1701: P_DATA_V1 kid=0 DATA len=92
Fri Aug  5 00:00:25 2016 us=647223 TLS: tls_pre_decrypt, key_id=0, IP=[AF_INET]x.x.x.x:1701
Fri Aug  5 00:00:25 2016 us=647243 PID_TEST [0] [SSL-0] [] 0:71267 0:71268 t=1470348025[0] r=[0,0,0,0,1]
Fri Aug  5 00:00:25 2016 us=647256 TUN WRITE [54]

Excerpt from /var/log/messages
Code: Select all
Aug  4 23:54:12 www ntpd[495]: Listen normally on 47 tap0 172.17.17.2 UDP 123
Aug  4 23:54:12 www avahi-daemon[503]: Registering new address record for fe80::d011:84ff:fef7:897a on tap0.*.
Aug  4 23:54:14 www ntpd[495]: Listen normally on 48 tap0 fe80::d011:84ff:fef7:897a UDP 123
Aug  4 23:55:10 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_TRACKING_COLLECT
Aug  4 23:55:10 www kernel:    veeamsnap:ctrl_fops.c:ioctl_tracking_collect Collecting tracking device:
Aug  4 23:55:10 www kernel:    veeamsnap:tracking.c:tracking_collect dev_id=253:1
Aug  4 23:55:10 www kernel:    veeamsnap:tracking.c:tracking_collect snap_number=9
Aug  4 23:55:10 www kernel:    veeamsnap:tracking.c:tracking_collect cbt_map_size=122872
Aug  4 23:55:10 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_TRACKING_ADD
Aug  4 23:55:10 www kernel:    veeamsnap:tracking.c:tracking_add Adding. dev_id=253:1
Aug  4 23:55:10 www kernel:    veeamsnap:tracking.c:tracking_add Device already tracking. Device=253:1
Aug  4 23:55:10 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_SNAPSHOT_COMMON_RESERVE
Aug  4 23:55:10 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_common_reserve Reserved. id=0
Aug  4 23:55:14 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_DIRECT_DEVICE_OPEN
Aug  4 23:55:14 www kernel:    veeamsnap:ctrl_fops.c:ioctl_direct_device_open .
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_DIRECT_DEVICE_CLOSE
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ioctl_direct_device_close .
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_SNAPSHOT_COMMON_DATAINFO
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ioctl_snapshotdata_make_common_datainfo dev_id_host_data:253:1
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ioctl_snapshotdata_make_common_datainfo rangeset count=370
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_common_create common_file_id=0
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_common_create dev_id=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:__snapshotdata_create_blk_info blocks count =0x1a0ea0
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_common_create New snapshot common disk added to container.
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ioctl_snapshotdata_make_common_datainfo success.
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_SNAPSHOT_COMMON_ADD_DEV
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_common_add_dev  device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_SNAPSHOT_CREATE
Aug  4 23:55:18 www kernel:    veeamsnap:snapshot.c:snapshot_Create device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshot.c:_snapshot_add_tracker Adding. dev_id=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshot.c:_snapshot_add_tracker Device already under change tracking. Device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshot.c:_snapshot_add_data Snapshot data exist for device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:tracker.c:_tracker_freeze dev_id=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:defer_io.c:defer_io_create .
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_FindByDevId Snapshot data for device successfully found. dev_id=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:defer_io.c:defer_io_create Snapshot data using for device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshotdata.c:snapshotdata_Reset .
Aug  4 23:55:18 www kernel:    veeamsnap:defer_io.c:defer_io_create complete success.
Aug  4 23:55:18 www kernel:    veeamsnap:cbt_map.c:cbt_map_switch CBT map switch.
Aug  4 23:55:18 www kernel:    veeamsnap:tracker.c:_tracker_freeze Snapshot created. New snap number=11
Aug  4 23:55:18 www kernel:    veeamsnap:defer_io.c:defer_io_work_thread started.
Aug  4 23:55:18 www kernel:    veeamsnap:blk_dev_utile.c:blk_thaw_bdev thaw_bdev for device=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapshot.c:snapshot_Create snapshot_id=0xffff8800088e86c0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create original_dev=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create image_dev=252:0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create physical_block_size=512
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create logical_block_size=4096
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create device name=veeamimage0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_create capacity=32210157568
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_TRACKING_COLLECT
Aug  4 23:55:18 www kernel:    veeamsnap:ctrl_fops.c:ioctl_tracking_collect Collecting tracking device:
Aug  4 23:55:18 www kernel:    veeamsnap:tracking.c:tracking_collect dev_id=253:1
Aug  4 23:55:18 www kernel:    veeamsnap:tracking.c:tracking_collect snap_number=10
Aug  4 23:55:18 www kernel:    veeamsnap:tracking.c:tracking_collect cbt_map_size=122872
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:snapimage_processor_thread started.
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:__snapimage_open dev=252:0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:__snapimage_open dev=252:0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:__snapimage_ioctl dev=252:0
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:__snapimage_ioctl cmd=21297
Aug  4 23:55:18 www kernel:    veeamsnap:snapimage.c:__snapimage_ioctl arg=0x0
Aug  4 23:55:19 www kernel:    veeamsnap:snapimage.c:__snapimage_release dev id=252:0
Aug  4 23:55:19 www kernel:    veeamsnap:snapimage.c:__snapimage_release dev id=252:0
Aug  4 23:55:19 www systemd: Device dev-disk-by\x2dlabel-DOROOT.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:05.0/virtio2/block/vda/vda1 and /sys/devices/virtual/block/veeamimage0
Aug  4 23:55:19 www systemd: Device dev-disk-by\x2duuid-c5845b43\x2dfe98\x2d499a\x2dbf31\x2d4eccae14261b.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:05.0/virtio2/block/vda/vda1 and /sys/devices/virtual/block/veeamimage0
Aug  4 23:55:36 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_COLLECT_SNAPSHOT_IMAGES
Aug  4 23:55:36 www kernel:    veeamsnap:snapimage.c:snapimage_collect_images count=0
Aug  4 23:55:36 www kernel:    veeamsnap:snapimage.c:snapimage_collect_images real_count=1
Aug  4 23:55:36 www kernel:    veeamsnap:ctrl_fops.c:ctrl_unlocked_ioctl IOCTL_COLLECT_SNAPSHOT_IMAGES
Aug  4 23:55:36 www kernel:    veeamsnap:snapimage.c:snapimage_collect_images count=1
Aug  4 23:55:36 www kernel:    veeamsnap:snapimage.c:snapimage_collect_images real_count=1
Aug  4 23:55:36 www kernel:    veeamsnap:snapimage.c:__snapimage_open dev=252:0
Aug  4 23:56:59 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:57:02 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:57:41 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:58:11 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:58:11 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:58:40 www kernel: INFO: task veeamagent:12339 blocked for more than 120 seconds.
Aug  4 23:58:40 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  4 23:58:40 www kernel: veeamagent      D ffff8800054fbb30     0 12339  12332 0x00000080
Aug  4 23:58:40 www kernel: ffff8800054fb9d0 0000000000000086 ffff880000e77300 ffff8800054fbfd8
Aug  4 23:58:40 www kernel: ffff8800054fbfd8 ffff8800054fbfd8 ffff880000e77300 ffff88003ec147c0
Aug  4 23:58:40 www kernel: 0000000000000000 7fffffffffffffff ffffffffa0534bb0 ffff8800054fbb30
Aug  4 23:58:40 www kernel: Call Trace:
Aug  4 23:58:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff8163b809>] schedule+0x29/0x70
Aug  4 23:58:40 www kernel: [<ffffffff816394f9>] schedule_timeout+0x209/0x2d0
Aug  4 23:58:40 www kernel: [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
Aug  4 23:58:40 www kernel: [<ffffffff812f85a5>] ? __radix_tree_preload+0x35/0xa0
Aug  4 23:58:40 www kernel: [<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
Aug  4 23:58:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff8163ae3e>] io_schedule_timeout+0xae/0x130
Aug  4 23:58:40 www kernel: [<ffffffff8163aed8>] io_schedule+0x18/0x20
Aug  4 23:58:40 www kernel: [<ffffffffa0534bbe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff81639680>] __wait_on_bit+0x60/0x90
Aug  4 23:58:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff81639737>] out_of_line_wait_on_bit+0x87/0xb0
Aug  4 23:58:40 www kernel: [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
Aug  4 23:58:40 www kernel: [<ffffffffa0536583>] nfs_wait_on_request+0x33/0x40 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffffa053b151>] nfs_updatepage+0x121/0x8a0 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffffa052afa1>] nfs_write_end+0x121/0x350 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff8116a214>] generic_file_buffered_write+0x184/0x290
Aug  4 23:58:40 www kernel: [<ffffffff8116b6d5>] __generic_file_aio_write+0x1d5/0x3e0
Aug  4 23:58:40 www kernel: [<ffffffff811d4256>] ? mem_cgroup_update_page_stat+0x16/0x50
Aug  4 23:58:40 www kernel: [<ffffffff8116b93d>] generic_file_aio_write+0x5d/0xc0
Aug  4 23:58:40 www kernel: [<ffffffffa052a03b>] nfs_file_write+0xbb/0x1d0 [nfs]
Aug  4 23:58:40 www kernel: [<ffffffff811de3fd>] do_sync_write+0x8d/0xd0
Aug  4 23:58:40 www kernel: [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
Aug  4 23:58:40 www kernel: [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
Aug  4 23:58:40 www kernel: [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
Aug  4 23:59:01 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:59:01 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:59:01 www kernel:    veeamsnap:snapimage.c:__snapimage_release dev id=252:0
Aug  4 23:59:35 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  4 23:59:35 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:00:01 www systemd: Created slice user-499.slice.
Aug  5 00:00:01 www systemd: Starting user-499.slice.
Aug  5 00:00:01 www systemd: Started Session 2452 of user munin.
Aug  5 00:00:01 www systemd: Starting Session 2452 of user munin.
Aug  5 00:00:01 www systemd: Started Session 2453 of user root.
Aug  5 00:00:01 www systemd: Starting Session 2453 of user root.
Aug  5 00:00:01 www systemd: Started Session 2454 of user root.
Aug  5 00:00:01 www systemd: Starting Session 2454 of user root.
Aug  5 00:00:07 www systemd: Removed slice user-499.slice.
Aug  5 00:00:07 www systemd: Stopping user-499.slice.
Aug  5 00:00:25 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:00:25 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:00:31 www avahi-daemon[503]: Withdrawing address record for fe80::d011:84ff:fef7:897a on tap0.
Aug  5 00:00:31 www avahi-daemon[503]: Withdrawing address record for 172.17.17.2 on tap0.
Aug  5 00:00:31 www avahi-daemon[503]: Withdrawing workstation service for tap0.
Aug  5 00:00:32 www ntpd[495]: Deleting interface #48 tap0, fe80::d011:84ff:fef7:897a#123, interface stats: received=0, sent=0, dropped=0, active_time=378 secs
Aug  5 00:00:32 www ntpd[495]: Deleting interface #47 tap0, 172.17.17.2#123, interface stats: received=0, sent=0, dropped=0, active_time=380 secs
Aug  5 00:00:40 www kernel: INFO: task veeamagent:12339 blocked for more than 120 seconds.
Aug  5 00:00:40 www kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  5 00:00:40 www kernel: veeamagent      D ffff8800054fbb30     0 12339  12332 0x00000080
Aug  5 00:00:40 www kernel: ffff8800054fb9d0 0000000000000086 ffff880000e77300 ffff8800054fbfd8
Aug  5 00:00:40 www kernel: ffff8800054fbfd8 ffff8800054fbfd8 ffff880000e77300 ffff88003ec147c0
Aug  5 00:00:40 www kernel: 0000000000000000 7fffffffffffffff ffffffffa0534bb0 ffff8800054fbb30
Aug  5 00:00:40 www kernel: Call Trace:
Aug  5 00:00:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff8163b809>] schedule+0x29/0x70
Aug  5 00:00:40 www kernel: [<ffffffff816394f9>] schedule_timeout+0x209/0x2d0
Aug  5 00:00:40 www kernel: [<ffffffff81173317>] ? __alloc_pages_nodemask+0x197/0xba0
Aug  5 00:00:40 www kernel: [<ffffffff812f85a5>] ? __radix_tree_preload+0x35/0xa0
Aug  5 00:00:40 www kernel: [<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
Aug  5 00:00:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff8163ae3e>] io_schedule_timeout+0xae/0x130
Aug  5 00:00:40 www kernel: [<ffffffff8163aed8>] io_schedule+0x18/0x20
Aug  5 00:00:40 www kernel: [<ffffffffa0534bbe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff81639680>] __wait_on_bit+0x60/0x90
Aug  5 00:00:40 www kernel: [<ffffffffa0534bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff81639737>] out_of_line_wait_on_bit+0x87/0xb0
Aug  5 00:00:40 www kernel: [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
Aug  5 00:00:40 www kernel: [<ffffffffa0536583>] nfs_wait_on_request+0x33/0x40 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffffa053b151>] nfs_updatepage+0x121/0x8a0 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffffa052afa1>] nfs_write_end+0x121/0x350 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff8116a214>] generic_file_buffered_write+0x184/0x290
Aug  5 00:00:40 www kernel: [<ffffffff8116b6d5>] __generic_file_aio_write+0x1d5/0x3e0
Aug  5 00:00:40 www kernel: [<ffffffff811d4256>] ? mem_cgroup_update_page_stat+0x16/0x50
Aug  5 00:00:40 www kernel: [<ffffffff8116b93d>] generic_file_aio_write+0x5d/0xc0
Aug  5 00:00:40 www kernel: [<ffffffffa052a03b>] nfs_file_write+0xbb/0x1d0 [nfs]
Aug  5 00:00:40 www kernel: [<ffffffff811de3fd>] do_sync_write+0x8d/0xd0
Aug  5 00:00:40 www kernel: [<ffffffff811dec1d>] vfs_write+0xbd/0x1e0
Aug  5 00:00:40 www kernel: [<ffffffff811df6bf>] SyS_write+0x7f/0xe0
Aug  5 00:00:40 www kernel: [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
Aug  5 00:01:01 www systemd: Started Session 2455 of user root.
Aug  5 00:01:01 www systemd: Starting Session 2455 of user root.
Aug  5 00:01:15 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:01:15 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:02:05 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:02:05 www kernel: nfs: server 172.17.17.1 not responding, timed out
Aug  5 00:02:40 www kernel: INFO: task veeamagent:12339 blocked for more than 120 seconds.
...

The moment I try killing veeamdeferio with -9 the system starts to respond a little again.
At that point I stop the Openvpn client because it's useless anyways.
Veeamjob just hangs. In top I see a process with the IP address of my homeserver, probably an NFS related process, that has state D. I also see VeeamAgent process with state Z.
Only after startting the VPN tunnel again 20 minutes later do the hung processes disappear and does the veeam job go to status failed.

So to sum up, Veeam AL configured for backup to NFS connected by OpenVPN tunnel. Veeam snapshot and disk backup freezes I/O, openvpnclient freezes and communication fails, NFS connection fails, Veeam hangs because of failed NFS communication. Only by trying to kill processes does the system become normally repsonsive again and after restarting the openVPN tunnel, processes come out of zombie state and Job fails.
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby vmniels » Fri Aug 05, 2016 8:30 am

I guess the server gets overloaded in amount of CPU usage related to this setup.

I notice you are also running munin, is this a basic munin config? As munin is quite a CPU & memory hogger, this combined with VAL asking for resources might just overload the server. Could you maybe give it some more memory and cpu power?

If possible please provide the full dmesg -T, syslog, /var/log/veeam logs in a zip.
VCP-DCV
Veeam Certified Engineer
http://foonet.be
vmniels
Veeam Software
 
Posts: 1440
Liked: 318 times
Joined: Mon Jul 15, 2013 11:09 am
Full Name: Niels Engelen

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Fri Aug 05, 2016 9:25 pm

It's just munin-node, the munin-server is running elsewhere.
The strange thing is that I was checking CPU usage with top and it was not extreme. Load went up to about 2 or 3.
I'll send you the logs in a private message, thanks for your offer!
Beefing up the VPS is costly so I probably wont do that. Maybe an option to throttle CPU in the next version is an option?
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby PTide » Mon Aug 08, 2016 8:49 am

Hi,

Currently all Veeam processes are started with 0 priority value, which can be changed via usage of renice command. We will look into lowering veeam's default priority value in the next beta.

P.S. The VPS is a KVM virtual machine, isn't it?

Thanks!
PTide
Veeam Software
 
Posts: 2884
Liked: 236 times
Joined: Tue May 19, 2015 1:46 pm

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Mon Aug 08, 2016 4:31 pm

I'll try it with renice. Which processes should I renice?

it is indeed a KVM vps.

Thanks
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby vmniels » Tue Aug 09, 2016 8:26 am

That would be "veeamagent" process. An option would be to limit it via cgroups however renice is a bit easier to use directly.
VCP-DCV
Veeam Certified Engineer
http://foonet.be
vmniels
Veeam Software
 
Posts: 1440
Liked: 318 times
Joined: Mon Jul 15, 2013 11:09 am
Full Name: Niels Engelen

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby PTide » Tue Aug 09, 2016 3:47 pm

Also please use iostat during backup session to collect some statistics about disk I/O from both servers (vps and your home server).

Thanks
PTide
Veeam Software
 
Posts: 2884
Liked: 236 times
Joined: Tue May 19, 2015 1:46 pm

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Wed Aug 10, 2016 9:24 pm

I'm giving up.
I even changed nice to +19 and still all disk I/O seems to stall.
iostat hangs, df -h hangs
pkill veeamagent hangs

https://dl.dropboxusercontent.com/u/372 ... tlinux.jpg
https://dl.dropboxusercontent.com/u/372 ... -hangs.jpg

I did manage to issue an iostat when I was busy killing the veeamagent processes before everything completely hangs.

Code: Select all
[root@www ~]# iostat
Linux 3.10.0-327.22.2.el7.x86_64 (www.xxxxxx)        08/10/2016      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.99    0.01    4.17    5.40    0.10   70.33

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda              73.06      1279.30       661.83    3387421    1752436


On the home server everything is fine:
Code: Select all
[root@ARxxxxxx]# iostat
Linux 3.10.0-327.13.1.el7.x86_64 (ARxxxxxx)       08/10/2016      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.01    1.61    0.31    0.28    0.00   95.79

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               8.86        11.49       105.68     963441    8864630
sdb               0.00         0.00         0.00        256          0
sdc               0.01         0.03         2.07       2617     173320
dm-0             12.42        11.29       105.29     946941    8831472
dm-1              0.14         0.16         0.39      13340      33124
dm-2              0.01         0.02         2.07       1385     173320

This is DigitalOcean KVM btw
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby PTide » Thu Aug 11, 2016 11:30 am

The log you've provided is missing one important parameter (%util).Please execute this
Code: Select all
iostat -xk -t 5 | awk '// {print strftime("%Y-%m-%d %H:%M:%S"),$0}' >> iostat.log &

The initial output should look be a 4 digit number (PID) followed by a digit in a square brackets (forked process’s job number). After that start backup again. When the backup job has finished/failed please execute
Code: Select all
kill -kill 4-digit-PID

You'll find iostat.log file in you working directory, please share it.

Thank you.
PTide
Veeam Software
 
Posts: 2884
Liked: 236 times
Joined: Tue May 19, 2015 1:46 pm

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Thu Aug 11, 2016 5:13 pm

The problem is that most of the times the server completely hangs and becomes unresponsive and I have to hard reset it.
I'll try again.
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Thu Aug 11, 2016 8:31 pm

So I tried again.
Result, server hangs completely, I can't even ctrl +c my ping command. So I can't kill anything either. This is with veeamagent processes at nice level +19. I had to hard switch off the server again.

Screenshots:
1) Backup is started and pings start to fail
https://dl.dropboxusercontent.com/u/372 ... o-hang.jpg

2) Backup is hanging and everything else hangs, even my ping command hangs, I can't ctrl + c out of it to initiate a kill -9 command.
https://dl.dropboxusercontent.com/u/372 ... mplete.jpg

3) DigitalOcean console shows call trace, VeeamAgent process blocked for more than 120seconds
https://dl.dropboxusercontent.com/u/372 ... -trace.jpg

iostat log from the cloud server:
https://dl.dropboxusercontent.com/u/37279548/iostat.log
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby vmniels » Fri Aug 12, 2016 6:49 am

Seems the combination is a bit too much CPU wise for the server.

Looking at the digitalocean website; I guess you are using the following spec?
Code: Select all
    1GB Memory
    1 Core Processor
    30GB SSD Disk
    2TB Transfer


I might have to give it a try so you don't have to reboot your development server all the time :-)
VCP-DCV
Veeam Certified Engineer
http://foonet.be
vmniels
Veeam Software
 
Posts: 1440
Liked: 318 times
Joined: Mon Jul 15, 2013 11:09 am
Full Name: Niels Engelen

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby PTide » Fri Aug 12, 2016 9:17 am

So I tried again.
Thank you for the logs. May I ask you to perfrom one more test? Please start iostat statistics collection again and copy the whole /usr/src folder from the server to the share (can be done with cp -r /usr/src /path/to/nfs/share). If the copy test finishes fine then please install and run CPU stress tool:
Code: Select all
wget ftp://ftp.pbone.net/mirror/dag.wieers.com/redhat/el7/en/x86_64/dag/RPMS/stress-1.0.2-1.el7.rf.x86_64.rpm
rpm -ivh stress*
stress -c 1 &

and run the copy test with iostat again.

We really appreciate your assistance!
PTide
Veeam Software
 
Posts: 2884
Liked: 236 times
Joined: Tue May 19, 2015 1:46 pm

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby sg_sc » Fri Aug 12, 2016 11:19 pm

@vmniels
Yes that is the 10$ plan a month (withouth backups) I'm using.

@PTide:
So I did the copy to the mounted NFS share. It went fine, but slow. Pings did not drop. Server did not hang.
Log:
https://dl.dropboxusercontent.com/u/372 ... to-nfs.log

I ran the stress tool and started the copy again. It went fine, but slow. Pings did not drop. Server did not hang, although load went up to about 3.3 and I could see the stress process using 100% CPU. I canceled the copy after a couple of minutes.
Log:
https://dl.dropboxusercontent.com/u/372 ... stress.log
Screenshot:
https://dl.dropboxusercontent.com/u/372 ... pyTest.jpg
sg_sc
Enthusiast
 
Posts: 35
Liked: 5 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: I/O hangs during Veeam Snapshot and NFS/openvpn fails

Veeam Logoby PTide » Mon Aug 15, 2016 6:53 pm

Could you please install and run nfsstat during backup on both server and client side? We need to check NFS stats as well.

Thank you!
PTide
Veeam Software
 
Posts: 2884
Liked: 236 times
Joined: Tue May 19, 2015 1:46 pm

Next

Return to Veeam Agent for Linux



Who is online

Users browsing this forum: No registered users and 2 guests