We built a 16-drive RAID-10 ESXi server to host a Windows Server 2016 VM that runs Veeam Backup & Replication and writes its backups to the V: drive, a 6.5 TB ReFS volume formatted with 64 KB clusters.
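For reference, the repository volume was laid out roughly like this (a sketch, not our exact commands; the 'Backups' label is illustrative, and 65536 bytes is the 64 KB cluster size):

```powershell
# Format the repository volume as ReFS with 64 KB clusters
Format-Volume -DriveLetter V -FileSystem ReFS `
    -AllocationUnitSize 65536 -NewFileSystemLabel 'Backups'

# Confirm the cluster size (BlockSize is the allocation unit in bytes)
Get-CimInstance Win32_Volume -Filter "DriveLetter='V:'" |
    Select-Object Label, FileSystem, BlockSize
```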
This setup dies during the backup job, or right after it ends, when the tape job starts.
The VM stays alive, but Windows I/O drops to zero, and every attempt to interact with the VM hangs forever until we reset it.
We tried limiting the bandwidth to the V: drive and switching from the 10 Gbps ports to 1 Gbps ports, to no avail.
We see around 160 MB/s of writes during the backups and 285 MB/s of reads when the tape job starts and sends the data over 10 Gbps to the proxy feeding an LTO-7 drive.
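If anyone wants to compare numbers, this is an easy way to watch the volume from inside the guest (assumes the English counter names; adjust the instance if V: is mounted differently):

```powershell
# Sample read/write throughput on V: every 5 seconds for one minute
Get-Counter -Counter @(
    '\LogicalDisk(V:)\Disk Read Bytes/sec',
    '\LogicalDisk(V:)\Disk Write Bytes/sec'
) -SampleInterval 5 -MaxSamples 12
```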
Since it is the Windows I/O stack that is blowing up and there are no ESXi errors, I looked at the V: drive and changed the removal policy of the VMware Virtual disk SCSI Disk Device from the default "Better performance" to "Quick removal". This disables Windows write caching on the volume. For the first time, the backup jobs on both sites ran to the end, and the Windows servers were still available the next morning.
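For anyone who wants to verify the change without clicking through Device Manager on every server, something like this should work. The cache check uses Get-StorageAdvancedProperty; the registry part is an assumption on my side (I believe the Policies tab persists a UserRemovalPolicy DWORD under each disk's Partmgr key, 2 = orderly removal / "Better performance", 3 = surprise removal / "Quick removal"), so double-check it against Device Manager before trusting it:

```powershell
# Report the device-level cache state for each physical disk
Get-PhysicalDisk | Get-StorageAdvancedProperty

# Assumption: the Policies tab stores its choice as UserRemovalPolicy
# under ...\Device Parameters\Partmgr (2 = better performance,
# 3 = quick removal). Verify against Device Manager.
Get-ChildItem 'HKLM:\SYSTEM\CurrentControlSet\Enum\SCSI' -Recurse -ErrorAction SilentlyContinue |
    Where-Object { $_.PSChildName -eq 'Partmgr' } |
    ForEach-Object {
        Get-ItemProperty -Path $_.PSPath -Name UserRemovalPolicy -ErrorAction SilentlyContinue |
            Select-Object PSParentPath, UserRemovalPolicy
    }
```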
There must be an issue between the ReFS cache and the way Veeam handles its large backup files. There is no performance impact for us, since the source is the bottleneck.
Not sure if anyone else has seen these issues.