mkretzer wrote: No. In his mail there was not a single mention of the 4 KB cluster size... That is also a reason I am kind of cautious about this recommendation.
We more or less found a way to keep the filesystem from getting stuck as shown in the screenshot below.
Edit: Yes, it looks like I just drew a white box, but it's actually white space where the Windows kernel forgets the disk is ReFS.
We noticed that the backup repos (now newly formatted with a 64 KB cluster size) still cause the filesystem to become unresponsive, so we tried throttling the repositories to a lower throughput. Surprisingly, this significantly improved our performance. Since the volume no longer becomes unresponsive, Veeam can now back up consistently without being interrupted by the volume hanging.
The throughput is still significantly slower than it would have been on NTFS (we are going to log a case for this as well), but at least it's stable.
We suspect that the smaller cluster size on the previously formatted volume caused the volume to get stuck even faster (in fact, 16x faster). We are backing up from all-flash storage arrays, so our bottleneck is almost always the destination storage target.
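For anyone wondering where the 16x comes from: it is simply the ratio between the two cluster sizes (64 KB vs 4 KB). A quick sketch of the arithmetic, where the 1 GiB data size and the function name are only illustrative assumptions. Whether the hang frequency really scales linearly with the cluster count is our suspicion, not something we have confirmed.

```python
KIB = 1024


def clusters_needed(data_bytes: int, cluster_bytes: int) -> int:
    """Number of clusters needed to hold data_bytes, rounded up."""
    return -(-data_bytes // cluster_bytes)


# 1 GiB of backup data, chosen purely for illustration.
data = 1024 ** 3

small = clusters_needed(data, 4 * KIB)    # volume formatted with 4 KB clusters
large = clusters_needed(data, 64 * KIB)   # volume formatted with 64 KB clusters

# Same data, 16 times as many clusters to track on the 4 KB volume.
ratio = small // large
print(small, large, ratio)
```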
We are currently monitoring the incoming I/O, and as soon as it reaches the limit and causes storage latency on the backup target, the filesystem becomes unresponsive. So throttling circumvents this issue for now. This, however, isn't a permanent solution, since the filesystem should keep working even with storage latency. A 20-30 ms hiccup on the storage LUN causes 20 seconds of unresponsiveness on the ReFS volume, which in turn brings Veeam to a halt...
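To illustrate the general idea behind throttling (this is not how Veeam implements its repository limit, just a generic token-bucket sketch), the writer is held back whenever it tries to send more than a configured rate, so the target never sees the burst that pushes it into latency. The class name, the parameters and the rates below are all hypothetical.

```python
import time


class TokenBucket:
    """Generic token-bucket rate limiter (illustrative, not Veeam's code)."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = float(rate_bytes_per_s)   # sustained throughput limit
        self.capacity = float(burst_bytes)    # largest burst allowed
        self.tokens = float(burst_bytes)      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def reserve(self, nbytes):
        """Account for nbytes and return how many seconds to wait first."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # Bucket is overdrawn: wait until the deficit has been refilled.
        return -self.tokens / self.rate


def throttled_write(bucket, write, chunk):
    """Sleep as long as the bucket demands, then hand the chunk to write()."""
    delay = bucket.reserve(len(chunk))
    if delay > 0:
        time.sleep(delay)
    write(chunk)
```

A caller would create one bucket per repository, for example `TokenBucket(200 * 1024 * 1024, 64 * 1024 * 1024)` for a hypothetical 200 MB/s limit, and pass every write through `throttled_write`.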
@Mkretzer, have you tried throttling as well?