Slow source in B2D using ReFS with Fast-Clone?

cerberus · Aug 24, 2020 9:12 pm

We've noticed that our Veeam B&R 9.5U4 server, a Dell R740XD2 running Windows Server 2019 and backing up to an LTO7 tape drive, has slowed down drastically over time.

The local storage in our R740XD2 is 14x4TB 7.2K in RAID10.

Using DiskSpd.exe we can get close to 700MB/s.

Code:

Command Line: C:\IT\DiskSpd-2.0.21a\x86\diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 E:\test.vbk

Input parameters:

	timespan:   1
	-------------
	duration: 600s
	warm up time: 5s
	cool down time: 0s
	random seed: 0
	path: 'E:\test.vbk'
		think time: 0ms
		burst size: 0
		software cache disabled
		hardware write cache disabled, writethrough on
		performing read test
		block size: 524288
		using random I/O (alignment: 4096)
		number of outstanding I/O operations: 2
		thread stride size: 0
		threads per file: 1
		using I/O Completion Ports
		IO priority: normal

System information:

	computer name: VAN-BACKUP01
	start time: 2020/08/24 17:15:13 UTC

Results for timespan 1:
*******************************************************************************

actual test time:	600.00s
thread count:		1
proc count:		32

Total IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    437503655936 |       834472 |     695.39 |    1390.78 | E:\test.vbk (1024MiB)
------------------------------------------------------------------------------
total:      437503655936 |       834472 |     695.39 |    1390.78

Read IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    437503655936 |       834472 |     695.39 |    1390.78 | E:\test.vbk (1024MiB)
------------------------------------------------------------------------------
total:      437503655936 |       834472 |     695.39 |    1390.78

Write IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 | E:\test.vbk (1024MiB)
------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00
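As a sanity check on the DiskSpd figures above, the reported throughput can be recomputed from the byte count and the test time. DiskSpd reports MiB/s (binary), while tape drive speeds are usually quoted in decimal MB/s, so the sketch below also converts between the two. The function names are illustrative only and are not part of DiskSpd.

```python
MIB = 1024 * 1024   # bytes per MiB (binary, as reported by DiskSpd)
MB = 1000 * 1000    # bytes per MB (decimal, assumed for tape drive ratings)


def throughput_mib_s(total_bytes: int, seconds: float) -> float:
    """Average throughput in MiB/s for a given byte count and duration."""
    return total_bytes / seconds / MIB


def mib_s_to_mb_s(mib_s: float) -> float:
    """Convert binary MiB/s to decimal MB/s."""
    return mib_s * MIB / MB


# Values copied from the "Total IO" row and "actual test time" above.
total_bytes = 437_503_655_936
duration_s = 600.0

rate_mib = throughput_mib_s(total_bytes, duration_s)
print(f"{rate_mib:.2f} MiB/s = {mib_s_to_mb_s(rate_mib):.1f} MB/s")
```

In decimal units the disk result is a little higher than the MiB/s column suggests, which only widens the gap to the tape speeds discussed in this thread.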

I also confirmed using ITDT that our tape drive can write at the advertised LTO7 300MB/s.

Code:

     IBM Tape Diagnostic Tool Standard Edition  - Full Write

       Host Bus  ID   LUN  Model          Serial       Fware  Changer
      +----+----+----+----+--------------+------------+------+------------+
      | 5  | 0  | 0  | 0  | ULTRIUM-TD7  | F3A300A000 | J4D0 |            |
      +----+----+----+----+--------------+------------+------+------------+

   Compres- Transfer  Data Size Elapsed  Data Rate
    sible   Size (KB)   (MB)    Time (s)   (MB/s)     Remaining (MB)
   +------+---------+---------+--------+----------+   +---------------+
   | No   | 256     | 346181  | 1207.4 | 286.717  |   | 5948068       |
   +------+---------+---------+--------+----------+   +---------------+
   Status:
   +----------------------+
   | FULL WRITE           |
   +----------------------+
   Progress:
   +------------------------------------------------------------------+
   |###                                                               |
   +------------------------------------------------------------------+

If I create a new B2D2T job identical to the slow ones, it runs fast, close to 300MB/s, with the target (tape) as the bottleneck. This is where we want to be: bottlenecking at the target tape drive.

Based on the post at tape-f29/slow-tape-job-performance-t61054.html, which describes a similar issue, a few hints were dropped that match our environment: backup files being dehydrated while sitting on ReFS.

It goes on to quote Veeam TS coming back with...

The situation is unfortunately quite expected.
As you know, ReFS (when using Block Cloning) allows several files to share the same physical data. Instead of a high-cost copy of the real blocks, it just copies metadata and sets up references to physical regions. However, to read such a file, it is required to first read the address with the link and only then read the data from the area where the data is stored. The more links are used, the more effort is required to read the file.

So degradation of read performance seems to be an expected cost of fast merge operations. Backup to tape includes reading data from the source, and, as your BTT logs say, source is always the most time-consuming operation. This is unfortunately a well-known ReFS limitation which can hardly be overcome.
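To illustrate the effect support describes, here is a toy model of how a full backup file can become scattered over repeated runs. It is only a sketch under strong assumptions: it is not how ReFS actually allocates space or how Veeam lays out a reverse incremental chain. Every changed block is simply redirected to the next free physical address, and the number of physically contiguous extents in the full file is counted after each run. All names (`count_extents`, `simulate_runs`, `change_rate`) are hypothetical.

```python
import random


def count_extents(mapping):
    """Count runs of physically contiguous blocks in a logical-to-physical map."""
    if not mapping:
        return 0
    extents = 1
    for prev, cur in zip(mapping, mapping[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate_runs(num_blocks, runs, change_rate, seed=0):
    """Return the extent count of the full file after each backup run.

    Toy assumption: each changed logical block is rewritten at the next
    free physical address, so the full file now references that new location.
    """
    rng = random.Random(seed)
    mapping = list(range(num_blocks))  # fresh full backup: one contiguous extent
    next_free = num_blocks
    history = [count_extents(mapping)]
    changed_per_run = int(num_blocks * change_rate)
    for _ in range(runs):
        for logical in sorted(rng.sample(range(num_blocks), changed_per_run)):
            mapping[logical] = next_free
            next_free += 1
        history.append(count_extents(mapping))
    return history


history = simulate_runs(num_blocks=10_000, runs=30, change_rate=0.05)
print("extents after run 0, 1, 10, 30:",
      history[0], history[1], history[10], history[30])
```

On spinning disks, each extra extent in this model stands for a potential seek during a sequential read of the full file, which is the kind of read penalty the quote attributes to block cloning.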

Our B2D2T is reverse incremental using CBT, SAN-MODE and FAST-CLONE on REFS; we can't change out of reverse incremental because every tape job has to be a full backup.

How can I confirm 100% that ReFS dehydration is the root cause? Is there anything else that we can do to reduce the ReFS metadata that is potentially slowing down the B2T?
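One way to gather evidence (not 100% proof) would be to time a plain sequential read of an existing backup file from a slow job and compare it with the backup file of a freshly created job on the same volume. The DiskSpd run above read a newly created test file, so it says little about files that have been through many fast-clone merges. Below is a minimal sketch, assuming Python is available on the backup server; the function name, the example paths, and the 512 KiB block size are illustrative choices. Reads still pass through the Windows file cache, so use files much larger than RAM or read each file only once.

```python
import time


def measure_read_throughput(path, block_size=512 * 1024, max_bytes=None):
    """Sequentially read `path`; return (bytes_read, decimal MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer only, not the OS file cache.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            want = block_size if max_bytes is None else min(block_size, max_bytes - total)
            chunk = f.read(want)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1_000_000 if elapsed > 0 else 0.0
    return total, rate


# Hypothetical usage; both paths are placeholders, not real files:
# old_bytes, old_rate = measure_read_throughput(r"E:\Backups\OldJob\old.vbk", max_bytes=50 * 10**9)
# new_bytes, new_rate = measure_read_throughput(r"E:\Backups\NewJob\new.vbk", max_bytes=50 * 10**9)
# print(f"old chain: {old_rate:.0f} MB/s, new chain: {new_rate:.0f} MB/s")
```

If the old file reads markedly slower than the new one on the same disks, that points at the file layout rather than the hardware.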

I guess one could delete all backups and start over, and it would be fast for the first X runs. Or switch back to NTFS. Or swap the spinning disks for SSDs.

Is there anything we can do with Veeam software to reduce the metadata overhead? Can we re-hydrate the reverse incremental backups so that the B2T does not take such a read I/O hit?

After banging my head against this for 2 days I came to this ReFS dehydration conclusion, and I am looking for some help/options that don't involve going back to NTFS and losing out on Fast-Clone (30min B2D jobs are nice).

Support Case #04351586

HannesK · Aug 25, 2020 5:45 am

Hello,
an upgrade to V10 probably solves the issue.

post366699.html#p366699

Best regards,
Hannes

cerberus · Aug 25, 2020 1:50 pm

Thanks Hannes,

That is the thread I was looking for but could not find during my testing. So awesome that there is a potential fix for this in V10.

We will start planning the upgrade now.

cerberus · Aug 26, 2020 3:20 am

I just upgraded to the latest V10 and the B2T speed is the same: 2 concurrent jobs streaming data to 2 LTO7 tapes at 100-125MB/s each.

The source shows as the bottleneck.

DiskSpd shows close to 700MB/s read but I am only getting 100MB/s to each tape drive.

As mentioned in my earlier post, I suspect this has to do with ReFS fast-clone, only because if I create two new B2D2T jobs using the same source/destination hardware I get 280MB/s to each tape.

Writing to both LTO7 drives at once using ITDT gets close to 300MB/s on each drive, so I've ruled that out as an issue.
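Putting the per-drive numbers from this post side by side makes the gap clearer. The short sketch below only sums the rates quoted above across the two drives; the helper name is made up for illustration.

```python
def aggregate_mb_s(per_drive_mb_s: float, drives: int = 2) -> float:
    """Total source read rate needed to feed `drives` tape drives."""
    return per_drive_mb_s * drives


slow_low = aggregate_mb_s(100)   # existing jobs, lower bound
slow_high = aggregate_mb_s(125)  # existing jobs, upper bound
fresh = aggregate_mb_s(280)      # newly created jobs on the same hardware

print(f"existing chains: {slow_low:.0f}-{slow_high:.0f} MB/s total")
print(f"fresh chains:    {fresh:.0f} MB/s total")
print(f"existing chains reach {slow_low / fresh:.0%}-{slow_high / fresh:.0%} of the fresh rate")
```

Since the same disks and controller sustain the fresh-chain rate through Veeam, the raw hardware does not look like the limiting factor for the existing chains.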

Code:

     IBM Tape Diagnostic Tool Standard Edition  - Full Write

       Host Bus  ID   LUN  Model          Serial       Fware  Changer
      +----+----+----+----+--------------+------------+------+------------+
      | 5  | 0  | 0  | 0  | ULTRIUM-TD7  | F3A300A000 | J4D0 |            |
      +----+----+----+----+--------------+------------+------+------------+

   Compres- Transfer  Data Size Elapsed  Data Rate
    sible   Size (KB)   (MB)    Time (s)   (MB/s)     Remaining (MB)
   +------+---------+---------+--------+----------+   +---------------+
   | No   | 256     | 346181  | 1207.4 | 286.717  |   | 5948068       |
   +------+---------+---------+--------+----------+   +---------------+
   Status:
   +----------------------+
   | FULL WRITE           |
   +----------------------+
   Progress:
   +------------------------------------------------------------------+
   |###                                                               |
   +------------------------------------------------------------------+


     IBM Tape Diagnostic Tool Standard Edition  - Full Write

       Host Bus  ID   LUN  Model          Serial       Fware  Changer
      +----+----+----+----+--------------+------------+------+------------+
      | 5  | 0  | 1  | 0  | ULTRIUM-TD7  | F3A300A004 | J4D0 |            |
      +----+----+----+----+--------------+------------+------+------------+

   Compres- Transfer  Data Size Elapsed  Data Rate
    sible   Size (KB)   (MB)    Time (s)   (MB/s)     Remaining (MB)
   +------+---------+---------+--------+----------+   +---------------+
   | No   | 256     | 125884  | 448.767| 280.511  |   | 6168365       |
   +------+---------+---------+--------+----------+   +---------------+
   Status:
   +----------------------+
   | FULL WRITE           |
   +----------------------+
   Progress:
   +------------------------------------------------------------------+
   |#                                                                 |
   +------------------------------------------------------------------+

Would "Defragment and compact full backup file" have any effect on reverse incremental backups stored on ReFS? As in, would it rehydrate the data dehydrated by fast-clone block cloning?

Is there anything we can do to boost the B2D2T speed on existing data? The B2D benefits of ReFS Fast-Clone are amazing; I don't want to switch back to NTFS.

Other than throwing away the spinning-disk storage and upgrading to read-intensive SSDs, is there anything from the software side we can do?
