Discussions specific to tape backups
Stephan23
Enthusiast
Posts: 28
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Slow Tape Job Performance

Post by Stephan23 » Aug 06, 2019 8:50 am

For a long time now I have been suffering from relatively poor tape job performance of about 150 MB/s. Because of the growth of the backup files, the job now usually needs more than 40 hours to complete, which is getting more and more disruptive.

After I wasn't able to find a clear cause of the issue, I contacted support to help analyse it and/or point to the cause. Unfortunately, that hasn't been very helpful so far. To be honest, I'm pretty unsatisfied with the whole case (# 03666072). So I was hoping to get more information here from people with similar experiences.

Tape Proxy:
Dell R620
2x E5-2630
192 GB RAM
8 Gb/s Fibre Channel

Tape Library:
Quantum Scalar i3
IBM Ultrium 8 HH
8 Gb/s Fibre Channel

Backup Storage:
NetApp E2860
20 x 4 TB (7200 rpm)
8 Gb/s Fibre Channel

Windows Server 2016
30 TB ReFS 64k repository

I'm using LTO-7 tapes formatted as type M. So I would expect a throughput of 300 MB/s when the target is the bottleneck.
What I'm getting is an average throughput of 150 MB/s with the source as the bottleneck (~87%).
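
Just as a rough sanity check (assuming the ~150 MB/s is sustained for the whole run): 150 MB/s × 40 h × 3,600 s/h ≈ 21.6 TB read per run, so at the expected 300 MB/s the same job should finish in roughly 20 hours.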

In contrast, there is one File to Tape job that backs up files from an NTFS partition on the same storage system, and it is always fast (300 MB/s, bottleneck: target).

During the case I was asked to perform a benchmark on the partition holding the source backup files, which looks OK:

Code: Select all

>diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Command Line: diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Input parameters:

        timespan:   1
        -------------
        duration: 600s
        warm up time: 5s
        cool down time: 0s
        random seed: 0
        path: 'H:\testfile.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing read test
                block size: 524288
                using random I/O (alignment: 4096)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 1
                using I/O Completion Ports
                IO priority: normal

System information:

        computer name: veeam-san
        start time: 2019/08/05 13:22:37 UTC

Results for timespan 1:
*******************************************************************************

actual test time:       600.00s
thread count:           1
proc count:             24

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  14.64%|   0.51%|   14.14%|  85.36%
   1|   0.38%|   0.11%|    0.27%|  99.63%
   2|   0.48%|   0.07%|    0.41%|  99.52%
   3|   0.30%|   0.10%|    0.20%|  99.70%
   4|   0.41%|   0.05%|    0.35%|  99.59%
   5|   0.08%|   0.06%|    0.02%|  99.92%
   6|   0.63%|   0.13%|    0.51%|  99.37%
   7|   9.74%|   3.22%|    6.52%|  90.26%
   8|   0.85%|   0.18%|    0.67%|  99.15%
   9|   0.11%|   0.07%|    0.04%|  99.89%
  10|   0.30%|   0.10%|    0.20%|  99.70%
  11|   0.20%|   0.04%|    0.16%|  99.80%
  12|   1.06%|   0.12%|    0.93%|  98.94%
  13|   1.12%|   0.08%|    1.04%|  98.88%
  14|   0.33%|   0.10%|    0.23%|  99.67%
  15|   0.05%|   0.05%|    0.00%|  99.95%
  16|   2.70%|   1.82%|    0.88%|  97.30%
  17|   0.24%|   0.06%|    0.18%|  99.76%
  18|   0.25%|   0.09%|    0.16%|  99.75%
  19|   0.07%|   0.06%|    0.01%|  99.93%
  20|   0.07%|   0.05%|    0.02%|  99.93%
  21|   0.04%|   0.04%|    0.00%|  99.96%
  22|   0.05%|   0.03%|    0.02%|  99.95%
  23|   0.23%|   0.04%|    0.19%|  99.77%
-------------------------------------------
avg.|   1.43%|   0.30%|    1.13%|  98.57%

Total IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Read IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Write IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00
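
For what it's worth, the benchmark above only reads a freshly created 1 GB test file, so it probably says little about how a heavily block-cloned VBK reads. A test I would consider more representative (just a sketch, the path is a placeholder for one of the actual synthetic fulls) would be a plain sequential read of an existing backup file with caching disabled:

Code: Select all

>diskspd.exe -b512K -o2 -t1 -w0 -Sh -d120 "H:\Backups\<job folder>\<synthetic full>.vbk"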

After reading through the forums and looking for performance issues, I often found statements about fragmentation of VBK files caused by fast clone, which fits my case very well. I also did some test jobs on a newly created volume with a new VBK and got good performance.
  • Is fragmentation the cause of poor performance in my case?
  • How am I able to confirm this? (see the sketch after this list)
  • What can be done to increase performance, while staying on ReFS? Active fulls are not an option for me.
  • Is the storage system not good enough, even with high fragmentation?
  • Would more (7k) disks on the back end increase performance?
  • What else could be the cause in my case?
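
Regarding the second question, the first check I have in mind (again just a sketch, the path is a placeholder; I'm not sure how well contig handles ReFS, so fsutil may be the safer option) is simply looking at how many extents/fragments a synthetic full consists of:

Code: Select all

>contig.exe -a "H:\Backups\<job folder>\<synthetic full>.vbk"
>fsutil file queryextents "H:\Backups\<job folder>\<synthetic full>.vbk"
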
Regards
Stephan

Dima P.
Product Manager
Posts: 10551
Liked: 860 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Aug 06, 2019 6:30 pm

Hello Stephan,

Looks like the main issue is with backup files being dehydrated while sitting on ReFS. In order to write them to tape, the data blocks have to be retrieved from the file system, and it takes time to actually get these data blocks from ReFS. Can you please clarify whether the disk jobs are configured to create periodic synthetic full backups, or whether you synthesize the full with the tape job? Thank you!


Re: Slow Tape Job Performance

Post by Stephan23 » Aug 07, 2019 8:35 am

Hello Dmitry,

Not quite sure what "dehydrated backup files" means in that context.

Of the two relevant backup jobs, one creates a synthetic full every week, the other is configured as reverse incremental.
The tape job does not archive any incremental backups and only processes the latest backup chain.


Re: Slow Tape Job Performance

Post by Dima P. » Aug 09, 2019 5:45 pm

Stephan,

Just to make sure we are on the same page, can you please clarify whether deduplication is enabled on your ReFS volume? Thank you in advance!


Re: Slow Tape Job Performance

Post by Stephan23 » Aug 12, 2019 7:29 am

I was under the impression that there is no dedup for ReFS on Server 2016? So no, it's not enabled.

"inline data deduplication" in enabled for all backup jobs, if it's that what you mean.

notrootbeer
Novice
Posts: 8
Liked: never
Joined: Jan 04, 2019 5:18 pm
Contact:

Re: Slow Tape Job Performance

Post by notrootbeer » Oct 03, 2019 9:04 pm

Stephan23, were you able to resolve your issue? We're experiencing similar behavior, and we currently have our case escalated with Veeam support but haven't gotten very far yet. We're considering switching completely back to NTFS, though.


Re: Slow Tape Job Performance

Post by Dima P. » Oct 04, 2019 11:23 am

Hello notrootbeer,

Can you please share your case ID? Thank you in advance!


Re: Slow Tape Job Performance

Post by notrootbeer » Oct 04, 2019 3:25 pm

Yes! It's 03721079


Re: Slow Tape Job Performance

Post by Dima P. » Oct 09, 2019 11:03 am

Thanks! Case is being reviewed, so please keep working with our support team. Cheers!


Re: Slow Tape Job Performance

Post by Stephan23 » Oct 09, 2019 2:59 pm

notrootbeer wrote:
Oct 03, 2019 9:04 pm
Stephan23, were you able to resolve your issue? We're experiencing similar behavior, and we currently have our case escalated with Veeam support but haven't gotten very far yet. We're considering switching completely back to NTFS, though.
My case was also escalated. Unfortunately the issue was not resolved, but an explanation was given.
I hope it's OK to quote from Technical Support:
The situation is unfortunately quite expected.
As you know, ReFS (when using block cloning) allows several files to share the same physical data. Instead of a high-cost copy of the real blocks, it just copies metadata and sets up references to the physical regions. However, to read such a file, it is necessary to first read the address with the link and only then read the data from the area where it is stored. The more links are used, the more effort is required to read the file.

So degradation of read performance seems to be an expected cost of fast merge operations. Backup to tape includes reading data from the source, and, as your BTT logs show, the source is always the most time-consuming operation. This is unfortunately a well-known ReFS limitation which can hardly be overcome.
Switching to NTFS was an option mentioned to work around the issue.

I also expressed my displeasure that no information regarding this performance degradation is mentioned in any documentation or KB article, especially since it is "quite expected" and a "well-known ReFS limitation".

However, the explanation confirmed what I already suspected, and I was satisfied with it.
My plan is to experiment with active fulls as soon as our VPN connection to a mirror Veeam repository gets "upgraded", to mitigate the issue with fresh backup chains.
Depending on the outcome we might consider switching to NTFS as well.
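
If I find a maintenance window, one way to double-check the explanation before committing to active fulls might be to make a plain file copy of one synthetic full (a regular copy should be written out without block-clone references) and then compare the sequential read rate of the copy against the original with diskspd. Rough sketch, paths are placeholders:

Code: Select all

>robocopy "H:\Backups\<job folder>" H:\RehydrateTest "<synthetic full>.vbk" /J
>diskspd.exe -b512K -w0 -Sh -d120 "H:\RehydrateTest\<synthetic full>.vbk"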


Re: Slow Tape Job Performance

Post by Dima P. » Oct 10, 2019 9:07 am

Hello Stephan,

Thank you for the honest feedback. We will discuss the investigation results with our support team and technical writers to make the corresponding adjustments in the Help Center. Additionally, I'll note the improvement request and raise this topic with the R&D folks. Cheers!
