Discussions specific to tape backups
Stephan23
Enthusiast
Posts: 28
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Slow Tape Job Performance

Post by Stephan23 » Aug 06, 2019 8:50 am

For a long time now I have been suffering from relatively poor tape job performance of about 150 MB/s. Because of the growth of the backup files, the job now usually needs more than 40 hours to complete, which is getting more and more disruptive.

After I wasn't able to find a clear cause of the issue, I contacted support to help analyse it and/or point to the cause. Unfortunately, this hasn't been very helpful so far. To be honest, I'm pretty unsatisfied with the whole case (# 03666072). So I was hoping to get more information here from people with similar experiences.

Tape Proxy:
Dell R620
2x E5-2630
192 GB RAM
8 Gb/s Fibre Channel

Tape Library:
Quantum Scalar i3
IBM Ultrium 8 HH
8 Gb/s Fibre Channel

Backup Storage:
NetApp E2860
20 x 4 TB (7200 rpm)
8 Gb/s Fibre Channel

Windows Server 2016
30 TB ReFS 64k repository

I'm using LTO-7 tapes formatted as Type M, so I would expect a throughput of around 300 MB/s when the target is the bottleneck.
What I'm getting is an average throughput of 150 MB/s with the source as the bottleneck (~87%).

In contrast, there is one file-to-tape job that backs up files from an NTFS partition residing on the same storage system, and it is always fast (300 MB/s, bottleneck: target).

During the case, I was asked to run a benchmark on the partition holding the source backups, which looks OK:

Code:

>diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Command Line: diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Input parameters:

        timespan:   1
        -------------
        duration: 600s
        warm up time: 5s
        cool down time: 0s
        random seed: 0
        path: 'H:\testfile.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing read test
                block size: 524288
                using random I/O (alignment: 4096)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 1
                using I/O Completion Ports
                IO priority: normal

System information:

        computer name: veeam-san
        start time: 2019/08/05 13:22:37 UTC

Results for timespan 1:
*******************************************************************************

actual test time:       600.00s
thread count:           1
proc count:             24

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  14.64%|   0.51%|   14.14%|  85.36%
   1|   0.38%|   0.11%|    0.27%|  99.63%
   2|   0.48%|   0.07%|    0.41%|  99.52%
   3|   0.30%|   0.10%|    0.20%|  99.70%
   4|   0.41%|   0.05%|    0.35%|  99.59%
   5|   0.08%|   0.06%|    0.02%|  99.92%
   6|   0.63%|   0.13%|    0.51%|  99.37%
   7|   9.74%|   3.22%|    6.52%|  90.26%
   8|   0.85%|   0.18%|    0.67%|  99.15%
   9|   0.11%|   0.07%|    0.04%|  99.89%
  10|   0.30%|   0.10%|    0.20%|  99.70%
  11|   0.20%|   0.04%|    0.16%|  99.80%
  12|   1.06%|   0.12%|    0.93%|  98.94%
  13|   1.12%|   0.08%|    1.04%|  98.88%
  14|   0.33%|   0.10%|    0.23%|  99.67%
  15|   0.05%|   0.05%|    0.00%|  99.95%
  16|   2.70%|   1.82%|    0.88%|  97.30%
  17|   0.24%|   0.06%|    0.18%|  99.76%
  18|   0.25%|   0.09%|    0.16%|  99.75%
  19|   0.07%|   0.06%|    0.01%|  99.93%
  20|   0.07%|   0.05%|    0.02%|  99.93%
  21|   0.04%|   0.04%|    0.00%|  99.96%
  22|   0.05%|   0.03%|    0.02%|  99.95%
  23|   0.23%|   0.04%|    0.19%|  99.77%
-------------------------------------------
avg.|   1.43%|   0.30%|    1.13%|  98.57%

Total IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Read IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Write IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00
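One caveat about this benchmark: it reads a freshly created 1 GB test file, which is laid out contiguously, so it cannot reflect the fragmentation that accumulates in fast-cloned VBK files. A sequential read directly against one of the existing backup files might be more representative of what the tape job actually does (the path below is only a placeholder, not a real file):

```shell
rem Sequential read test against an existing backup file
rem (DiskSpd performs sequential I/O when -r is omitted; -w0 = read-only, -Sh = caches disabled)
diskspd.exe -b512K -w0 -Sh -d60 "H:\Backups\ExampleJob\ExampleVM.vbk"
```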

After reading through the forums looking for performance issues, I often found statements about fragmentation of VBK files caused by fast clone, which fits my case very well. I also did some test jobs on a newly created volume with a new VBK and got good performance.
  • Is fragmentation the cause of poor performance in my case?
  • How am I able to confirm this?
  • What can be done to increase performance, while staying on ReFS? Active fulls are not an option for me.
  • Is the storage system not good enough, even with high fragmentation?
  • Would more (7k) disks on the back end increase performance?
  • What else could be the cause in my case?
Regards
Stephan

Dima P.
Product Manager
Posts: 10870
Liked: 897 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Aug 06, 2019 6:30 pm

Hello Stephan,

Looks like the main issue is that the backup files have become dehydrated while sitting on ReFS. In order to write them to tape, each data block has to be retrieved from the file system, and it takes time to actually fetch these data blocks from ReFS. Can you please clarify whether the disk jobs are configured to create periodic synthetic full backups, or whether you synthesize the full with the tape job? Thank you!

Stephan23
Enthusiast
Posts: 28
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 » Aug 07, 2019 8:35 am

Hello Dmitry,

I'm not quite sure what "dehydrated backup files" means in that context.

Of the two relevant backup jobs, one creates a synthetic full every week; the other is configured as reverse incremental.
The tape job does not archive any incremental backups and only processes the latest backup chain.

Dima P.
Product Manager
Posts: 10870
Liked: 897 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Aug 09, 2019 5:45 pm

Stephan,

Just to make sure we are on the same page can you please clarify if deduplication is enabled on your ReFS volume? Thank you in advance!

Stephan23
Enthusiast
Posts: 28
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 » Aug 12, 2019 7:29 am

I was under the impression that there is no dedup for ReFS on Server 2016? So no, it's not enabled.

"Inline data deduplication" is enabled for all backup jobs, if that's what you mean.

notrootbeer
Novice
Posts: 8
Liked: never
Joined: Jan 04, 2019 5:18 pm
Contact:

Re: Slow Tape Job Performance

Post by notrootbeer » Oct 03, 2019 9:04 pm

Stephan23, were you able to resolve your issue? We're experiencing similar behavior and we currently have our case escalated with Veeam support but still haven't gotten very far yet. We're considering switching completely back to NTFS, though.

Dima P.
Product Manager
Posts: 10870
Liked: 897 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Oct 04, 2019 11:23 am

Hello notrootbeer,

Can you please share your case ID? Thank you in advance!

notrootbeer
Novice
Posts: 8
Liked: never
Joined: Jan 04, 2019 5:18 pm
Contact:

Re: Slow Tape Job Performance

Post by notrootbeer » Oct 04, 2019 3:25 pm

Yes! It's 03721079

Dima P.
Product Manager
Posts: 10870
Liked: 897 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Oct 09, 2019 11:03 am

Thanks! Case is being reviewed, so please keep working with our support team. Cheers!

Stephan23
Enthusiast
Posts: 28
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 » Oct 09, 2019 2:59 pm

notrootbeer wrote:
Oct 03, 2019 9:04 pm
Stephan23, were you able to resolve your issue? We're experiencing similar behavior and we currently have our case escalated with Veeam support but still haven't gotten very far yet. We're considering switching completely back to NTFS, though.
My case was also escalated. Unfortunately the issue was not resolved, but an explanation was given.
I hope it's OK to quote from the Technical Support:
The situation is unfortunately quite expected.
As you know, ReFS (when using block cloning) allows several files to share the same physical data. Instead of a high-cost copy of the real blocks, it just copies metadata and sets up references to the physical regions. However, to read such a file, it is required to first read the address holding the reference and only then read the data from the area where it is actually stored. The more references are used, the more effort is required to read the file.

So the degradation of read performance seems to be an expected cost of fast merge operations. Backup to tape includes reading data from the source, and, as your backup-to-tape logs show, the source is always the most time-consuming stage. This is unfortunately a well-known ReFS limitation which can hardly be overcome.
Switching to NTFS was an option mentioned to work around the issue.

I also expressed my displeasure that no information regarding performance degradation is mentioned in any documentation or KB article, especially if it is "quite expected" and a "well-known ReFS limitation".

However, the explanation confirmed what I already suspected and I was satisfied with it.
My plan is to experiment with active fulls as soon as our VPN connection to a mirror Veeam repository gets "upgraded", to mitigate the issue with fresh backup chains.
Depending on the outcome we might consider switching to NTFS as well.

Dima P.
Product Manager
Posts: 10870
Liked: 897 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. » Oct 10, 2019 9:07 am

Hello Stephan,

Thank you for the honest feedback. We will discuss the investigation results with our support team and technical writers to make the corresponding adjustments in the Help Center. Additionally, I'll note the improvement request and raise this topic with the R&D folks. Cheers!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch » Oct 28, 2019 8:35 am

Hey guys,

Is there anything new on this topic?

Will this be fixed some day, or would it be better for me to change from ReFS to NTFS?

Thank you. =)

veremin
Product Manager
Posts: 17141
Liked: 1483 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin » Oct 28, 2019 10:54 am

Might be a stupid question, but are you using fast clone on your ReFS volume or not? Does the degraded performance happen only when the backup files (selected for tape backup) reside on a ReFS repository with fast clone enabled? Thanks!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch » Oct 28, 2019 11:09 am

Hi veremin,

thanks for the link. I did not find the option where I can see whether it's activated or not. But yes, it's activated.
After the backup, the log says "Synthetic full backup created successfully [fast clone]".

But how can I deactivate it? That article mentions doing it via the registry, but nothing specific.
Will that change damage the existing backups when the following jobs run?

Thx.

veremin
Product Manager
Posts: 17141
Liked: 1483 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin » Oct 28, 2019 12:15 pm

You can disable it via the REFSVirtualSyntheticDisabled (DWORD) registry value; however, keep in mind the increased storage consumption, increased backup time and additional load on the storage system that come as the trade-off. Thanks!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch » Oct 28, 2019 12:26 pm

I will think about it and test it.

Where do I need to add that key with which value?

Thanks.

veremin
Product Manager
Posts: 17141
Liked: 1483 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin » Oct 28, 2019 12:57 pm 1 person likes this post

In the standard registry key:

Code:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication
Set the value to 1.
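For example, from an elevated command prompt on the backup server (a sketch using the value name from this thread):

```shell
rem Disable fast clone (virtual synthetics) for ReFS repositories
reg add "HKLM\SOFTWARE\Veeam\Veeam Backup and Replication" /v REFSVirtualSyntheticDisabled /t REG_DWORD /d 1 /f
```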

Thanks!

aich365
Service Provider
Posts: 126
Liked: 3 times
Joined: Aug 10, 2016 11:10 am
Full Name: Clive Harris
Contact:

Re: Slow Tape Job Performance

Post by aich365 » Jan 13, 2020 1:45 pm

Hi Vladimir
Please can you advise: is this a registry entry on the VCC host or on each repository server?
We have 5 repository servers and ideally would like to target only the ones running tenant tape jobs.
Thanks.

HannesK
Veeam Software
Posts: 4458
Liked: 560 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Slow Tape Job Performance

Post by HannesK » Jan 20, 2020 1:16 pm

@aich365: it's a VBR server key that is applied to all repositories.

aich365
Service Provider
Posts: 126
Liked: 3 times
Joined: Aug 10, 2016 11:10 am
Full Name: Clive Harris
Contact:

Re: Slow Tape Job Performance

Post by aich365 » Jan 20, 2020 1:20 pm

Hi Hannes

Can you clarify? Surely this applies to the repository server rather than the repository?

So if there are 4 repositories on the server, all will be affected.

Thanks

HannesK
Veeam Software
Posts: 4458
Liked: 560 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Slow Tape Job Performance

Post by HannesK » Jan 20, 2020 1:54 pm

All repositories will be affected, no matter whether they are on one server or on different servers. The key must be set on the backup server.
