Using tape as a backup target
Stephan23
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Slow Tape Job Performance

Post by Stephan23 »

For a long time now I have been suffering from relatively poor tape job performance of about 150 MB/s. Because of the growth of the backup files, the job now usually needs more than 40 hours to complete, which is getting more and more disruptive.

After I wasn't able to find a clear cause of the issue, I contacted support to help analyse it and/or point to the cause. Unfortunately this hasn't been very helpful so far. To be honest, I'm pretty unsatisfied with the whole case (# 03666072). So I was hoping to get more information here, from people with similar experiences.

Tape Proxy:
Dell R620
2x E5-2630
192 GB RAM
8 Gb/s Fibre Channel

Tape Library:
Quantum Scalar i3
IBM Ultrium 8 HH
8 Gb/s Fibre Channel

Backup Storage:
NetApp E2860
20 x 4 TB (7200 rpm)
8 Gb/s Fibre Channel

Windows Server 2016
30 TB ReFS 64k repository

I'm using LTO-7 tapes formatted as Type M, so I would expect a throughput of 300 MB/s, with the target as the bottleneck.
What I'm getting is an average throughput of 150 MB/s, with the source as the bottleneck (~87%).
[screenshot]

By contrast, there is one File to Tape job that backs up files from an NTFS partition residing on the same storage system, and it is always fast (300 MB/s, bottleneck: target).
[screenshot]

During the case I was asked to run a benchmark on the partition holding the source backup files, which looks OK:

Code: Select all

>diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Command Line: diskspd.exe -c1G -b512K -w0 -r4K -Sh -d600 H:\testfile.dat

Input parameters:

        timespan:   1
        -------------
        duration: 600s
        warm up time: 5s
        cool down time: 0s
        random seed: 0
        path: 'H:\testfile.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing read test
                block size: 524288
                using random I/O (alignment: 4096)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 1
                using I/O Completion Ports
                IO priority: normal

System information:

        computer name: veeam-san
        start time: 2019/08/05 13:22:37 UTC

Results for timespan 1:
*******************************************************************************

actual test time:       600.00s
thread count:           1
proc count:             24

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  14.64%|   0.51%|   14.14%|  85.36%
   1|   0.38%|   0.11%|    0.27%|  99.63%
   2|   0.48%|   0.07%|    0.41%|  99.52%
   3|   0.30%|   0.10%|    0.20%|  99.70%
   4|   0.41%|   0.05%|    0.35%|  99.59%
   5|   0.08%|   0.06%|    0.02%|  99.92%
   6|   0.63%|   0.13%|    0.51%|  99.37%
   7|   9.74%|   3.22%|    6.52%|  90.26%
   8|   0.85%|   0.18%|    0.67%|  99.15%
   9|   0.11%|   0.07%|    0.04%|  99.89%
  10|   0.30%|   0.10%|    0.20%|  99.70%
  11|   0.20%|   0.04%|    0.16%|  99.80%
  12|   1.06%|   0.12%|    0.93%|  98.94%
  13|   1.12%|   0.08%|    1.04%|  98.88%
  14|   0.33%|   0.10%|    0.23%|  99.67%
  15|   0.05%|   0.05%|    0.00%|  99.95%
  16|   2.70%|   1.82%|    0.88%|  97.30%
  17|   0.24%|   0.06%|    0.18%|  99.76%
  18|   0.25%|   0.09%|    0.16%|  99.75%
  19|   0.07%|   0.06%|    0.01%|  99.93%
  20|   0.07%|   0.05%|    0.02%|  99.93%
  21|   0.04%|   0.04%|    0.00%|  99.96%
  22|   0.05%|   0.03%|    0.02%|  99.95%
  23|   0.23%|   0.04%|    0.19%|  99.77%
-------------------------------------------
avg.|   1.43%|   0.30%|    1.13%|  98.57%

Total IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Read IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |    350717739008 |       668941 |     557.45 |    1114.90 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:      350717739008 |       668941 |     557.45 |    1114.90

Write IO
thread |       bytes     |     I/Os     |    MiB/s   |  I/O per s |  file
------------------------------------------------------------------------------
     0 |               0 |            0 |       0.00 |       0.00 | H:\testfile.dat (1024MiB)
------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00

After reading through the forums looking into performance issues, I often found statements about fragmentation of VBK files caused by fast clone, which fits my case very well. I also ran some test jobs on a newly created volume with a new VBK and got good performance.
  • Is fragmentation the cause of poor performance in my case?
  • How am I able to confirm this?
  • What can be done to increase performance, while staying on ReFS? Active fulls are not an option for me.
  • Is the storage system not good enough, even with high fragmentation?
  • Would more (7k) disks on the back end increase performance?
  • What else could be the cause in my case?
Regards
Stephan

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Hello Stephan,

Looks like the main issue is that the backup files are dehydrated while sitting on ReFS. In order to write them to tape, each data block has to be retrieved from the file system, and it takes time to actually fetch these data blocks from ReFS. Can you please clarify whether the disk jobs are configured to create periodic synthetic full backups, or whether you synthesize the full with the tape job? Thank you!

Stephan23
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 »

Hello Dmitry,

Not quite sure what "dehydrated backup files" means in this context.

Of the two relevant backup jobs, one creates a synthetic full every week; the other is configured as reverse incremental.
The tape job does not archive any incremental backups and only processes the latest backup chain.

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Stephan,

Just to make sure we are on the same page can you please clarify if deduplication is enabled on your ReFS volume? Thank you in advance!

Stephan23
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 »

I was under the impression that there is no dedup for ReFS on Server 2016? So no, it's not.

"Inline data deduplication" is enabled for all backup jobs, if that's what you mean.

notrootbeer
Novice
Posts: 8
Liked: never
Joined: Jan 04, 2019 5:18 pm
Contact:

Re: Slow Tape Job Performance

Post by notrootbeer »

Stephan23, were you able to resolve your issue? We're experiencing similar behavior and we currently have our case escalated with Veeam support but still haven't gotten very far yet. We're considering switching completely back to NTFS, though.

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Hello notrootbeer,

Can you please share your case ID? Thank you in advance!

notrootbeer
Novice
Posts: 8
Liked: never
Joined: Jan 04, 2019 5:18 pm
Contact:

Re: Slow Tape Job Performance

Post by notrootbeer »

Yes! It's 03721079

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Thanks! Case is being reviewed, so please keep working with our support team. Cheers!

Stephan23
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 »

notrootbeer wrote:
Oct 03, 2019 9:04 pm
Stephan23, were you able to resolve your issue? We're experiencing similar behavior and we currently have our case escalated with Veeam support but still haven't gotten very far yet. We're considering switching completely back to NTFS, though.
My case was also escalated. Unfortunately the issue was not resolved, but an explanation was given.
I hope it's OK to quote from Technical Support:

The situation is unfortunately quite expected.
As you know, ReFS (when using block cloning) allows several files to share the same physical data. Instead of a high-cost copy of the real blocks, it just copies metadata and sets up references to the physical regions. However, to read such a file, it is necessary to first read the address with the link and only then read the data from the area where it is stored. The more links are used, the more effort is required to read the file.

So degradation of read performance seems to be an expected cost of fast merge operations. Backup to tape includes reading data from the source, and, as your backup-to-tape logs say, source is always the most time-consuming operation. This is unfortunately a well-known ReFS limitation which can hardly be overcome.

Switching to NTFS was an option mentioned to work around the issue.

I also expressed my displeasure that no information regarding this performance degradation is mentioned in any documentation or KB article, especially if it is "quite expected" and a "well-known ReFS limitation".

However, the explanation confirmed what I already suspected, and I was satisfied with it.
My plan is to experiment with active fulls as soon as our VPN connection to a mirror Veeam repository gets "upgraded", to mitigate the issue with fresh backup chains.
Depending on the outcome, we might consider switching to NTFS as well.
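The support explanation above can be turned into a back-of-the-envelope model: a block-cloned VBK is a chain of references scattered across older fulls and increments, and every non-contiguous extent costs a head movement on 7,200 rpm disks. The rates and seek cost below are illustrative assumptions, not measurements from this thread:

```python
# Toy model: effective read rate of a backup file whose extents have been
# scattered by ReFS block cloning. All figures are illustrative assumptions,
# not measurements from this setup.

SEQ_RATE_MBS = 600.0   # assumed sequential read rate of the disk array
SEEK_MS = 12.0         # assumed seek + rotational latency on 7,200 rpm disks

def effective_rate(file_gb, fragments):
    """MB/s when the file consists of `fragments` non-contiguous extents."""
    size_mb = file_gb * 1024.0
    transfer_s = size_mb / SEQ_RATE_MBS      # pure sequential transfer time
    seek_s = fragments * SEEK_MS / 1000.0    # one head movement per extent
    return size_mb / (transfer_s + seek_s)

# An active full is nearly contiguous; every fast-clone merge adds references
# into older files, multiplying the number of extents over time.
for frags in (1, 1_000, 100_000, 1_000_000):
    print(f"{frags:>9,} extents -> {effective_rate(1024, frags):6.1f} MB/s")
```

Under these assumptions, a 1 TB file drops from near-sequential speed to well under the LTO-7 streaming rate once it accumulates a few hundred thousand scattered extents, which would explain why "source" shows as the bottleneck.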

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Hello Stephan,

Thank you for the honest feedback. We will discuss the investigation results with our support team and technical writers to make the corresponding adjustments in the Help Center. Additionally, I'll note the improvement request and raise this topic with the R&D folks. Cheers!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch »

Hey guys,

Is there anything new on this topic?

Will this be fixed some day, or would it be better for me to switch from ReFS to NTFS?

Thank you. =)

veremin
Product Manager
Posts: 17745
Liked: 1612 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin »

Might be a stupid question, but are you using Fast Clone on your ReFS volume or not? Does the degraded performance happen only when the backup files (selected for tape backup) reside on a ReFS repository with Fast Clone enabled? Thanks!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch »

Hi veremin,

thanks for the link. I did not find the option where I can see whether it's activated or not, but yes, it's activated.
The backup log says "Synthetic full backup created successfully [fast clone]".

But how can I deactivate it? That article says via the registry, but nothing specific.
Will that change damage the existing backups when the following jobs run?

Thx.

veremin
Product Manager
Posts: 17745
Liked: 1612 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin »

You can disable it via the REFSVirtualSyntheticDisabled (DWORD) registry value. However, keep in mind the increased storage consumption, increased backup time and additional load on the storage system that come as the trade-off. Thanks!

FBartsch
Influencer
Posts: 10
Liked: 2 times
Joined: Oct 21, 2019 7:41 am
Full Name: Florian Bartsch
Contact:

Re: Slow Tape Job Performance

Post by FBartsch »

I will think about it and test it.

Where do I need to add that key with which value?

Thanks.

veremin
Product Manager
Posts: 17745
Liked: 1612 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Slow Tape Job Performance

Post by veremin » 1 person likes this post

In the standard registry key:

Code: Select all

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication
Set 1 as the value.

Thanks!
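For anyone scripting this, the value from the posts above could be set as follows. This is a sketch, not an official procedure: only the key path and value name come from this thread, and it must run elevated on the Windows VBR server itself.

```python
# Sketch: create REFSVirtualSyntheticDisabled = 1 (DWORD) on the VBR server.
# Windows-only and requires elevation; the key path and value name are taken
# from the posts above, everything else is illustrative.
import sys

if sys.platform == "win32":
    import winreg

    KEY_PATH = r"SOFTWARE\Veeam\Veeam Backup and Replication"
    # Open (or create) the Veeam key and write the DWORD value.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "REFSVirtualSyntheticDisabled", 0,
                          winreg.REG_DWORD, 1)
    print("REFSVirtualSyntheticDisabled set to 1")
else:
    print("Run this on the Windows VBR server itself")
```

From an elevated command prompt, `reg add "HKLM\SOFTWARE\Veeam\Veeam Backup and Replication" /v REFSVirtualSyntheticDisabled /t REG_DWORD /d 1` should achieve the same.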

aich365
Service Provider
Posts: 179
Liked: 12 times
Joined: Aug 10, 2016 11:10 am
Full Name: Clive Harris
Contact:

Re: Slow Tape Job Performance

Post by aich365 »

Hi Vladimir
Please can you advise: is this a registry entry on the VCC host or on each repository server?
We have 5 repo servers and would ideally like to target only the ones running tenant tape jobs.
Thanks.

HannesK
Veeam Software
Posts: 5681
Liked: 776 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Slow Tape Job Performance

Post by HannesK »

@aich365: it's a VBR server key that is applied to all repositories.

aich365
Service Provider
Posts: 179
Liked: 12 times
Joined: Aug 10, 2016 11:10 am
Full Name: Clive Harris
Contact:

Re: Slow Tape Job Performance

Post by aich365 »

Hi Hannes

Can you clarify? Surely this applies to the repository server rather than the repository?

So if there are 4 repositories on the server, all will be affected.

Thanks

HannesK
Veeam Software
Posts: 5681
Liked: 776 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Slow Tape Job Performance

Post by HannesK »

All repositories will be affected, no matter whether they are on one server or on different servers. The key must be set on the backup server.

JPMS
Enthusiast
Posts: 65
Liked: 18 times
Joined: Nov 02, 2019 6:19 pm
Full Name: Jason
Contact:

Re: Slow Tape Job Performance

Post by JPMS »

If ReFS block cloning is the issue, can somebody at Veeam explain why tape backup performance is so much worse than restore performance when both read the same block-cloned data?

I have been looking to migrate to a Windows repo with ReFS, and I also use tape as our secondary backup. When I first saw this thread, my first thought was that if the issue was rehydration, it would affect restores equally, as both have to go through the same process. I also wondered whether the issue could be resolved by doing a periodic active full, but as I was to discover, tape performance takes a dive after even a single (block-cloned) incremental backup.

This week I finally bit the bullet. I went for Windows Server 2019 LSB, patched with the mid-month update that is supposed to address the worst of the issues discussed in this thread: veeam-backup-replication-f2/windows-201 ... 6-180.html

I ran an active full, then the following day ran an incremental backup with creation of a synthetic full, and then backed that up to tape. As with others in this thread, the tape backup (to an LTO-8 tape drive) often struggled to hit 100 MB/s, where previously we had a sustained 300 MB/s for an entire tape backup. So at this point I decided to compare this with VM restore times from the repo, as a restore is effectively the same process, just writing to a hard drive rather than tape.

We have a small environment, 15 VMs. All apart from one don't exceed 150 GB. The one exception holds all our data and is about 2 TB. I left it out of the testing because of the time the restores would take, and because the vast majority of it wouldn't change, so it wouldn't be the best test of the effect of block cloning on restore times. So for the first test I restored 14 VMs, a total of 710 GB. I did three restores: the first from the original active full, the second from the next day's backup (16 GB of changed data), and the third from the following day's backup (a further 33 GB of changes). As expected, the restores got slower, but they were still fast enough to saturate the tape drive: 563 MB/s, 473 MB/s and 438 MB/s.

Our repo is set to 'Use per-VM backup files', so it can process multiple VMs in parallel, as in the test above. As tape is a serial device, I felt it might be a fairer test to repeat this for a single VM, since I don't know whether the tape software can write out multiple VMs in one stream. Even then, restore rates for a single VM were 488 MB/s, 458 MB/s and 384 MB/s, again faster than the tape drive's 300 MB/s.

So this can't just be a ReFS block-cloning issue. The repo can deliver the block-cloned data faster than the tape drive can write it, so why is tape performance so poor (currently unusable without disabling block cloning)?
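One factor that could reconcile the restore-vs-tape gap (a sketch with assumed numbers, not something stated by Veeam in this thread): a restore tolerates a bursty source, because the target disk can stop and resume for free, but a tape drive must be fed continuously. Below its minimum speed-matched rate the drive stops and repositions ("shoe-shining"), and each stop costs seconds:

```python
# Toy model of tape "shoe-shining": when the source can't sustain the drive's
# minimum streaming rate, every buffer underrun costs a reposition penalty.
# All figures are illustrative assumptions, not LTO-8 specifications.

MIN_STREAM_MBS = 112.0   # assumed lowest speed-matched rate of the drive
REPOSITION_S = 3.0       # assumed cost of one stop/rewind/restart cycle
BUFFER_MB = 1024.0       # assumed buffer drained per write burst

def tape_rate(source_mbs):
    """Effective tape write rate given a sustained source feed rate (MB/s)."""
    if source_mbs >= MIN_STREAM_MBS:
        return source_mbs            # drive speed-matches and streams
    # Below streaming: the drive writes one buffer at its minimum rate, then
    # stalls while the buffer refills from the slow source, paying a
    # reposition penalty for each stop/start cycle (a deliberately crude
    # steady-state approximation).
    write_s = BUFFER_MB / MIN_STREAM_MBS
    refill_s = BUFFER_MB / source_mbs
    return BUFFER_MB / (write_s + refill_s + REPOSITION_S)

for feed in (300, 150, 100, 60):
    print(f"feed {feed:3d} MB/s -> tape {tape_rate(feed):6.1f} MB/s")
```

Under these assumptions a feed just below the streaming threshold collapses to far less than the feed rate itself, while a restore to disk simply runs at whatever the source delivers. That would match sub-100 MB/s tape rates alongside 400+ MB/s restores.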

JPMS
Enthusiast
Posts: 65
Liked: 18 times
Joined: Nov 02, 2019 6:19 pm
Full Name: Jason
Contact:

Re: Slow Tape Job Performance

Post by JPMS »

Bump

Any comment from Veeam?

HannesK
Veeam Software
Posts: 5681
Liked: 776 times
Joined: Sep 01, 2014 11:46 am
Location: Austria
Contact:

Re: Slow Tape Job Performance

Post by HannesK »

Any comment without a support case number would be guessing.

I talked to some partners and customers last week about whether they have seen general performance issues with ReFS and tape, and they said "no". So I'm not sure whether it is a general issue.

ShanGan
Novice
Posts: 3
Liked: never
Joined: Feb 04, 2019 10:41 am
Full Name: Shan Ganeshan
Contact:

Re: Slow Tape Job Performance

Post by ShanGan »

Hi,

Is there any update on this case? I have a very similar setup and the tape jobs are really slow. Processing rates show only up to 70 MB/s. When I check the throughput (all time), the maximum speed goes up to 200 MB/s, but 90% of the time it's close to 70 MB/s.

I am still confused as to whether the issue is due to the tape library/drive or the repository. The bottleneck always shows "Target", but I'm not sure that's really the case.

I did log a case with Veeam but it's close to useless. I've stopped wasting my time calling them.

Any input will be greatly appreciated.

Thank you,

Stephan23
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 03, 2015 8:32 am
Full Name: Stephan
Contact:

Re: Slow Tape Job Performance

Post by Stephan23 »

I switched the repository to NTFS and set reverse incremental for every job.
Tape performance is now (just) OK. Version 10 also seems to have increased performance a little, but not as much as we were led to believe.

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Shan,
I did log a case with veeam but its close to useless. I stopped wasting my time calling them now.
Can you please share the case ID, I'll ask support management to review the case details. Thank you in advance!

ShanGan
Novice
Posts: 3
Liked: never
Joined: Feb 04, 2019 10:41 am
Full Name: Shan Ganeshan
Contact:

Re: Slow Tape Job Performance

Post by ShanGan »

sure. 04167122

Dima P.
Product Manager
Posts: 11528
Liked: 1000 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague
Contact:

Re: Slow Tape Job Performance

Post by Dima P. »

Forwarded to the support management. Thanks!
