pterpumpkin
Influencer
Posts: 24
Liked: 2 times
Joined: Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin
Contact:

Daily Merge Performance

Post by pterpumpkin » Jun 14, 2016 10:42 am

Hi all!

We have recently been facing performance limitations when running the daily merge of the oldest incremental in our forever-forward incremental backup chains.

As we have some very large VMs (around 10TB), the merge can take a very long time. We want to improve speed without spending money (don't we all!?!?)

We have put in some time to see where the bottleneck is, and how we can improve performance without increasing infrastructure costs.

We have found that Veeam merges seem to run with a single thread. Is it possible for Veeam merge jobs to run with multiple threads? Or by their very nature must they run as a single thread?

The current setup varies slightly from repository to repository, but for example's sake, we will run with the below specs of one of our repositories.

IBM x3630 M4
Intel Xeon E5-2420 v2 @ 2.2GHz
64GB RAM
ServeRAID M5110 RAID Card
Windows 2012 R2
13x 6TB Nearline SAS 7.2K RPM disks in RAID6. 128K stripe size. Write back with BBU enabled. Read-ahead policy enabled. 16K NTFS cluster allocation size.

The above config is definitely not optimized, as it was built before we researched best practices.

*** Keep reading if you want to see some boring stuff about the tests we ran, or skip to the end if you don't ***

We generally see 50% reads and 50% writes when the merge is running, with a throughput of ~30-40MB/s. At first glance, this seems low.

We had a spare system that we wanted to set up for testing, to find the best configuration for a Veeam merge workload. We set up:

IBM x3630 M3
Intel Xeon E5507 @ 2.27GHz
4GB RAM
ServeRAID M5015 Raid Card
Windows 2012 R2
2x 4TB Nearline SAS 7.2K RPM in RAID1 for OS
12x 2TB Nearline SAS 7.2K RPM for data volume

We used diskspd to simulate a Veeam merge for quick testing whilst playing with RAID setups and NTFS allocation sizes. The below command was used:
diskspd -c40G -d60 -r -w50 -t1 -o1 -b512K -h -L D:\testfile.dat

-c = test file size (40GB)
-d = duration of the test in seconds
-r = random reads/writes
-w = percentage of writes
-t = number of threads
-o = outstanding I/Os (queue depth) per thread
-b = block size
-h = disables hardware and software buffering
-L = capture disk latency statistics
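To sweep thread counts and queue depths without retyping the command, the invocations can be generated with a small helper (a sketch; the flag values mirror the command above, and the D:\testfile.dat target is just the example path):

```python
# Generate diskspd command lines for a thread/queue-depth sweep.
# Flag meanings match the single run above; target path is the example path.
TARGET = r"D:\testfile.dat"

def diskspd_cmd(threads, outstanding, target=TARGET):
    """Build one diskspd invocation: 40GB file, 60s, 50% random writes, 512K blocks."""
    return (f"diskspd -c40G -d60 -r -w50 -t{threads} -o{outstanding} "
            f"-b512K -h -L {target}")

commands = [diskspd_cmd(t, o) for t in (1, 2, 4, 8) for o in (1, 2)]
for cmd in commands:
    print(cmd)
```

The first generated line reproduces the single-threaded command above exactly; the rest scale up threads and queue depth.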

What we found was:
- RAID10 gave us an extra ~30MB/s throughput
- 256K stripe size and above had almost no difference in performance. 128K and lower had marginally less performance.
- NTFS allocation size had almost no difference in performance
- Disabling write back dropped throughput by 20-30MB/s

To avoid posting pages of boring test results, I ran the diskspd command mentioned above on RAID5, RAID6, and RAID10 configs. All settings (stripe size, allocation size, etc.) remained the same for each test.

RAID5:
Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 3037200384 | 5793 | 48.27 | 96.55 | 10.359 | 44.144 | E:\testfile.dat (40GB)
-----------------------------------------------------------------------------------------------------
total: 3037200384 | 5793 | 48.27 | 96.55 | 10.359 | 44.144


RAID6:
Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 2762997760 | 5270 | 43.92 | 87.83 | 11.375 | 49.756 | E:\testfile.dat (40GB)
-----------------------------------------------------------------------------------------------------
total: 2762997760 | 5270 | 43.92 | 87.83 | 11.375 | 49.756


RAID10:
Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 5386534912 | 10274 | 85.62 | 171.23 | 5.838 | 8.191 | E:\testfile.dat (40GB)
-----------------------------------------------------------------------------------------------------
total: 5386534912 | 10274 | 85.62 | 171.23 | 5.838 | 8.191
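As a sanity check, the MB/s and IOPS columns in the tables above follow directly from the raw byte and I/O totals over the 60-second runs:

```python
# Recompute diskspd's MB/s and IOPS columns from the raw totals of the
# three 60-second runs above (MB here is MiB, as diskspd reports it).
DURATION_S = 60
MIB = 1024 * 1024

runs = {
    "RAID5":  (3037200384, 5793),    # (total bytes, total I/Os)
    "RAID6":  (2762997760, 5270),
    "RAID10": (5386534912, 10274),
}

for name, (total_bytes, ios) in runs.items():
    mbps = total_bytes / MIB / DURATION_S
    iops = ios / DURATION_S
    print(f"{name}: {mbps:.2f} MB/s, {iops:.2f} IOPS")
```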

I would still consider all of these results "slow". What it looks like (to me) is that Veeam only runs merges as a single thread. As soon as we add multiple threads and outstanding operations to the diskspd test, disk throughput increases (as expected).

diskspd -c40G -d60 -r -w50 -t8 -o2 -b512K -h -L D:\testfile.dat

RAID5:
84.46MB/s

RAID6:
86.15MB/s

RAID10:
424.78MB/s

So there's no doubt that RAID10 is faster in all cases; however, so is running multiple threads.


Apart from converting to RAID10 (losing a lot of usable disk space) and setting correct RAID stripe sizes and NTFS allocation sizes, are there any recommendations to improve merge performance?
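To put a number on the space cost of that conversion, here is a quick calculation (a sketch, assuming the 13x 6TB production disks, with RAID10 using 12 of them since it needs mirrored pairs):

```python
# Rough usable-capacity comparison for the production repository's disks
# (13 x 6TB NL-SAS; RAID10 assumed to use 12 of them, as it needs pairs).
DISK_TB = 6

raid6_usable  = (13 - 2) * DISK_TB     # RAID6: two disks' worth of parity
raid10_usable = (12 // 2) * DISK_TB    # RAID10: half the disks are mirrors

print(f"RAID6 : {raid6_usable} TB usable")
print(f"RAID10: {raid10_usable} TB usable")
```

So the conversion would roughly halve usable capacity, which is why we'd prefer another answer.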


Thanks!!!!

foggy
Veeam Software
Posts: 18158
Liked: 1542 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Daily Merge Performance

Post by foggy » Jun 16, 2016 12:41 pm

I'd say the results of your testing are expected. RAID10 is indeed recommended for synthetic activity due to its lower write penalty, and your other numbers are also consistent with best practices (you can search the forum for other existing topics discussing similar questions; here's the latest one). What else is essential for merge performance is the number of spindles, their speed, and their interface.

You can also consider using scale-out backup repositories to improve merge performance.
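The write-penalty point can be illustrated with a back-of-envelope calculation (a sketch, assuming ~75 random IOPS per 7.2K NL-SAS spindle and the classic write penalties of 4/6/2 for RAID5/RAID6/RAID10, for a 12-disk group at the 50/50 read/write mix seen during merges):

```python
# Back-of-envelope effective random IOPS for a 12-spindle group at a
# 50/50 read/write mix, using the classic RAID write-penalty model
# (assumed figures, not measurements).
SPINDLES, IOPS_PER_DISK = 12, 75       # ~75 IOPS per 7.2K NL-SAS disk (assumption)
READ_FRAC = WRITE_FRAC = 0.5

raw_iops = SPINDLES * IOPS_PER_DISK
for raid, penalty in {"RAID5": 4, "RAID6": 6, "RAID10": 2}.items():
    effective = raw_iops / (READ_FRAC + WRITE_FRAC * penalty)
    print(f"{raid}: ~{effective:.0f} effective IOPS")
```

This matches the ordering in the diskspd results: RAID10's lower penalty leaves far more of the raw spindle IOPS available for a write-heavy merge than RAID6 does.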


Re: Daily Merge Performance

Post by pterpumpkin » Jun 16, 2016 7:28 pm

Thanks for the reply!

Am I right in saying that Veeam merges run as a single thread?


Re: Daily Merge Performance

Post by foggy » Jun 17, 2016 3:46 pm

Per-VM backup chains allow multiple threads to write to the repository.
