bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

backup performance and compression level

Post by bc07 »

After more than 100 tests over several days, I found a way to improve my backup speed.
I got the best throughput on full backups with the following settings: deduplication enabled, no compression, WAN target storage.
Every other combination gave me lower results. I checked the actual read and write throughput with SAN tools and Performance Monitor.
The Veeam backup server I used for testing is a physical server with internal SATA drives attached to the onboard controller.
Here are some comparisons:
Full backup of a VM from our EMC Clariion CX3-10c
With settings: enabled dedupe, optimal compression, LAN target (CPU load was at 70-80%, not maxed out)
19MB/s
With settings: enabled dedupe, no compression, WAN target (CPU load was at 50-60%, not maxed out)
80MB/s

Full backup of a VM from our Infortrend SAN:
With settings: enabled dedupe, optimal compression, LAN target (CPU load was at 70-80%, not maxed out)
25MB/s
With settings: enabled dedupe, no compression, WAN target (CPU load was at 50-60%, not maxed out)
110MB/s

I changed all scheduled backup jobs to use no compression and WAN target. I'll see tomorrow morning how much difference no compression with maximum dedupe makes to the size of the stored data.

stevil
Expert
Posts: 112
Liked: 2 times
Joined: Sep 02, 2010 2:23 pm
Full Name: Steve B
Location: Manchester, UK
Contact:

Re: VeeamAgent.exe multi-threading

Post by stevil »

Interesting thread this.
Just to confirm, I know you're getting much faster throughput without compression, as you'd expect, but does the job complete faster?

Cheers

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VeeamAgent.exe multi-threading

Post by Gostev »

Yes, the job completes a few times faster. This is exactly what the processing rate counter shows: overall processing speed. If target storage were a bottleneck, we would not see such a high processing rate. Compression helps when target storage is a bottleneck (less data to write), while in this case backup storage is clearly very fast, and the main bottleneck with compression enabled is backup server performance (both CPU power and memory throughput, with memory being the reason why the CPU does not max out with compression enabled).
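
As a rough illustration of the bottleneck reasoning above, the sketch below treats a backup job as a chain of stages whose overall rate is capped by the slowest one. The stage names and rates are illustrative assumptions loosely based on numbers in this thread, not measurements of Veeam internals.

```python
def bottleneck(stage_rates_mb_s):
    """Return (stage_name, rate) for the slowest stage of a sequential pipeline."""
    name = min(stage_rates_mb_s, key=stage_rates_mb_s.get)
    return name, stage_rates_mb_s[name]


# Hypothetical per-stage rates in MB/s (assumed values for illustration only).
with_compression = {"source read": 110, "compress": 25, "target write": 120}
without_compression = {"source read": 110, "target write": 120}

print(bottleneck(with_compression))
print(bottleneck(without_compression))
```

In this toy model, removing the compression stage moves the limit from the backup server to the source storage, which is the pattern described for fast target storage.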

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: VeeamAgent.exe multi-threading

Post by bc07 »

Yes, the job finishes faster. During my tests I checked the actual data throughput coming in from the SAN and going out to the target storage, and it really is that much faster.

Why would the backup server performance (CPU) be the bottleneck if it does not max out the CPU in any case?
Veeam just processes jobs differently when no compression is used. When the settings are no compression and WAN target, memory usage repeatedly and slowly goes up and down by 3GB.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VeeamAgent.exe multi-threading

Post by Gostev »

bc07 wrote: Why would the backup server performance (CPU) be the bottleneck if it does not max out the CPU in any case?
It is not CPU power, but memory throughput that does not let the CPU max out with compression enabled. I am assuming you are using a fairly outdated system as your backup server, is that right?
bc07 wrote: Veeam just processes jobs differently when no compression is used.
That is not correct; compression level has no effect on how Veeam processes the job. Compression is just one of the steps in the data processing "conveyor". With compression disabled, this step is just skipped, but the whole "conveyor" is still the same.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: VeeamAgent.exe multi-threading

Post by bc07 »

You call this outdated?
Image
What would be a slow memory/system for you/Veeam?

Incorrect. Something changes in the backup process when compression is enabled.

I gave the workaround for everyone who has similar issues (no compression, WAN target). If that does not work for some people then you really have a performance bottleneck somewhere.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VeeamAgent.exe multi-threading

Post by Gostev »

bc07 wrote: What would be a slow memory/system for you/Veeam?
Any system below the minimum system requirements (a 4-core processor system). That means at least a first-generation Intel quad-core processor, the Q6600 (released over 4 years ago). These systems will have no issues with memory throughput "automatically" (unlike some earlier architectures), as they require DDR RAM. A Q6600-based system will allow about a 55MB/s processing rate during full backup under the worst possible conditions (random VMDK content, artificially generated) and default settings (Optimal compression). Switching compression to Low will raise the speed to about 160MB/s on the same system under the same conditions. Of course, real-world workloads will be processed much faster, as many blocks will be skipped from processing thanks to dedupe.
bc07 wrote: Incorrect. Something changes in the backup process when compression is enabled.
Not in the code, but OK, let it be so. Customer is always right :D
bc07 wrote: I gave the workaround for everyone who has similar issues (no compression, WAN target). If that does not work for some people then you really have a performance bottleneck somewhere.
It is fully expected that disabling compression (or switching it to Low) will improve performance on a backup server that does not meet system requirements. I have been recommending this workaround for years in every topic where customers were looking for the best way to run a backup server on a slow system. In fact, with super fast backup storage, reducing compression level will improve performance slightly even on systems that do meet system requirements (for obvious reasons), but it just does not generally make sense to do, because the resulting backup size will be too big. And storage is a far, far more costly asset. It is always cheaper to get a better backup server instead.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: VeeamAgent.exe multi-threading

Post by bc07 »

Well, our Veeam server meets the minimum requirements (4 cores, 8GB DDR2-800 memory). And when I use Low compression instead of none, it cuts the throughput in half. Strange, isn't it? The issue is not as simple as everyone from Veeam thinks.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VeeamAgent.exe multi-threading

Post by Gostev »

Then it might be a hardware issue, for instance BIOS performance settings. From the numbers you posted (19MB/s processing rate causing 70-80% CPU load), your server is performing at least 2-3 times slower than I would expect from the oldest possible 4-core system (because we have one in our lab, and ours is so old it does not even have DDR2). I have posted some numbers from our system above. Keep in mind this was under the worst possible conditions (source data was un-dedupe-able).

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: VeeamAgent.exe multi-threading

Post by bc07 »

Yeah right and the sky is green.

Depending on what memory benchmark tool I use and what block size it uses, I get between 600 and 6000MB/s read/write throughput.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: VeeamAgent.exe multi-threading

Post by Gostev »

OK, I am just trying to help here to the best of my knowledge.

Our lab and your production environment are nearly identical:
1. Uber-fast source and target storage which are guaranteed not to be bottlenecks.
2. Nearly identical backup servers performance-wise.
3. Very different results, with our setup beating yours several times over regardless of compression level, despite ours processing an artificial, worst-case-scenario workload.
4. Optimal compression not loading your CPU to 100% despite no issues fetching data (I have never seen this: with optimal compression, the CPU absolutely must max out with fast storage). And again, we see a flat 100% CPU load on our backup server across all 4 cores.

I am just using logic to draw some conclusions based on the above. If I am not doing very well with my conclusions (sounds like what I am saying only irritates you anyway), then let's stop this discussion altogether and have support handle this. Anyway, this is not a support forum, and the topic is about a different thing - the need for multi-threading - while your numbers above only prove yet again that even a single thread can potentially do 100 MB/s plus. Or, as I've said, even 500MB/s with optimal compression - that is, if source storage, backup server and target storage allow for this.

Thanks

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: VeeamAgent.exe multi-threading

Post by bc07 »

The suggestion that there is a memory performance issue was interesting, but it does not apply to the server we use.
Support has not been much help; I have (had) a ticket open about this issue. I only wanted to post my results here, which may help other customers who are also experiencing similar performance issues. I didn't know that would result in a longer discussion, but I had to respond to your assumptions/suggestions, which didn't apply/help in my case. Maybe move every post after my first result post from March 22nd to another thread.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: backup performance and compression level

Post by Gostev »

Done.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

If you want, you could try reproducing the issue on one of your systems using an internal onboard non-RAID controller with a single SATA drive (or something similar) that does about 80MB/s sequential writes...
Or you could ask one of the Veeam developers, or tell him/her about this issue.

tsightler
VP, Product Management
Posts: 5689
Liked: 2514 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: backup performance and compression level

Post by tsightler »

Well, one thing compression would do is add more pressure on memory bandwidth. If you're just copying data without compression, you can pass the pointer to the read data directly to the write routine without even performing a data copy on the allocated memory (zero copy; I don't know if Veeam does this, but it's plausible). With compression, however, the compression routine has to read the data from memory, read the dictionary table from memory (I'm assuming Veeam uses some variation of classic dictionary-based compression), write the new compressed data to memory, and finally transfer that pointer to the write routine. That's a lot more memory accesses, and thus a lot more memory bandwidth is required. This might also increase the latency of read requests if Veeam doesn't read the next block until the block in memory is written. Even very minor increases in latency translate into huge impacts on throughput when doing single-block reads.

Many memory bandwidth tests are not very good at showing the real memory bandwidth because they do things like allocate memory sequentially or simply read the same allocated block over and over, which may not map to how the application uses the memory. They're good for comparing between systems, but not that useful otherwise.

Very interesting observations and topic. Just wanted to post some of my own thoughts.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: backup performance and compression level

Post by Gostev »

Thanks Tom.

Yes, we do "read ahead" in multiple places. Essentially, each data processing step in the whole data processing conveyor has an associated incoming queue, and incoming data (provided by the previous step) is put into this queue until the queue is full. In other words, there is always data available that is ready to be processed as soon as the engine of the corresponding step is available. Unless, of course, the previous data processing step is lagging behind and cannot supply the data as fast as the following step is able to consume it, in which case the following step's incoming queue stays empty, waiting for data most of the time.
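
The conveyor described above can be sketched generically with bounded queues between steps. This is only a minimal illustration of the pattern, assuming one worker thread per step and using zlib as a stand-in compressor; the function names and queue depth are hypothetical and say nothing about how Veeam is actually implemented.

```python
import queue
import threading
import zlib

SENTINEL = None  # marks the end of the data stream


def stage(work, inbox, outbox):
    """Take items from inbox, apply work, and hand results to outbox."""
    while True:
        item = inbox.get()        # waits while the previous step lags behind
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next step
            return
        outbox.put(work(item))    # waits while the next step's queue is full


def run_pipeline(blocks, queue_depth=4):
    """Run read -> compress -> write over blocks; return what was 'written'."""
    to_compress = queue.Queue(maxsize=queue_depth)
    to_write = queue.Queue(maxsize=queue_depth)
    written = queue.Queue()  # unbounded sink standing in for target storage

    workers = [
        threading.Thread(target=stage, args=(zlib.compress, to_compress, to_write)),
        threading.Thread(target=stage, args=(lambda b: b, to_write, written)),
    ]
    for w in workers:
        w.start()

    for block in blocks:          # the "read" step fills the first queue
        to_compress.put(block)
    to_compress.put(SENTINEL)

    for w in workers:
        w.join()

    results = []
    while True:
        item = written.get()
        if item is SENTINEL:
            return results
        results.append(item)
```

Because each queue is bounded, a fast step blocks as soon as the step after it falls behind, so the whole chain settles at the rate of its slowest step.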

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

To be 100% certain that memory is the issue in my case, I would have to replace the mainboard/CPU/RAM with something faster and keep the storage the same. I don't have another system I can test with. Maybe you/Veeam can replicate the behavior in the lab.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

Thanks Tom. If you don't mind we could do a memory benchmark comparison with one free tool.

I reconfigured the RAID (now 4 disks in RAID10) on the LSI controller of my backup server and now get the following processing rates on a full backup (no CBT, no index, no app aware):
no compression, WAN target = 160MB/s
low compression, WAN target = 36MB/s
low compression, LAN target = 34MB/s

The more changes and tests I do the more interesting it gets :)

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

After two nights of incremental backups, it looks like going from Optimal Compression/LAN target to No Compression/WAN target doubled the backup size on disk. When doing a full backup (tested only with the Exchange VM so far), I see an increase of only 30%.

So maybe I'll do the full backups (once per month) with No Compression/WAN and then the incrementals with higher compression.

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: backup performance and compression level

Post by Gostev »

bc07 wrote:no compression, WAN target = 160MB/s
low compression, WAN target = 36MB/s
Low compression, LAN target = 160MB/s

That is from a nearly identical Intel Q6600-based system in our lab processing a worst-case scenario workload.

No idea what causes a 4x performance drop when enabling Low compression on your backup server. Low compression was specifically designed to consume the least possible amount of resources, because it was introduced at the time when our solution was relying on the ESX service console agents for direct-to-target backups, and you know the service console is really limited in processing resources.

Anyhow, I am planning to set up a high-end performance testing lab with the newest Intel quad-core processor (i5 2500K overclocked to 4.7 GHz). I am planning on using an SSD drive and a RAM disk as backend storage, and the fastest software iSCSI target I know. Looking forward to doing some hardcore testing ;)

tsightler
VP, Product Management
Posts: 5689
Liked: 2514 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: backup performance and compression level

Post by tsightler »

bc07 wrote:Thanks Tom. If you don't mind we could do a memory benchmark comparison with one free tool.
I'm not sure how valid the test would be because, as I stated, they don't really tell the entire story, they only measure the performance of a particular use of memory, and they most certainly don't mirror the exact way Veeam would use memory. Not only that, but my results running on older hardware aren't that different from yours.

I also don't think that memory bandwidth is the sum total of the entire bottleneck. Why? Well, in my case I can run two jobs and the performance almost doubles. Heck, I can run three jobs and the performance comes darn close to tripling. Only at four jobs does my system finally start to top out, increasing only about 3.5x over a single job. So the results I see are something like:

1 job = 40-50MB/sec (actual transfer around 30-35MB/sec; the reported speed is higher due to whitespace, etc.)
2 jobs = 80-95MB/sec (roughly 30-35MB/sec per job)
3 jobs = 120-140MB/sec (roughly 30-35MB/sec per job)
4 jobs = 150MB/sec (roughly 25MB/sec per job)

This is on older hardware with low compression. So obviously a single job is not using all of the memory bandwidth, because I can run multiple jobs and still get improved performance. Of course, there could be many other things in play here. For example, memory latency caused by the extra copying in the compression routine might be holding up the read queue, causing my storage to not provide its maximum throughput. Some storage array semantics won't go into aggressive read-ahead until they see sequential reads at a very high rate (say 1000/sec). If compression slows down the system enough to perform a block read only every 3ms instead of every 1ms, then you might not get the full benefits of your drives' read-ahead.
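
To put rough numbers on the 1ms versus 3ms example, the sketch below computes the throughput of a single stream that issues one block read per interval. The 256KB block size is an assumption made purely for illustration, as is the idea that only one read is outstanding at a time.

```python
def single_stream_throughput_mb_s(block_kb, interval_ms):
    """Throughput in MB/s when exactly one block is read per interval."""
    reads_per_second = 1000 / interval_ms
    return block_kb * reads_per_second / 1024


# Assumed 256KB blocks: one read every 1ms versus one read every 3ms.
fast = single_stream_throughput_mb_s(256, 1)
slow = single_stream_throughput_mb_s(256, 3)
print(f"{fast:.1f} MB/s vs {slow:.1f} MB/s")
```

Under these assumptions, tripling the per-read interval cuts the ceiling from 250MB/s to roughly 83MB/s before any read-ahead effects are considered.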

We just learned to live with the lower throughput because it just doesn't really impact us. We run fulls once a month and the performance of incrementals is acceptable. We split our jobs up and run a fair number in parallel. The only one that is an issue is a large fileserver that is nearing 2TB in size. That one regularly takes ~24 hours to run its full backup, but because its throughput is so slow it doesn't really impact production use of the server, so we just let the backup run as long as it takes. One day we'll get a more powerful backup server.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

Anton,

have fun, maybe test also with a normal SATA drive as target. :)

Tom,

Thanks for sharing your thoughts and results about backup performance and compression. I see that you also have a file server of the same size (2TB) as ours. In my case, without changing anything, it would take around 30 hours to do a full backup, and that is just too long for our environment and backup strategy/schedule. Oh, btw, Veeam does not show the remaining time correctly if the job takes more than 24 hours; it probably just cuts off the days.
Could you do a test on your file server? Configure a test backup of your file server in Veeam. You don't have to select all disks (if you have more than one), just the disk with the data on it. Don't check CBT, app-aware, or guest index; keep dedupe enabled and select no compression and WAN target. You don't need to let the job finish; just let it run for maybe 20-30 minutes.
I'm really curious what processing rate you get compared to what you had before.

TaylorB
Enthusiast
Posts: 57
Liked: 5 times
Joined: Jan 28, 2011 4:40 pm
Full Name: Taylor B.
Contact:

Re: backup performance and compression level

Post by TaylorB »

Just to throw my numbers out there in case it helps since I have a pretty high end Veeam server:

All SAN mode with deduplication enabled. Writing to local SAS disk array in a HP DL385 G7 with 16GB RAM and two 8 core AMD cpus:

FC SAN, no compression, local target = 219MB/s, 15-25% CPU
SATA iSCSI, no compression, local target = 40MB/s, <10% CPU

FC SAN, best compression, WAN target = 203MB/s 90-97% CPU
SATA iSCSI, best compression, WAN target = 34MB/s, 20-50% CPU

I'm only taking a 10-15% hit on the highest compression and dedupe level with my system.
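
A quick way to check the size of that hit is to compute the relative slowdown from the four figures above. This is only an illustrative sketch; the helper name is made up for this example, and the two runs also differ in target setting (local vs. WAN), so the difference is not purely due to compression.

```python
def slowdown_percent(baseline_mb_s, measured_mb_s):
    """Relative drop from baseline to measured throughput, in percent."""
    return (baseline_mb_s - measured_mb_s) / baseline_mb_s * 100


fc_san = slowdown_percent(219, 203)      # FC SAN: no compression vs best
sata_iscsi = slowdown_percent(40, 34)    # SATA iSCSI: no compression vs best
print(f"FC SAN: {fc_san:.1f}%, SATA iSCSI: {sata_iscsi:.1f}%")
```

With the numbers posted, this works out to roughly 7% for the FC SAN and 15% for SATA iSCSI.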

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

Thanks for posting your results. How is the performance with no compression and WAN target?

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

After the success with the file server full backup tests I moved the Exchange backup from the virtual backup appliance to the backup server.
With the virtual backup appliance I had a rate of 16-19MB/s at full backups (target windows file share on backup server).
But when I looked this morning to see why the backup hadn't finished, I saw it crawling at 9MB/s, only 60% done. What the .... ?
Why was this so slow all of a sudden....? So I had to do some testing, again!
Here are the tests and results.

Exchange VM is: OS vmdk 4%, data vmdk 88%, logs vmdk 8% of whole VM
full backup of whole Exchange VM:
dedup, no compress, WAN target, CBT, app aware, to LSI storage 9MB/s (canceled at 60%)
dedup, no compress, WAN target, CBT, app aware, to onBoard storage 20MB/s (canceled at 40%)

full backup only Exchange data vmdk, runtime 10-20minutes, no CBT, no app aware:
to onBoard controller storage:
no dedup , no compress, local target = 51 MB/s
no dedup, optimal compress, local target = 100% CPU 42MB/s reduction by compression around 25%
dedupe, no compress, local target = 40-50% CPU 50MB/s reduction by dedup 0%

to LSI storage
no dedup, no compress, local target = 20MB/s
dedup, no compress, LAN target = 25MB/s
no dedup, no compress, WAN target = 120MB/s, 50% CPU load
dedup, no compress, WAN target = 120MB/s , 50% CPU load
dedup, no compress, WAN target, app aware, CBT = 120MB/s 50% CPU load

full backup of whole Exchange VM:
no dedup, no compress, WAN target, CBT, app aware, to LSI storage = 22MB/s (100% completed backup)
The processing rate is very fast at the beginning (100MB/s) and then drops slowly over time; I don't know if that is storage system related or a feature of Veeam.
Overall that is not really faster than with the VBA. The log file vmdk took an hour and wrote at only 5MB/s. And for the last half or third of the data vmdk I got an interesting write pattern:

Image
Image


backup method for all backups was SAN

When you do full backup (tests) do them also with deduplication disabled.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

The full backups getting slower and slower, especially with big vmdk files, seems to be somehow related to the physical server where I'm backing up VMs with the SAN method. I tried both internal storage arrays and the SAN as destinations.
I did a test from a VBA to another RAID group on the SAN and it sustained the backup speed with the same settings. The strange thing is that the issue only shows up with Veeam. When I copy a file from one internal storage to another or from the network, the transfer rate remains the same. Maybe it is because Veeam reads from the VM storage in 130-150KB or 256KB blocks and writes in 2MB blocks. Or does Veeam have a problem with AMD CPUs or the mainboard chipset?

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: backup performance and compression level

Post by Gostev »

Hard to say without trying on a non-AMD server, leaving everything else the same... but I do not recall anyone reporting something like this before. I actually think that an issue with this specific server is more likely here.

BTW, we do not select/affect the block size we read or write with. Source data is retrieved by the Storage API, and there is no way to control block size there (Local/LAN/WAN optimization does not affect this). I believe the vStorage API does reads with 256KB blocks, but I am not 100% sure, especially because VMware changed this value once at some point earlier. As for writes, everything we write goes through the system cache, and it is completely in Windows' hands to decide how best to write this data to disk.

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

Well, maybe other customers just live with the bad performance when backing up big VMs. After the tape backups are done this week, I'll test with a fresh install on a separate disk to find out if it is software related. I got the okay from my boss to buy new hardware for the backup server.

It looks like Windows then makes the wrong decision when writing data. I did some re-configuration and cleanup on the SAN and was able to make a full backup of the whole file server (2.2TB vmdk size, 2TB actual data, 3 vmdk files) from a virtual backup appliance with settings dedup, low compression and WAN target in 8 hours (80MB/s).
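
As a sanity check on the figures above, the average rate follows directly from size and duration. The sketch below is illustrative only; the helper names are made up, and the convention of 1TB = 1024 x 1024 MB is an assumption about how the sizes are reported.

```python
MB_PER_TB = 1024 * 1024  # assumed binary units


def average_rate_mb_s(size_tb, hours):
    """Average throughput in MB/s for moving size_tb in the given hours."""
    return size_tb * MB_PER_TB / (hours * 3600)


def hours_to_back_up(size_tb, rate_mb_s):
    """Hours needed to move size_tb at a sustained rate in MB/s."""
    return size_tb * MB_PER_TB / rate_mb_s / 3600


print(f"{average_rate_mb_s(2.2, 8):.1f} MB/s")
print(f"{hours_to_back_up(2.0, 80):.2f} hours")
```

Under that convention, 2.2TB in 8 hours comes to about 80MB/s, which matches the reported rate.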

Gostev
SVP, Product Management
Posts: 26917
Liked: 4373 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: backup performance and compression level

Post by Gostev »

Am I reading this correctly, you are saying you made some configuration changes to your target SAN? What were those changes? Please do tell :)

bc07
Enthusiast
Posts: 85
Liked: never
Joined: Mar 03, 2011 4:48 pm
Full Name: Enrico
Contact:

Re: backup performance and compression level

Post by bc07 »

I made some changes (cleanup, moving files) on the SAN to have enough disk space to back up the file server to it from the VBA instead of backing up over the network to the backup server.
