tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Veeam Restore Speed

Post by tsightler » Mar 29, 2010 9:42 pm

OK, so we recently had a storage array that decided to eat 4TB of VMs sitting on a group of VMFS volumes, so Veeam got a good workout. We were able to restore our VMs without serious difficulty; however, it took far too long to get our VMs restored. The average transfer speed was around 30-40MB/sec.

Today, just playing around, I decided to try restoring a VMDK file to one of our Linux hosts rather than to an ESX server. I was amazed at the speed difference, easily hitting 110MB/sec and faster. Why is restoring to VMFS volumes via the ESX console so slow? I understand the VMware COS is not optimized for this operation, but it still seems exceptionally slow, 30-40MB/sec vs 110-120MB/sec. Are there any tools that can copy to the VMFS volume faster?

I think if I ever find myself in a hurry to restore some VMs in the future, I'd restore them to my Linux host and share them out via NFS to the VMware servers. Then I could SVMotion them to the VMFS volumes while they're running.
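The NFS side of that is pretty simple to set up; roughly something like this (the hostnames, export path, and datastore name below are just examples):

    # On the Linux restore target: export the restore area over NFS
    echo "/restore  esx01.example.com(rw,no_root_squash,sync)" >> /etc/exports
    exportfs -ra

    # On the ESX host: mount that export as an NFS datastore
    esxcfg-nas -a -o linuxhost.example.com -s /restore restore_nfs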

Anybody else have any hints/tricks to improve the restore performance of Veeam?

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Mar 29, 2010 9:59 pm

I don't currently believe the ESX COS is the bottleneck, because our experiments on restoring directly to SAN do not show significant improvement either. I currently believe that the real reason is the VMFS design, specifically how it handles writes.

As for tips and tricks, I have heard a few times already that having a battery-backed cache improves the speed quite significantly.

JLaaij
Novice
Posts: 8
Liked: never
Joined: Mar 09, 2010 8:58 am
Full Name: Jaap Laaij
Contact:

Re: Veeam Restore Speed

Post by JLaaij » Mar 30, 2010 6:29 am

Hi Gostev,

"As for tips and tricks, I have heard a few times already that having battery backed cache improves the speed quite significantly."

Using ESXi v4.0.x with latest patches etc.

I'm running StarWind HA as a SAN.
HP 150 G6, 5GB memory (test)
RAID 1 on 2x WD RE3 500GB disks
RAID 0 on 4x WD RE3 500GB disks

StarWind has a caching option.
Have you ever tested the speed with StarWind caching enabled, or heard any results about it?

Greetz Jaap

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Mar 30, 2010 10:34 am

Jaap, the battery-backed cache I am talking about relates to the I/O controller in the ESX servers, not to storage-side caching.

fredbloggs
Service Provider
Posts: 47
Liked: never
Joined: Mar 18, 2009 1:05 am
Contact:

Re: Veeam Restore Speed

Post by fredbloggs » Mar 30, 2010 8:33 pm

tsightler wrote:I think if I ever find myself in a hurry to restore some VMs in the future, I'd restore them to my Linux host and share them out via NFS to the VMware servers. Then I could SVMotion them to the VMFS volumes while they're running.
Just as a query: what performance do you get if you run a Linux host as a VM on the same storage? That way you may be able to find out a little more about what performance the SAN is offering and confirm whether it's VMFS. I imagine you'd be limited by the 1Gb connection to the SAN LUN.

I'm interested, as I have a SAN from the same vendor as you.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Mar 31, 2010 10:07 am

FYI, we get about 150 MB/sec during LAN restores on our infrastructure.

The backed-up data is on low-perf SAN LUNs (SVC with a SATA storage backend) and restored to high-perf SAN LUNs (SVC with a 15k Fibre Channel disk storage backend). The Veeam server OS is Windows 2008 32-bit, and the Veeam server is using 2x Gbps ports grouped in an LACP etherchannel team (useful only when restoring to multiple ESX servers).

Our ESX servers are running ESX 4U1 on IBM System x 3850 M2 hardware and have 2 dedicated active/standby Gbps adapters on the vSwitch with the Service Console port (which has no other port group).

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 1:02 pm

stephaneb wrote:FYI, we get about 150 MB/sec during LAN restores on our infrastructure.
Is that with large VMs with lots of compressed data? I get similar speeds on some VMs, mainly VMs that have lots of "unused" space or zeroed disk space, or data that's reasonably compressible. Also, are you running Veeam 4.1.1? We saw much better restore speeds with previous versions, but those versions had issues restoring Linux volumes without corruption.

I guess my point here is that our storage is obviously capable of much better restore speeds; the restore to the Linux box goes to the same disks as the restore to the ESX console, so something else has to be the factor. I'll try restoring to a Linux VM later today, which would make the restore go to the actual same volume.

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 2:36 pm

OK, here are my results for restoring to the various platforms:

Restore to Linux Physical Host: 213MB/sec -- 1min 36sec

Restore to Linux VM: 155MB/sec -- 2min 12sec

Restore to ESX Console: 41MB/sec -- 8min 19sec

The restore to the Linux VM and the restore to the ESX Console were both to the very same VMFS LUN. Obviously VMFS is optimized for VMDK operations (most operations within a VMDK file don't require a SCSI reservation), but I didn't realize that the overhead when writing via the service console was so high.

I can envision a feature here: when Veeam is performing a restore, it could create the VMDK files and then run a small appliance that mounts the empty images and lays the blocks down from within this "restore VM".

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Mar 31, 2010 3:48 pm

BTW, I forgot to include that the above numbers were for restoring a 20GB VM with only about 7.5GB of actual data; the final 12.5GB restores in just a few seconds no matter the platform.

I'm currently running a test restore of a 350GB VMDK file that previously took 4 hours to restore via the ESX console (it's very full of lots of good data). Currently the restore to the Linux server is estimated to take 45 minutes, and I'm at 135MB/sec and still climbing.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 01, 2010 4:58 am

I'll be running a test restore of a 200 GB VM full of data later today and will post the results; that way we'll see if I can replicate the poor ESX console performance you are getting.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 02, 2010 8:15 am

OK, I have run that test and it averaged about 67MB/sec, which matched the bytes sent/sec I could see in the performance monitor on my Veeam box.
However, the 240 GB I backed up were de-duped @11% and compressed @60%. The COS was running at near 85% CPU, and top revealed it was busy with the veeamagent processes.
The data was actually written at >100MB/sec onto the VMFS volume, so overall, I believe it was the decompression in the COS that was the bottleneck.

When you say you are restoring the data to a Linux box or a VM, what do you mean exactly? What do you do in the Veeam B&R console?

When I have the chance, I will run a backup job with dedup & compression turned off; then we'll see if going through the COS to write the data really is the bottleneck.
It may also show that letting the Veeam server handle the decompression and sending the data as-is over the LAN link may be a better strategy when restoring large amounts of heavily compressed data.

vbussiro
Enthusiast
Posts: 64
Liked: never
Joined: Feb 18, 2009 10:05 pm
Contact:

Re: Veeam Restore Speed

Post by vbussiro » Apr 02, 2010 9:48 am

You might achieve this scenario by forcing "agentless mode" when connecting to the ESX host for restore (in the properties of the ESX host in VB&R), thus forcing the Veeam server to send the data uncompressed. Am I right?

Gostev
SVP, Product Management
Posts: 24092
Liked: 3278 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Veeam Restore Speed

Post by Gostev » Apr 02, 2010 10:49 am

@Stephane CPU only becomes a bottleneck when its load is 100%; I don't believe there are issues with CPU in your case. In any case, decompression is not a CPU-intensive operation, unlike compression.

@vbussiro That's right, but in the case of "agentless mode" the restore will be done through the VMware file management API, and this is typically much slower than what you can get with agent mode.

tsightler
VP, Product Management
Posts: 5305
Liked: 2160 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam Restore Speed

Post by tsightler » Apr 02, 2010 1:47 pm

When I say "restored the data to a Linux server" it's pretty simple: we have a lot of Linux servers, both VMs and a few physical machines. Veeam Backup supports adding Linux systems as targets just like adding ESX servers. Once these servers are added to the list, you can restore the VM files directly to them. I set up one of our physical Linux servers as an NFS server using some of our low-cost, tier 2 storage, and configured a couple of our ESX servers to use the NFS destination as a datastore (Veeam supports NFS datastores, and the NFS server in RHEL5 U2 is even a certified storage option). I then ran the restore with Veeam normally, but told it to restore the files to the Linux server, then used the command-line "vmware-cmd" to register the restored VM in vCenter and fire it up.
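For anyone wanting to do the same thing, the last step from the COS looks roughly like this (the datastore and VM names below are just examples):

    # Register the restored VM and power it on from the ESX service console
    vmware-cmd -s register /vmfs/volumes/restore_nfs/myvm/myvm.vmx
    vmware-cmd /vmfs/volumes/restore_nfs/myvm/myvm.vmx start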

For the test of restoring to a VM, I just picked one of our virtual Linux systems that had enough free space, added it as a target in the Veeam console, and told the system to run the restore of the VM files. This VM was running on the very same ESX host, and the very same VMFS LUN, to which I restored using the COS.

I have an idea on how to test whether the COS is the problem. For the NFS restore I restored directly to the Linux host that was hosting the datastore; however, since this NFS share is also now a datastore mounted on two of my ESX servers, it's also a target for a "normal" restore via the ESX COS. In other words, I can restore to the ESX server but pick either the VMFS datastore or the NFS datastore. This would use the exact same process, but simply write to two different filesystems. If the restore speeds are the same, then the bottleneck is likely the COS; if the restore speeds are different, then the bottleneck is VMFS. I'm pretty sure VMFS has a lot of overhead when writing via the COS because it has to obtain a SCSI reservation for every write, something that normally doesn't have to happen with writes from within a VM. This is a lot of overhead for the storage systems, although I'm sure some storage handles this better than others.
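A quick and dirty way to compare the two write paths from the COS, independent of Veeam, would be a simple dd against each datastore (the datastore names and 2GB test size are arbitrary, and zeroes from dd aren't a perfect stand-in for the Veeam agent's write pattern):

    # Write 2GB of zeroes to the VMFS datastore, then to the NFS datastore
    dd if=/dev/zero of=/vmfs/volumes/vmfs_datastore/ddtest bs=1M count=2048
    dd if=/dev/zero of=/vmfs/volumes/nfs_datastore/ddtest bs=1M count=2048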

I'll perform my restore to the NFS datastore via the ESX COS soon and update the numbers above.

Also, I don't understand how you can say that your array was transferring 100MB/sec but Veeam only reported 67MB/sec. For a 250GB VM that would be a lot of overhead. In our scenario Veeam's performance reports seem pretty accurate. We typically see write speeds of 40-60MB/sec, and Veeam reports average speeds within that range.

stephaneb
Influencer
Posts: 21
Liked: never
Joined: Feb 04, 2010 9:19 am
Full Name: Stephane Bourdeaud
Contact:

Re: Veeam Restore Speed

Post by stephaneb » Apr 02, 2010 3:29 pm

The difference between the actual disk write rate and the Veeam restore rate (which matched the NIC sent bytes/sec) is, as I said, I believe due to the fact that my backed-up data was compressed at 60%+.
If I understand correctly, when the Veeam agent is used, the compressed data is sent over the wire; the Veeam agent then decompresses it and writes it to disk, so it is expected that the disk write rate > network transfer rate, no?
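As a rough sanity check (assuming "compressed @60%" means the data on the wire is about 60% of its original size, and ignoring dedup):

    67 MB/sec on the wire / 0.6  ≈  112 MB/sec of decompressed data written to disk

which is consistent with the >100MB/sec I saw being written to the VMFS volume.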

I am running a series of tests using the same data with & without dedup/compression and with & without the Veeam agent, but so far I'm not seeing a big difference... I also average around 55 MB/sec, so I guess your hunch about the overhead coming from VMFS being accessed through the COS is likely a good one : )
What I fail to see is why writing to the VMFS volume through the COS requires SCSI reservations when doing it from within a VM does not.

I thought SCSI reservations were only required when writing metadata (such as when creating a new file or expanding an existing one).

It is also a bit of a shame that Veeam restore jobs are not logged in the session screen, and that the actual transfer rate is not differentiated from the aggregate rate (which would include the processing time used for dedup/decompression), as is done in other traditional backup products.
