ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Test 1: 1st time Full Backup

Backing up a W2K8R2 VM on an EQL SAN to a local hard disk on the Veeam backup server. The VM size is 20GB, and the first full backup finished in 2-3 minutes (I can't remember exactly); the final VBK is 3.65GB. I used LAN mode (i.e., backup via the Service Console NIC); the Service Console NIC is teamed at 2Gbps. I saw an average speed of about 300-400Mbps, but the result shown in the Veeam stats is 105MB/s, which is 840Mbps. And one thing I noticed: it's very slow at the beginning, then accelerates EXPONENTIALLY towards the end :) , how come?


Test 2: 2nd time Incremental Backup
Took about 3 seconds to finish because there was no data change since the above. The incremental file is about 50MB, and the Veeam stats say the speed is 1GB/s!!! What? Are these the same high numbers shown in the v4 Fast Incredible Speed Backup whitepaper? (It indicates 5-6GB/s on the last page.)


I also noticed several things:
1. It takes 15-30 seconds to initialize a job before it gradually starts to back up the VM. It says something about no CBT at the beginning, then it starts to use CBT; the job seems to be initializing many steps behind the scenes, which is why it takes more than 20 seconds on average to start. Is this normal?

2. My backup server is the latest PE610 with one Xeon 5640 and 12GB RAM, and I saw average CPU usage of 50-60% across all 12 cores. That's HUGE! Just for that one job, come on, the VM is a little 5GB VM (20GB provisioned). I wonder what happens if I have 100 VMs going at the same time? Do I need 1200 cores for that? Or is the backup server just doing its best to compress and dedupe, so it takes that much CPU, and THAT'S WHY THE BACKUP SPEED IS ALSO VERY FAST? If I have 100 VMs going, the CPU will probably hit 100%, but the load will eventually be spread/load-balanced among those 100 jobs, so that little 5GB VM will probably take 10 times longer to finish its backup, right?

So what's the best practice? Don't start too many backup jobs at the same time? Spread the jobs out evenly, say 10 VMs at 1AM, 10 VMs at 2AM, 20 VMs at 3AM, etc.?

Thanks.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by tsightler »

Since your VM was 20GB but the VBK was only 3.65GB, that would pretty much indicate that not all 20GB of your VMDK is actually in use. Veeam will skip areas of the VMDK that haven't been used yet, and it is also very fast at reading zeroed blocks, so it's pretty typical for the backup to spend more time on the portion of the VMDK that actually contains data and then fly past the later parts that are just empty space.

I've never seen a backup job actually finish in 3 seconds; you can barely talk to vCenter in 5-10 seconds. There's a lot of stuff that has to happen: communication with vCenter, vStorage API setup, VSS agent freeze, snapshot creation, and so on. Typically there's 30-40 seconds of job setup for each VM. That's why a backup with pretty much no changes might still show only 1GB/sec. If your VM is 20GB and the job takes 20-30 seconds, that's around 1GB/sec for the entire job. Veeam "throughput" is always based on the number of GBs of VMDK space processed over the entire job; it's not an actual representation of the transfer rate.
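To put numbers on that, here is a short Python sketch of the calculation (the setup and transfer times below are illustrative assumptions, not values reported by Veeam):

# Minimal sketch: the reported rate is total VM size divided by total job
# duration, including per-VM setup overhead. All numbers are illustrative.
def processing_rate_gb_per_sec(vm_size_gb: float,
                               setup_seconds: float,
                               transfer_seconds: float) -> float:
    """Rate over the whole job, not just the data-moving phase."""
    return vm_size_gb / (setup_seconds + transfer_seconds)

# An incremental with almost no changed data: 20 GB VM, ~25 s of setup
# (vCenter calls, VSS freeze, snapshot create/remove), ~1 s of transfer.
print(processing_rate_gb_per_sec(20.0, 25.0, 1.0))  # ~0.77 GB/s "throughput"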

Veeam is designed to make use of modern processors, and it's generally recommended to run only a few jobs at once. We actually run 4 jobs simultaneously on our hardware and find that works pretty well, but that's pretty much the top. Dedupe and high compression are very costly when transferring data at high speed. Normally you put lots of VMs into a single job. We have ~60 VMs and only 4 "backup" jobs; we could probably do them all in two jobs (two different backup targets), but we like to split our Windows and Linux systems into separate jobs, which is why we have 4.

Best practice is to create a small number of jobs that contain a large number of VMs. Dedupe is most useful this way (dedupe is currently limited to a single job). Unless you have some other requirement, you certainly don't want a single job per VM.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Thank you very much for sharing your knowledge. I found it very useful!

Actually, this was my first time using Veeam Backup. :)

My fault also; I think it was actually around 30 seconds. But it's definitely very fast for the second, incremental run, and that's not even via SAN, just a normal LAN backup. I will study the guide later and do a SAN backup.
mplep
Service Provider
Posts: 62
Liked: never
Joined: Sep 16, 2009 7:50 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by mplep »

Hi,

I don't want to hijack this thread, but my questions seem relevant, so I hope you don't mind me joining it.

If you are using replication, does the single-job recommendation still apply, given that dedupe is not really relevant? I've noticed that a certain amount of pre-processing occurs, which takes up some time with a single job. While monitoring my backup job (first and subsequent passes) I see Veeam perform tasks like "checking license, validating task, initializing, creating and removing snapshots, mounting disks for VA" etc. for each VM, which, when totalled up (28 VMs currently), adds quite a lot of time to the job.

Target space and ESXi hosts on the target side are not a problem, hence my interest in moving towards replication. So would multiple replication jobs be better in my case?

I'm also thinking that multiple jobs would give me better control over individual VMs or groups of them. With one job it's either a global backup or nothing. At least with multiple jobs I can choose different schedules with more or less frequency, or take manual backups with more control.

Thanks

Mark
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

Hi Mark, no, deduplication does not apply to replication, since all data is written to the target in native (uncompressed) format, as normal VMDKs. Feel free to set up multiple jobs for replication.

Tom, thank you for the thorough answer above.

ctchang, don't worry about the CPU load too much. Yes, a full backup has to process a lot of data coming in at great speed, which is why the load is so high. Try compressing a large file on your desktop and you will see CPU usage go through the roof; this is expected, compression takes a lot of CPU resources, and Veeam also does dedupe on top of that. We do provide a LOW compression setting specifically optimized for low CPU usage, but it does not really make sense to use it in most scenarios... CPUs nowadays are powerful and cheap, but storage is expensive; good compression and dedupe are going to save you A LOT of money.
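If you want to reproduce this on your own desktop, here is a rough Python experiment (zlib stands in for whatever codec a backup product actually uses; watch one core peg while it runs):

# Time compression of a 32 MB buffer at several levels; the byte mix and
# the levels tested are arbitrary assumptions for illustration.
import os
import time
import zlib

data = os.urandom(16 * 1024 * 1024) + bytes(16 * 1024 * 1024)  # random + zeros

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    secs = time.perf_counter() - start
    print(f"level {level}: {len(data) / secs / 2**20:6.1f} MB/s, "
          f"ratio {len(data) / len(out):.2f}x")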

Also, you will find that the high load is specific to full backup, which you only need to perform once with Veeam. All subsequent passes are forever-incremental, with very little data to process, and thus much lower CPU load compared to a full backup, even with multiple jobs.

Running too many concurrent jobs is not recommended because of source and target storage congestion. We recommend running 3-4 jobs at once maximum (just like Tom said); this will typically be more than enough to reach your primary bottleneck, whatever it is.

Do not worry if you get 100% CPU load from multiple jobs; it does not mean any specific job suffers. What you see is the effect of a correct implementation of a conveyor backup engine. Effectively, our engine self-adjusts to the slowest element of your backup chain, whether that is production storage throughput, source storage connection throughput, Veeam Backup server data processing performance, target storage connection throughput, or target storage speed. In your example below, the fact that CPU usage is not at 100% means that your bottleneck is currently somewhere else. I imagine in your case it is LAN speed, so if you plan to stick with Network mode but want to further increase your backup speed, you may consider adding an additional dedicated LAN and multipathing. Of course, the preferred way of doing backups from a physical backup server is the direct SAN access processing mode, which provides LAN-free backup without affecting your ESX(i) hosts or LAN.
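To illustrate the conveyor idea, a toy Python model of an engine that runs at the speed of its slowest stage (the stage names and rates below are made-up assumptions, not Veeam internals):

# The effective job rate is the minimum across the chain's stages.
stages_mb_per_sec = {
    "source storage read":  400.0,
    "source connection":    110.0,   # e.g. a single 1 Gb link
    "dedupe/compression":   250.0,
    "target connection":    300.0,
    "target storage write": 180.0,
}

bottleneck = min(stages_mb_per_sec, key=stages_mb_per_sec.get)
print(f"effective rate: {stages_mb_per_sec[bottleneck]:.0f} MB/s, "
      f"limited by {bottleneck}")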
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Thanks Anton, I definitely need to play with direct SAN mode on EQL and try replication later to an offsite server.

Btw, I read somewhere that automount is already TURNED OFF in v5 for direct SAN mode.

Then what about "automount scrub"?

diskpart
automount disable
automount scrub
exit

Btw, I just run those from cmd (inside diskpart), right? Will that permanently set Windows to those settings? And what if I plug in a USB keyboard or mouse, will they work? I understand a USB stick won't work; I'd need to manually bring it online in Disk Management.

Thanks,
Jack
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

Automount is disabled for storage devices only, and does not affect keyboard, mouse or webcam :) Do not worry about issuing any commands manually; Veeam Backup setup has already performed all the necessary steps. Thank you!
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Gostev wrote: Automount is disabled for storage devices only, and does not affect keyboard, mouse or webcam :) Do not worry about issuing any commands manually; Veeam Backup setup has already performed all the necessary steps. Thank you!
That's cool, it really saves people from forgetting to issue those IMPORTANT COMMANDS and losing the whole LUN. :shock:

Now, who else says Veeam isn't improving? :P :) :lol:
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

I have not heard anyone say that in the past 3 years ;)
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

I've finished testing SAN mode. IT'S SOOOOO SLOW compared to LAN mode (which took 2-3 mins for the first full backup).

4 of 4 files processed
Total VM size: 20.00 GB
Processed size: 20.00 GB
Processing rate: 16 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 10/22/2010 7:49:31 PM
End time: 10/22/2010 8:10:13 PM
Duration: 0:20:41

The final result is 3.65GB same as before.

Any hint on that? Why did SAN mode take so much longer than LAN mode? 10x more... it just doesn't make any sense to me.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Subsequent runs are much faster; the following is after adding another 900MB inside the VM.

4 of 4 files processed

Total VM size: 20.00 GB
Processed size: 20.00 GB
Processing rate: 129 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 10/22/2010 8:45:10 PM
End time: 10/22/2010 8:47:49 PM
Duration: 0:02:39
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Vitaliy S. »

Jack, please make sure your job didn't fail over to Network mode automatically, as it looks like Network mode was actually used. Try configuring SAN-only mode for this job to see whether you have configured everything properly.
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

Most typically, terrible performance with direct SAN access is caused by outdated or misbehaving multipathing software, or by the actual multipathing settings.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Yes, I think so. I've installed EQL MPIO as well as Microsoft MPIO, since EQL MPIO needs Microsoft MPIO in order to use more than a 1Gbps connection.

So what are other EQL users doing? NOT installing the EQL HIT Kit's MPIO part (but installing all the other EQL HIT Kit components)? Or not installing Microsoft MPIO either?

Um... if we don't install any MPIO, then we are left with only 1Gbps; shouldn't that be enough for backup?

Has any other EQL user successfully deployed with MPIO enabled and gotten 4Gbps on a PS6000?

Finally, what's Veeam's suggestion? Don't enable any MPIO, since 1Gbps is enough?

Thanks.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Just a quick update before going to bed:

I've removed the EQL MPIO component of the HIT Kit, leaving only one 1Gbps connection to the EQL array instead of two.

Now the performance is MUCH MUCH BETTER: no more 1-2% TCP retransmits, the first full backup via SAN MODE took only 02:31, and this time the backup server's CPU also looks right, at 80-90%.

4 of 4 files processed

Total VM size: 20.00 GB
Processed size: 20.00 GB
Processing rate: 136 MB/s
Backup mode: SAN/NBD with changed block tracking
Start time: 10/23/2010 12:12:14 AM
End time: 10/23/2010 12:14:45 AM
Duration: 0:02:31

10X LESS TIME!!! OMG!

So there must be a conflict between the EQL MPIO DSM and the MS MPIO DSM on Windows Server 2008 R2.

Well, I have chosen NOT to use EQL MPIO now, as it produces more trouble than good. I only lose some redundancy, and SAN speed is capped at 1Gbps, but I am getting much better backup results and no more horrible TCP retransmits in SAN HQ.

Finally, I would like to know whether any other EQL users running Windows Server 2008 R2 with the HIT Kit and its EQL MPIO component installed have encountered a problem like mine.

Thanks,
Jack
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Anton,

I would like to report that the CPU finally shoots up to 90% now (I double-checked: it's not an E5640, the CPU is a Xeon E5620, 2.4GHz, 4 cores with HT, so Task Manager shows 8 cores, all at 80-90%), but the 1Gbps SAN NIC (for the Equallogic) is only utilized at 600-700Mbps. (Hmm... a single 4-core 2.4GHz E5620 is pretty powerful compared to many previous-generation Xeons/Opterons.)

1. So the bottleneck is my CPU now? What's your opinion?

Anyway, here is the update report from SAN backup
------------------------------------------------------------

1st time FULL BACKUP:

5 of 5 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 332.90 GB
Processed size: 332.90 GB
Processing rate: 102 MB/s
Start time: 10/25/2010 2:48:44 AM
End time: 10/25/2010 3:44:15 AM
Duration: 0:55:30


2nd time Incremental Backup:
-----------------------------------

5 of 5 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 332.90 GB
Processed size: 332.90 GB
Processing rate: 251 MB/s
Start time: 10/26/2010 1:00:08 AM
End time: 10/26/2010 1:22:45 AM
Duration: 0:22:37


2. I am pretty happy with the result, but the processing speed is a bit slow for subsequent backups. That's for only 5 VMs (300GB total); it took 1 hour for the full and roughly 20 minutes for the incremental. Is that normal/average/below average? I mean, what if I have 50 VMs? Will it take 10 hours for a full and 200 minutes for an incremental? What if I have 500 VMs? Hmm... I see, that's why I would need more than one Veeam Backup server, to spread the load, right?


3. Also, I noticed that during the backup window, when I check a VM's properties, a Thin disk somehow shows as Thick. Does Veeam transform Thin disks to Thick during backup and then transform them back to Thin afterwards?


4. When Veeam takes a backup using SAN mode, is a temporary snapshot created on the SAN volume? And does Veeam then instruct vCenter to remove it after completing the backup window (or after sending the snapshot data to the Veeam backup server for dedupe and compression)?


5. I found the total THIN disk size is 332.90 GB (all 5 VMs are thin provisioned; the reported non-thin size is 723GB), and the actual data usage within that 332.90 GB is only around 150GB. The first full backup result is 92GB, so the dedupe and compression reduction rate is roughly 4x. Is this normal?
The second, incremental backup is about 19GB; since I don't know the actual changed size, I can't tell the reduction rate of any of the subsequent backups.

6. So I estimate I need 100GB (full) + 20GB each day x 30 days = 700GB of space in total for keeping 30 days of backups; however, the Veeam backup job tells me:

Source Data size is 723GB
Estimated Full Backup size is 362GB
Estimated required Space is 2.29TB

I think Veeam can only see the reported non-thin volume size (i.e., 723GB), so it is WRONGLY calculating the required space. In fact, if it were working correctly, it would report something like:

Source data size is 332.9GB
Estimated full backup size is 100GB
Estimated required space is 700GB

i.e., 100GB (full) + 20GB each day x 30 days = 700GB

Is it a bug, or is it by design that I have to calculate the actual number myself? It would be nice if Veeam included the actual figure somewhere later; I understand it's just an estimate made before the first backup window. I mean, it would be nice to have a button that gives an exact/close estimate of how much space will be used at the end of the 30 days of backups (i.e., 700GB is what I want to see, so I can effectively arrange/add backup hard disk space accordingly). The arithmetic is sketched below.
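As a back-of-the-envelope Python sketch of that arithmetic (assuming one full plus a flat 20GB increment per day, which is of course a simplification):

# Space needed for one full plus N days of forward increments.
def retention_space_gb(full_gb: float, daily_increment_gb: float,
                       retention_days: int) -> float:
    return full_gb + daily_increment_gb * retention_days

print(retention_space_gb(100, 20, 30))  # 700 GB, as estimated above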


7. I intend to add a Windows shared drive on another server for storing my backup images; it has 6TB of space. My question is: should I expect a decrease in performance when writing data to the shared drive over a 1Gbps link compared to storing data on local RAID disks? The key is that I don't know whether Veeam finishes dedupe+compression first and then transfers the whole image to the local/shared drive, or creates the deduped+compressed backup bit by bit and saves it to the local disk/shared folder bit by bit. I think local disk should be much faster, right? But if Veeam creates the file ONLY after dedupe+compression, then I think there is no difference.

However, take an example from my 2nd incremental backup.

Total size of VMs to backup: 332.90 GB
Processed size: 332.90 GB
Processing rate: 251 MB/s

What does the 251MB/s really mean? Does it mean the CPU processing speed for the backup image, or the local hard disk write speed? I don't think it means the SAN transfer or Service Console NIC transfer speed at all.


Thanks for answering my long question list again.

Jack
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

1. No, the CPU is definitely NOT the bottleneck in your case. When it is, it stays at 100% (literally). I have seen that a lot lately when backing up from an 8Gb FC SAN using a server with a single 1st-generation quad-core CPU. I think in your case the bottleneck is iSCSI throughput. You may try to implement some suggestions from the iSCSI performance tweaking guide (see the FAQ); this may add about 10% more performance, nearly cap the 1Gb iSCSI LAN, and push CPU load closer to 100%. I cannot promise the increase for sure, because even at your current performance other factors start to matter (e.g. having good physical switches).

2. Yes, this is normal for certain types of workloads. For example, Exchange and SQL generate a lot of transaction logs, which means a lot of disk changes and results in large amounts of data that needs to be processed during the incremental pass. You will not see the same with domain controllers, for example; they are very fast to back up because of little to no disk change.

3. No, we do not do any transformations during backup. I suppose this is an issue with how the vSphere client displays it.

4. Correct, all snapshot management is fully automated. Note that we use a VMware snapshot, which affects a single VM (not a LUN-wide hardware snapshot on the SAN).

5. Yes, this is normal. It looks like your VMs are pretty similar.

6. Known bug with the estimation thingy: it is not aware of thin provisioning yet.

7. We do source-side dedupe, meaning we compress and dedupe data before writing it to target storage. As a result, a few times less data is written than is read. So target storage is rarely a bottleneck, unless you start beating it with too many concurrent jobs (a sketch follows at the end of this list).

8. Processing rate means the total size of the VM divided by the total time it took to process the VM. This is affected by the time it takes to talk to vCenter, freeze the guest OS, create and remove the snapshot, back up small VM files (configuration), and back up the actual disks.
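To illustrate point 7, a minimal Python sketch of source-side dedupe plus compression (fixed 1MB blocks keyed by SHA-256 are assumptions for illustration; this is not the actual VBK format or algorithm):

import hashlib
import zlib

BLOCK = 1024 * 1024  # assumed block size

def backup_pass(disk_image: bytes, seen_hashes: set) -> bytes:
    """Return only the compressed, previously unseen blocks."""
    out = bytearray()
    for offset in range(0, len(disk_image), BLOCK):
        block = disk_image[offset:offset + BLOCK]
        digest = hashlib.sha256(block).digest()
        if digest in seen_hashes:    # duplicate block: a reference suffices
            continue
        seen_hashes.add(digest)
        out += zlib.compress(block)  # compressed before leaving the source
    return bytes(out)

seen = set()
image = b"A" * (8 * BLOCK)            # highly redundant test data
print(len(backup_pass(image, seen)))  # far smaller than 8 MB: 1 unique block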
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Thanks again!

Btw, one more question regarding your answers to 1 & 8.

How do I know my real bottleneck is the NIC? There are no such stats in the completed backup session; the "Processing rate: 251 MB/s" number doesn't show how the iSCSI NIC performed. I can only get a rough idea of my iSCSI NIC by watching Task Manager > Networking, done manually over a 5-minute period.

In addition, I am still a bit confused. During the backup window my E5620 CPU went up to 90% and the iSCSI NIC sat at about 60-70% utilization of 1Gbps, so I thought the bottleneck was the CPU; I assumed that with more CPU I could get higher iSCSI (SAN mode) throughput, and you disagree with me on that. So do you mean my high CPU usage is abnormal when my 1Gbps iSCSI NIC is only working at 60-70%? In other words, do you mean that normally, for a 1Gbps iSCSI NIC working at 60-70%, the CPU should be much lower, say at 30-40%, so something is wrong? I thought the high CPU was due to dedupe + compression, no?
Gostev
Chief Product Officer
Posts: 31460
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Gostev »

Yes, if your real bottleneck were the iSCSI NIC, you would see that in Task Manager > Networking.

I'm not sure what you are asking in the next paragraph, it's very confusing, so let me reiterate my points. Your CPU usage is normal. Your bottleneck right now is neither CPU nor iSCSI NIC. Actually, I just got a pretty great idea: what if our conveyor optimization engine could report the current bottleneck? I will go check with the corresponding developer ASAP. This would be super useful.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Yes, as currently there is no way for me to know where exactly the problem is. I can only tell that the CPU is at 80-90% during the backup window and the 1Gbps iSCSI LAN is at 60-70%, so my conclusion was that the CPU is my bottleneck.

However, above you said "1. No, the CPU is definitely NOT the bottleneck in your case.", and that's where my confusion comes from.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by tsightler »

Have you tried running multiple jobs simultaneously? One thing that can cap backup speed is simply queue depth. Basically, only so many read requests are issued at a time, so if the storage latency is high enough (say >5 milliseconds or so, pretty common for iSCSI storage with SATA drives once they get busy) there aren't enough requests in flight to keep the pipe full.

We've found that with our Equallogic storage we generally don't see more than 50-60MB/sec per job, but we can run two jobs (which gets two request queues going) and they'll both still deliver right around 50-60MB/sec, which saturates a single 1Gb iSCSI link (we use Equallogic MPIO with Veeam without issues, but with QLogic hardware HBAs). With two 1Gb iSCSI HBAs running four jobs, each job gets around 35-40MB/sec, so around 140-160MB/sec total throughput, at which point our old backup server is pretty much running at 100% CPU, so we're not likely to get any faster.

That's why we run multiple jobs. Not only that, but when doing incremental backups, a high percentage of the time is spent simply prepping the guest OS, taking the snapshot, removing the snapshot, and so on. With Veeam v5 you get some more "job overhead" if you use the indexing feature, since the system has to build the index file (which can take quite some time on systems with large numbers of files) and then back up the zipped index via the VM tools interface. This time is all factored into the final "MB/sec" for the job. That means that with only a single job running there will be lots of "down time" where no transfer is really occurring, especially with incremental backups, because relatively little data is transferred for most VMs compared to the time spent taking and removing the snapshot. Multiple jobs help with this because, while one job may be "between VMs" handling its housekeeping, the other job is likely to be transferring data.

To give you a complete picture, we have 4 jobs to back up approximately 40 VMs, totaling around 7TB. We could back them all up in a single job, and it would likely take 6-8 hours to complete, but with 4 jobs we're typically done in 3-4 hours.

There are also other things to consider. If you're a 24x7 operation, you might not really want to saturate your production storage just to get backups done. This is admittedly less of an issue with CBT-based incrementals, but it used to be a big deal with ESX 3.5 and earlier, and full backups can still impact your production storage. If I'm pulling 160MB/sec from one of my older SATA Equallogic arrays, its I/O latency will shoot to 15-20ms or more, which severely impacts server performance on that system. That might not be an issue if you're not a 24x7 shop and have a backup window where you can hammer your storage as much as you want, but it is certainly an issue for us. Obviously we have times that are quieter than others, and our backup windows coincide with our "quiet" time, but we're a global manufacturer, so systems have to keep running and performance is important even during backups.

Finally, one thing often overlooked is the backup target. If you're pulling data at 60MB/sec, can you write it that fast? Since Veeam is compressing and deduping on the fly, it can have a somewhat random write pattern even when running fulls, but reverse incrementals are especially hard on the target storage, since they require a random read, a random write, and a sequential write for every block backed up during an incremental. I see a lot of issues with people attempting to write to older NAS devices or 3-4 drive RAID arrays, which might have decent throughput but poor random access. This is not as much of an issue with fulls and the new forward incrementals in Veeam 5, but it still has some impact.
Alexey D.

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by Alexey D. »

Wow, Tom!
Thanks for sharing your expertise again and again.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Thank you very much Tom!

It's really detailed and informative. I've made a printout and definitely need to go through it first. :)
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Gostev wrote: Yes, if your real bottleneck were the iSCSI NIC, you would see that in Task Manager > Networking.

I'm not sure what you are asking in the next paragraph, it's very confusing, so let me reiterate my points. Your CPU usage is normal. Your bottleneck right now is neither CPU nor iSCSI NIC. Actually, I just got a pretty great idea: what if our conveyor optimization engine could report the current bottleneck? I will go check with the corresponding developer ASAP. This would be super useful.

Continuing with my Q7:
I intend to add a Windows shared drive on another server for storing my backup images; it has 6TB of space. My question is: should I expect a decrease in performance when writing data to the shared drive over a 1Gbps link compared to storing data on local RAID disks? The key is that I don't know whether Veeam finishes dedupe+compression first and then transfers the whole image to the local/shared drive, or creates the deduped+compressed backup bit by bit and saves it to the local disk/shared folder bit by bit. I think local disk should be much faster, right? But if Veeam creates the file ONLY after dedupe+compression, then I think there is no difference.

Veeam replied> We do source-side dedupe, meaning we compress and dedupe data before writing it to target storage. As a result, a few times less data is written than is read. So target storage is rarely a bottleneck, unless you start beating it with too many concurrent jobs.


However, I got a different answer after searching the forum posts.

Some say Veeam Backup does the dedupe and compression in real time, bit by bit, rather than finishing the whole dedupe and compression on the Veeam Backup server first and then sending the result to the target storage. (So it's different from what you mentioned, but I do trust your official answer.)

Sorry, I forgot who posted the following; was that you again, Tom? I just copied it and then forgot where it originated.

Quote 1:"No doubt Veeam creates files a lot differently that most vendors. Veeam does not just create a sequential, compressed dump of the VMDK files.

Veeam's file format is effectively a custom database designed to store compressed blocks and their hashes for reasonably quick access. The hashes allow for dedupe (blocks with matching hashes are the same), and there's some added overhead to provide additional transactional safety so that you VBK file is generally recoverable after a crash. That means Veeam files have a storage I/O pattern more like a busy database than a traditional backup file dump."

(My comment: he means the dedupe and compression I/O pattern is more random than sequential.)


Quote 2:
"Finally, one thing often overlooked is the backup target. If you're pulling data at 60MB/sec, can you write the data that fast? Since Veeam is compressing and deduping on the fly, it can have a somewhat random write pattern even when it's running fulls, but reverse incrementals are especially hard on the target storage since they require a random read, random write, and sequential write for ever block that's backed up during an incremental. I see a lot of issue with people attempting to write to older NAS devices or 3-4 drive RAID arrays which might have decent throughput, but poor random access. This is not as much of an issue with fulls and the new forward incrementals in Veeam 5, but still has some impact."

(My comment: this quote is from Tom's reply on page 2; again, he's saying the dedupe and compression I/O pattern is more random than sequential.)



If Veeam is doing dedupe and compression bit by bit instead of finishing the whole thing before sending it to the target storage, then my target storage will definitely be a potential bottleneck, as the I/O pattern is not one BIG SEQUENTIAL file but rather many small random I/O movements.

Anton, could you kindly clarify whether the backup process with dedupe and compression actually produces random I/O?

If yes, it will definitely hurt the target storage where the VBK and VIB files are stored, and we would need 10K/15K SAS disks; if it's only a single 7200rpm 2TB SATA disk, or several 7200rpm 2TB SATA disks in RAID 5, you will soon find that your bottleneck is the target storage.


Question 9:
Is there any 10-100ms of downtime on a VM at the moment its snapshot is taken? (Hence some people asking about VM backup order.)

Will I see something like a 1-second ping drop within each VM on the job list due to snapshot initiation?



Question 10: (not really a question, but an interesting finding)

Finally, I discovered some hidden information when copying data from the post: if you copy the last BLANK line, it actually contains something like the following... haha...

tsightler
Veeam MVP

Posts: 504
Joined: Fri Jun 05, 2009 8:57 pm
Full Name: Tom Sightler
Private message Top
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

tsightler wrote: Have you tried running multiple jobs simultaneously? One thing that can cap backup speed is simply queue depth. Basically, only so many read requests are issued at a time, so if the storage latency is high enough (say >5 milliseconds or so, pretty common for iSCSI storage with SATA drives once they get busy) there aren't enough requests in flight to keep the pipe full.
Tom,

Thank you very much again. I have finally gone through your reply; the following are my comments.

1. We use a PS6000XV with 15K SAS; our average disk latency is about 5-10ms during the backup window, and we schedule the backup job to run between 1-3AM, when load is at a minimum.

2. Yes, I see what you are saying: the bottleneck is the CPU, not the iSCSI bandwidth. And I somewhat agree with you that the second bottleneck is the target storage location, where fast-spindle 10K/15K disks would really help, as the backup pattern with dedupe and compression is random I/O. (But this needs to be confirmed by Anton, since his previous reply suggested the VBK/VIB writes are sequential, i.e. Veeam Backup finishes the whole dedupe and compression before sending it to the target storage.) So, if you are right about the backup pattern, does this mean we are better off with some kind of RAID 10/RAID 50 of 10K/15K disks for the target storage? That would be very expensive; I originally thought 7.2K 2TB SATA would do the job much more cost-effectively.

3. I read the v5 FAQ; it said the indexing is almost instantaneous, so it shouldn't take much time. But then again, we never use indexing, as clients always know which folder contains the files they want restored, so there's no need to add more load to the Veeam server by adding MS Search Server v10 and indexing (or cataloging) to the backup job in our environment.

4. I understand what you mean by multiple jobs. It's like saying that INSTEAD of adding more Veeam Backup servers, you simply fully utilize the one and only Veeam Backup server you have by running more concurrent jobs on it, saving the money and energy bill of additional Veeam Backup servers to spread the load.

However, I guess eventually you will need to add more Veeam Backup servers when your VM count grows a bit more, or if you want to really reduce the total backup window, right? Besides, your backup server always runs at 100% CPU during those 3-4 hours, which many system administrators consider not very healthy, as they are educated to think a physical server should always have 20% headroom (i.e., 80% is their maximum). :)
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Anton,

I have some questions regarding replication this time.

FYI, I haven't used replication yet, but I know what it does.

I remember you always recommend people do replication in addition to backup. Here are my questions:

1. If I already have a scheduled backup job, can't I just manually copy the VBK and VIB files over, say via my VPN to home, at month end?

2. If I schedule a replication job to do the same as the above backup job (i.e., back the VM up to home), then I have two duplicated jobs running at the same time (one normal backup to local storage on the Veeam Backup server, the other to remote-home via replication). Is this normal for most users? Do I need to shift the replication schedule to later than the normal backup to local storage in order to avoid the VMFS locking issue (i.e., the backup job locking the VMFS while the replication job tries to lock the same VMFS)?

Thanks,
Jack
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by tsightler » 1 person likes this post

I thought about trying to quote and reply, but there's just so much stuff that it's getting hard to follow, so here's my attempt to answer your questions.

First, regarding putting your backups on a shared drive and how that will impact performance. We actually use Linux hosts as our target storage because we find this to be an ideal way to spread the load (Veeam pushes an agent out to the Linux host, which offloads some of the dedupe/compression work), but we've done a LOT of testing with Veeam backing up using various methods, including local disks, Linux targets, shared disks, etc. Here are my general conclusions:

1. Local disks -- This is generally the fastest method, assuming a powerful CPU and good memory and disk bandwidth. All I/O is performed locally, and data is deduped/compressed before being written to disk.

2. Linux targets -- This is almost as fast as local disks, and if you back up to multiple targets it can actually be faster, since the CPU load is spread across the targets. In this scenario the Veeam server is effectively a vStorage API "proxy", reading data from the VMFS volumes and performing lightweight compression before sending the data across the wire to the Linux targets for the heavy-lifting dedupe/compression.

3. Shared disks -- The slowest of the options, and it can vary wildly based on the speed and quality of your target storage. In most of our test scenarios this method is only ~10% slower than local disks, so it's not huge. The reasons are multiple: a) writing to a shared disk at a high rate of speed consumes additional CPU time on the Veeam server, leaving less time for dedupe/compression work; b) the added latency of writing over a network adds some overhead; c) many NAS servers seem tuned more for multiple random accesses to small files (the typical file server scenario) than for throughput on larger files.

The quality of the NAS plays a huge role for shared storage. If we take the Linux servers that we normally use as Veeam backup targets and share the space out with Samba, letting the Veeam server back up to them directly, performance is about 10% less; in other words, for full backups where we might see 50MB/sec we'll see only 45MB/sec, and for incrementals where we might normally see 240MB/sec, we see only 225MB/sec. That's not really all that much different. Now, if we share out one of our Snap appliances, things are much worse, probably a 30-40% decrease in performance, but those things have minimal memory and overall poor throughput for single random access, as their latency is too high.

So I guess, to answer your question about whether you will see worse performance backing up to a shared disk: assuming the shared storage offers good performance, it's probably going to be slightly slower, but not likely noticeably so.

Of course there are exceptions to every rule. Local disks are not automatically faster than shared disks. For example, if your local disk is a small three-drive SATA RAID 5 on a budget controller, while your shared disk is backed by a 16-spindle high-performance storage array, then guess what? Right, your shared disk might actually be faster than your local disk. So that makes the answer to your question "it depends".
If Veeam is doing dedupe and compression bit by bit instead of finishing the whole thing before sending it to the target storage, then my target storage will definitely be a potential bottleneck, as the I/O pattern is not one BIG SEQUENTIAL file but rather many small random I/O movements.

Anton, could you kindly clarify whether the backup process with dedupe and compression actually produces random I/O?
OK, I'm not Anton, but just to clarify my own statement: I'm not saying that a Veeam backup is a completely random I/O pattern. A full backup with Veeam is largely sequential, although there are some random elements where Veeam has to update metadata and other information to keep the backup transactionally consistent. Most backup formats don't have similar information because they are not self-contained, recoverable backup files. But a full backup is largely a sequential operation with a small percentage of random updates (I don't know the guts of the VBK file format, so my information is based solely on monitoring the behavior of my storage).

Now, for Veeam prior to v5, and still with v5 if you use "reverse incremental" backups, the I/O pattern of an incremental backup is quite random. With reverse incremental, each time Veeam backs up a new block it has to read the original block from the VBK file, write that block to the rollback (VRB) file, then write the new block somewhere in the VBK file. That's two write operations and a read operation for every changed block. This is where I've seen the most issues with Veeam and poor performance, for example with our busy Exchange and SQL servers, which have LOTS of changed blocks every day. With v4, a full backup of our Exchange server would complete at around 50MB/sec, but the incrementals regularly managed only 30MB/sec, because backing up 100GB of changed blocks required moving 400GB of data. Even for our very fast backup storage that was too much I/O.

This is less of an issue if you use Veeam 5's new "forward incremental". Now the changed blocks are simply read from the VMDK and written to a new increment (VIB) file. Performance of our Exchange and SQL backups has improved dramatically, basically because Veeam is moving roughly a third of the data it was moving before, and it's now largely a sequential write operation, not the read, write, write cycle of the previous versions.
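To tally that up, a small Python sketch of the data movement per backup pass (my own accounting of the reads and writes described above; counting only target-side I/O gives the "roughly a third" figure, while counting the source read as well gives the 400GB total):

# Multipliers below are my reading of the description, not measured figures.
def reverse_incremental_moved_gb(changed_gb: float) -> float:
    # source VMDK read + VBK read + rollback-file write + new-block write
    return changed_gb * 4            # 100 GB changed -> 400 GB moved

def forward_incremental_moved_gb(changed_gb: float) -> float:
    # source VMDK read + one mostly-sequential write to the increment file
    return changed_gb * 2

def target_io_gb(changed_gb: float, reverse: bool) -> float:
    # target-side I/O only: 3x for reverse (read + 2 writes), 1x for forward
    return changed_gb * (3 if reverse else 1)

print(reverse_incremental_moved_gb(100.0))  # 400.0
print(forward_incremental_moved_gb(100.0))  # 200.0
print(target_io_gb(100.0, reverse=False))   # 100.0, a third of the 300.0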
If yes, it will definitely hurt the target storage where the VBK and VIB files are stored, and we would need 10K/15K SAS disks; if it's only a single 7200rpm 2TB SATA disk, or several 7200rpm 2TB SATA disks in RAID 5, you will soon find that your bottleneck is the target storage.
Well, I certainly don't think your backup storage needs to be 10K/15K SAS disk; I never said that. We use fast disk storage for our backups, but it's pretty low cost. We have 20TB and 32TB iSCSI arrays that we built for around $7000 each. They have 16 disks in a RAID 6 configuration and thus provide significant I/O and throughput for a pretty bargain-basement price (we used to spend more than that on tape media every year). We front the arrays with reasonably powered Linux boxes (another $1000 each) with a lot of memory and use them as backup targets.

We think this setup offers a huge number of advantages over going the absolute cheapest route for our backup storage, as having good storage and putting a Linux server in front of it allows a lot of flexibility:

1.) As mentioned above, the Linux servers allow the Veeam load to be spread across multiple servers, since they can each be added as backup targets
2.) We can also present the storage as an NFS share directly to the ESX hosts and use it as emergency storage in the event of a catastrophic failure of our primary storage (we've actually had to do this)
3.) Since the storage can be presented to the ESX hosts via NFS, we can restore VMs to the Linux box much faster than restoring them directly to the VMFS volumes (VMFS restores are quite slow).
4.) The storage is very fast, so multiple simultaneous restores are possible, which matters on that disaster day when you have to restore over 4TB of VMs (yes, that happened to us too).
5.) The storage performs well, so SureBackup and Instant Restore performance is also quite good
6.) We can leverage Linux tools like DRBD for asynchronous replication of our backups to a remote site (we're only testing this now, but it looks promising).

Obviously your environment may be different; smaller environments, or environments where downtime is not critical, may be able to get by with less, but backups are important, otherwise you wouldn't be doing them.
1. We use a PS6000XV with 15K SAS; our average disk latency is about 5-10ms during the backup window, and we schedule the backup job to run between 1-3AM, when load is at a minimum.
Is that read latency, or total average latency? It feels high for read latency on a relatively quiet 15K SAS array. For example, we have three arrays, a PS6000E, which is SATA, and a couple of older PS3800Xs and PS3900XVs, and none of them go above 5ms latency during backups except maybe for very short periods. Average latency is around 3.5-4ms for the SAS arrays and 4-5ms for the SATA arrays. Now, we do have some older PS400E and PS100E SATA arrays that give latency in the 5-10ms range, but they're 4 years old now.

Anyway, what's the average I/O size during the backup (you should be able to get that from SAN HQ)? My guess is you'll see something like 256KB/request if you're similar to our environment. So 5ms per request is 200 requests per second, and 256KB/request * 200 requests/sec = 51200KB/sec = ~50MB/sec. Obviously this makes a lot of assumptions (that there's not much other I/O going on at the time, that there isn't a high queue of requests, etc.). You could monitor all that to get better numbers and do the math, but my guess is that there's no more than 1 read request outstanding at a time, although possibly 4-6 requests if your I/O size is smaller than mine.
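That arithmetic, generalized into a tiny Python calculator (the outstanding-request count is the key assumption; two outstanding requests model the second job/queue discussed earlier):

# throughput = outstanding requests * I/O size / latency
def throughput_mb_per_sec(io_size_kb: float, latency_ms: float,
                          outstanding: int = 1) -> float:
    requests_per_sec = outstanding * 1000.0 / latency_ms
    return requests_per_sec * io_size_kb / 1024.0

print(throughput_mb_per_sec(256, 5.0))      # 50.0 MB/s, matching the math above
print(throughput_mb_per_sec(256, 5.0, 2))   # 100.0 MB/s with a second queue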
3. I read the v5 FAQ; it said the indexing is almost instantaneous, so it shouldn't take much time. But then again, we never use indexing, as clients always know which folder contains the files they want restored, so there's no need to add more load to the Veeam server by adding MS Search Server v10 and indexing (or cataloging) to the backup job in our environment.
"Instant" is in the eye of the beholder. It's pretty fast on servers with tens of thousands of files, but isn't quite so instant on servers with hundreds of thousands or millions of files. I haven't had time to fully investigate how this process works, but something has to crawl the file tree to build the index.
However, I guess eventually you will need to add more Veeam Backup servers when your VM count grows a bit more, or if you want to really reduce the total backup window, right? Besides, your backup server always runs at 100% CPU during those 3-4 hours, which many system administrators consider not very healthy, as they are educated to think a physical server should always have 20% headroom (i.e., 80% is their maximum). :)
Well, I'm hoping Veeam might move to a more "distributed agent" based architecture, where the Veeam server is effectively just the controller while other systems do the work. That's been my suggestion if they really want to scale to larger customers. Having to manage multiple backup servers, which all themselves need a DR plan, is not the best.
ctchang
Expert
Posts: 115
Liked: 1 time
Joined: Sep 15, 2010 3:12 pm
Contact:

Re: Q: v5 Backup Speed, Job, CPU loading, Backup Time, etc

Post by ctchang »

Wow... you don't want me to sleep (it's 23:53 here in Hong Kong). :)

I am printing out this long reply and am definitely excited to read it; I love long, detailed explanations.

Thanks again. I've learnt so much in just 2 weeks of using Veeam; this community is really warm, and it reminds me of the old days at the HELM (a Windows control panel) forum.
