-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Loving V6 so far!!
My experience so far...
After losing a lot of sleep over the weekend working out the kinks and fixing my mistakes, v6 is running great and I can't believe the speed increase!
Just a few examples:
Replicating a file server with 750GB of data, down to 25 minutes total (replicating the data disk only took 6:30)!
Replication job on my Exchange server w/ 400GB of mailbox data... 20 minutes
Backup job with 6 VMs ranging from 50-120GB, 35 minutes
Replication of a 60GB VM running Accounting software over a 20Mbps WAN... wait for it... 4 1/2 minutes!
The kicker here is I'm running 3-4 jobs in parallel now. My backup window has gone from 6+ hours to around 2 hours for nearly 3TB of VM data and I'm finally saturating my disk writes on the backup target. There are, of course, a few minor bugs, but I'm impressed with the quality for a first release after such major architecture changes.
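For anyone curious how the math works out, here is a rough back-of-the-envelope sketch (Python, with made-up per-job durations rather than the actual jobs above) of why packing the same set of jobs into 3-4 concurrent slots shrinks the window roughly in proportion, as long as the proxies and the backup target can keep up. With these hypothetical numbers it comes out to about 6.3 hours serially versus a bit over 2 hours with 3 concurrent jobs.
Code:
import heapq

# Hypothetical per-job runtimes in minutes - not the actual jobs from this post.
job_minutes = [25, 20, 35, 50, 40, 45, 60, 30, 55, 20]

def backup_window(jobs, slots):
    # Greedy longest-job-first scheduling: each slot picks up the next job
    # as soon as it finishes its current one.
    finish = [0] * slots
    heapq.heapify(finish)
    for j in sorted(jobs, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + j)
    return max(finish)

print("one job after another:", sum(job_minutes), "min")
print("3 concurrent jobs:    ", backup_window(job_minutes, 3), "min")
print("4 concurrent jobs:    ", backup_window(job_minutes, 4), "min")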
-
- Chief Product Officer
- Posts: 31796
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Loving V6 so far!!
Hi Tim, glad you are enjoying v6 and its quality - and thank you very much for taking the time to write such detailed feedback. I appreciate you including actual numbers as well - this kind of data provides invaluable information to other community members on what can be achieved in very specific conditions.
-
- Enthusiast
- Posts: 49
- Liked: 1 time
- Joined: Jan 07, 2011 9:30 pm
- Full Name: Tim O'Pry
- Contact:
Re: Loving V6 so far!!
Can you advise how many backup proxies you are using and if they are VM or standalone?
My initial backup tests are showing comparable times, but I have only one proxy, which is the Veeam VM, and our SAN is, sadly, iSCSI only.
I have yet to test replication.
-
- Enthusiast
- Posts: 47
- Liked: 6 times
- Joined: Mar 21, 2011 12:04 pm
- Full Name: Chris Leader
- Contact:
Re: Loving V6 so far!!
Those are some really impressive numbers!
I have to wonder what we are missing at our site. Many of our servers of roughly 80-120 GB take around 20 minutes EACH to complete, and our two main homefolder storage servers (one 630GB and one 960GB total) each run well into 15-20+ HOURS to complete just an incremental rollback. The main differences are probably that our SAN access is iSCSI and that the jobs, although they use Direct SAN access and CBT, are reverse incrementals - but should reverse incrementals really make THAT much of a difference? We are backing up to local storage (RAID5) on the Veeam server, but have to use reverse incrementals to save space, as the 5.3TB isn't enough to sustain the multiple synthetic fulls we would need for the homefolder servers.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Loving V6 so far!!
We're running off a 16-disk EqualLogic RAID 5 over iSCSI as our primary storage, backing up to a 4-disk, 5.5TB RAID 5, using 3 backup proxies set up to use hotadd. Each individual backup is not necessarily running that much faster (though replicas are quite a bit faster), but since we can run 3-4 jobs in parallel now, the backup window has shrunk. It makes sense that if our backup window was 6 hours running one job after another, dividing the jobs up and running 3-4 at the same time would cut the window to about a third, going from 6 hrs to 2 hrs. We have had a couple backup windows hit 3 hours, but generally it's about 2.
I've found that in our environment, hotadd worked a *lot* better than direct SAN access. This may just be that the VMware iSCSI initiator is better than the one built into Windows... not really sure, and I didn't have the time to troubleshoot it. Also, defragging our large file server on a monthly basis has helped a lot with incremental backup speed, though the backup run after the first defrag took forever.
-
- Chief Product Officer
- Posts: 31796
- Liked: 7297 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Loving V6 so far!!
@Chris, the v6 bottleneck analysis really helps to reliably identify the "weak spot" in your backup infrastructure - use it!
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Loving V6 so far!!
I was going to say the same thing. The bottleneck statistics should give you a good indication of the areas you need to look at. Reverse incremental does make a huge difference depending on your change rate, especially if your target storage is I/O limited, as it requires 3 times the random I/O of a straight incremental.
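To make that 3x figure concrete: each changed block in a reverse incremental is read out of the full backup file, written to the rollback file, and then overwritten inside the full, versus a single mostly sequential write per block for a forward incremental. A quick sketch (Python; the VM size and 5% change rate are assumptions for illustration, not measurements from this thread):
Code:
# Per-pass I/O on the backup target for forward vs. reverse incremental.
BLOCK_MB = 1  # treat the backup files as 1 MB blocks for simplicity

def target_io(vm_size_gb, change_rate, mode):
    changed = int(vm_size_gb * 1024 / BLOCK_MB * change_rate)
    if mode == "forward":
        # changed blocks are appended to a new increment file: 1 sequential write each
        return {"reads": 0, "writes": changed, "pattern": "mostly sequential"}
    if mode == "reverse":
        # per changed block: read the old block from the full backup file,
        # write it to the rollback file, then overwrite the block in the full
        return {"reads": changed, "writes": 2 * changed, "pattern": "mostly random"}
    raise ValueError(mode)

for mode in ("forward", "reverse"):
    print(mode, target_io(vm_size_gb=630, change_rate=0.05, mode=mode))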
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Loving V6 so far!!
Our bottleneck is the target, which has now increased our replication job time because 'Applying retention policy' takes a while (the first several jobs did not do this step because there were not enough restore points yet). Our Exchange server replica takes 25 minutes to replicate, then 45+ minutes to 'Apply retention policy' (commit the restore point snapshot). We may have to add more disks to our backup target to increase IOPS performance. This seems far less efficient than the previous method: writing all the data in one pass and then committing the data from the oldest pass into the VMDK after the job more than doubles the total disk usage during the replica process. Still love the new features, but that is kind of a let-down.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Loving V6 so far!!
th83 wrote: This seems far less efficient than the previous method: writing all the data in one pass and then committing the data from the oldest pass into the VMDK after the job more than doubles the total disk usage during the replica process.
Just to clarify, unless you are only keeping 1 restore point, it's not likely to be the transfer and commit process that is causing double the disk space. The calculation for required disk space for replicas is effectively <number of replicas to keep>+1. If you have 7 restore points, then during replication there will be an 8th point "in progress" and, at completion, the earliest snapshot would be committed; thus, to keep 7 restore points, you'd need enough space for 7 + 1 restore points during active replication.
The increase in space is due to the fact that V6 keeps replica restore points as native VMware snapshots. Previous versions used a proprietary format that was compressed, but had many disadvantages, especially with ESXi architecture. The V6 method has huge advantages in performance and management simplicity, but does require significantly more space since the restore points are not compressed but are simply native snapshot images.
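As a minimal sketch of that sizing rule (Python; the per-restore-point delta size is an assumption you would estimate from your own change rate):
Code:
# Datastore space needed for a replica: the base replica VMDK plus one native
# snapshot per retained restore point, plus one extra "in progress" point that
# exists while the job runs, before the oldest snapshot is committed.
def replica_space_gb(vm_size_gb, restore_points, avg_delta_gb):
    snapshots_during_job = restore_points + 1  # the "+1" from the post above
    return vm_size_gb + snapshots_during_job * avg_delta_gb

# e.g. a 400 GB VM, 7 restore points, ~20 GB of changed data per point (assumed)
print(replica_space_gb(400, 7, 20))  # -> 560 GB peak during active replication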
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Loving V6 so far!!
tsightler wrote: Just to clarify, unless you are only keeping 1 restore point, it's not likely to be the transfer and commit process that is causing double the disk space.
I guess I wrote that wrong - I was referring to disk writes and performance, not space usage. Committing the snapshot generates a lot more writes than the old-school method seemed to, so our backup target is getting bogged down during the process.
-
- Influencer
- Posts: 11
- Liked: never
- Joined: Oct 24, 2011 7:42 pm
- Full Name: Tim Haner
- Contact:
Re: Loving V6 so far!!
I figured out the issue that was causing the snapshot commit during the 'Applying retention policy' step to run so slowly. I started investigating the performance of the target iSCSI storage where the replicas are stored, and found that random write performance was terrible for some reason. I had recently changed the multipathing for that target to Round Robin in VMware, so I thought that might be the issue. I changed it back to Fixed and manually set the preferred path for each host to balance the load across the NICs. The following night, the snapshot commit during the 'Applying retention policy' step took 7 minutes instead of the 30-50 minutes it had been taking.
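For anyone who would rather script the same change than click through the vSphere Client on every host, something along these lines should work with pyVmomi. Treat it strictly as a sketch under assumptions, not a tested recipe: the vCenter address, credentials, device ID and path name below are placeholders, and the exact property names are worth double-checking against the vSphere API reference for your version.
Code:
# Sketch only: switch a LUN from Round Robin to Fixed and set a preferred path
# on every host, via pyVmomi. All identifiers below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

DEVICE = "naa.6090a028e0example"      # placeholder: canonical name of the replica LUN
PREFERRED = "vmhba33:C0:T1:L0"        # placeholder: runtime name of the preferred path

si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        storage = host.configManager.storageSystem
        # map ScsiLun keys to canonical names so we can find our device
        names = {lun.key: lun.canonicalName for lun in storage.storageDeviceInfo.scsiLun}
        for mp_lun in storage.storageDeviceInfo.multipathInfo.lun:
            if names.get(mp_lun.lun) == DEVICE:
                policy = vim.host.MultipathInfo.FixedLogicalUnitPolicy(
                    policy="VMW_PSP_FIXED", prefer=PREFERRED)
                storage.SetMultipathLunPolicy(lunId=mp_lun.id, policy=policy)
                print(host.name, "->", DEVICE, "set to Fixed, preferred path", PREFERRED)
    view.Destroy()
finally:
    Disconnect(si)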
-
- Enthusiast
- Posts: 28
- Liked: 2 times
- Joined: Aug 19, 2011 3:23 pm
- Full Name: ME
- Contact:
Re: Loving V6 so far!!
th83 wrote: I changed it back to Fixed and manually set the preferred path for each host to balance the load across the NICs.
Hi, when you say you manually set the preferred path for the hosts to balance the load across the NICs, how did you do that?
Thank you