Planning my deployment? Any advice, some questions.

Post by **davidb1234** » Feb 24, 2012 3:34 pm

I am planning a Veeam deployment for my company by myself. We currently use Backup Exec, hate it, and can't wait to see what Veeam can do for us. Can you guys look over my config, let me know if I am missing anything, and maybe help with my questions below?

Backup Server
-12 Core HP DL380 G7
-16GB RAM
-Windows 2008 R2
-Dual FC Connections across dual paths to SAN
-4 Port NIC Teamed
-Flash-backed 1GB disk controller with 12 x 2TB 7.2K SATA disks in RAID 6 in an MSA60 storage enclosure, used for backup storage and as the NFS datastore for Instant Recovery (the idea is to use the backup server and this NFS share to fire up our environment if the main SAN has a hardware problem and needs to be repaired)
-This server will have all Veeam components installed on it. No separate proxy or backup server is planned at the main site
-Veeam 6 B&R Enterprise

SAN
HP P2000 G3 dual controller FC SAN w/ 12 x 600GB 15K SAS in RAID 10 - HIPERF ESX STORAGE
HP P2000 G3 enclosure attached to main P2000 w/ 12 x 600GB 15K SAS in RAID 6 - LOWPERF ESX STORAGE
Dual FC switches for dual pathing to everything
3 x ESX Hosts
30 VMs
4.5TB of provisioned VMware storage (probably around 2.5TB used space) - SQL, Exchange, app servers, IIS, dev, test, etc.

DR HOTSITE
HP MSA2312FC G2 dual controller SAN /w 12 x 600GB 15K SAS in RAID6
DL360 G5 quad-core, 6GB RAM as Veeam proxy server at DR HOTSITE to receive backups
ESX HOST

My main concerns are the following:
-Will I be able to use Instant Recovery to run all my VMs from the 12 x 2TB 7.2K SATA disks in RAID 6 on the Veeam backup server if the SAN has a hardware failure, or will it not be able to handle it? I am concerned that 12 x 2TB 7.2K SATA disks in RAID 6 will not provide enough performance to run the VMs, even if it's just for a day.
-Will I be able to run all of the Veeam components on the single beefy physical server like this?
-Am I missing anything here?
-We are planning to do reverse incremental so we always have a full backup available from the previous night: 1 real full per month, synthetic fulls once per week on the weekend.
-Planning to replicate all VMs to DR Hotsite 24/7

Post by **Jfmoots** » Feb 25, 2012 2:09 pm

Very nice setup. What's your pipe between your production and DR sites? (Sorry if I missed it.)

To answer your questions...

- I don't, as a rule, count on being able to "Instant Recover" all of my VMs at once. Instant Recovery is not a replacement for Replication. There are limitations to running VMs from your backups, and even though you have a very well designed setup here, I would always steer away from basing a DR strategy on Instant Recovering "all" your VMs.

- That server should be PLENTY to run all of your Veeam components.

- The only thing I see you missing is your connection speed between Production and DR. Also, I'm assuming by DR Hotsite you mean you're going to land Replicas there. If that's the case, I love the design. Local backups for quick recovery, off-site replicas for DR. PERFECT!

- There's no "synthetic full" with Reverse Incremental. You'll have one full (your most recent backup). Your restore points will all be behind it as reverse incrementals. It's great for disk space, but a little rough on I/O during the job.

- Again, I love that you're mixing Backup AND Replication. That's a proper design. You say you plan to replicate 24/7. You mean hourly?

One final question. You say you have 3 ESX hosts in production and one in DR. You're going to land all your replicas on the one ESX host in DR? Does it have enough horsepower to run all 30 of your VMs?
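To make the reverse incremental point above concrete, here is a toy sketch of how such a chain behaves. It is an illustration only, not Veeam's on-disk format: the block dictionaries, function names, and sample data are all made up. The per-block cost of three target-side operations (read the old block from the full, write it to the rollback file, write the new block into the full) is the commonly cited rule of thumb and is treated here as an assumption.

```python
# Toy model of a reverse incremental chain (illustrative only).
# The full backup is a dict of block_id -> data. Each job run folds the
# changed blocks into the full and saves the displaced blocks as a rollback.

def run_reverse_incremental(full, rollbacks, changed_blocks):
    """Apply one job run and return the assumed target-side I/O count."""
    rollback = {}
    io_ops = 0
    for block_id, new_data in changed_blocks.items():
        old_data = full.get(block_id)   # 1) read the old block from the full
        rollback[block_id] = old_data   # 2) write it to the rollback file
        full[block_id] = new_data       # 3) write the new block into the full
        io_ops += 3                     # new blocks are counted the same, for simplicity
    rollbacks.append(rollback)
    return io_ops


def restore_point(full, rollbacks, steps_back):
    """Rebuild an older restore point by applying rollbacks newest-first."""
    image = dict(full)
    for rollback in reversed(rollbacks[len(rollbacks) - steps_back:]):
        for block_id, old_data in rollback.items():
            if old_data is None:
                image.pop(block_id, None)  # block did not exist at that point
            else:
                image[block_id] = old_data
    return image


# Hypothetical three-block VM: initial full, then two nightly runs.
full = {0: "a0", 1: "b0", 2: "c0"}
rollbacks = []
ops_mon = run_reverse_incremental(full, rollbacks, {1: "b1"})
ops_tue = run_reverse_incremental(full, rollbacks, {1: "b2", 2: "c2"})

print(full)                               # the full is always the newest state
print(restore_point(full, rollbacks, 1))  # state after the first run
print(ops_mon, ops_tue)                   # assumed I/O cost grows with changed blocks
```

The full always holds the latest restore point, which is why there is no separate synthetic full to schedule, and the extra read and write per changed block is where the added job I/O comes from.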

Post by **davidb1234** » Feb 26, 2012 8:47 pm

Thank you very much for your reply!

So even if we were using higher-performance storage, and maybe even putting it on a separate SAN enclosure attached to the same fabric, we still shouldn't plan on running our mission-critical VMs using Instant Recovery all at the same time? (Right now we are just using cheap SATA disks in RAID 6.) I just want to be clear on this point so I can relay it to management: we would need to fully fail over to our replicas at the DR hotsite even for a major SAN enclosure failure, not only for a severe event like a cut fiber or a flood. The feature sounded amazing, but I did not realize it would not be recommended for running all backed-up VMs at once. It just sounded too good to be true to have your backup SAN attached to the same fabric and be able to spin these up and vMotion them back over to prod when the main SAN enclosure was repaired. Luckily we have never had an enclosure issue or two controllers fail at once, but we have come close.

Our plan is to have 2 ESX hosts at our production site each with 12 cores and 192GB RAM(DL360 G7s).

We would have 1 ESX host at our DR site with 8 cores and 192GB RAM (DL360 G6). We would lose a bunch of CPU power, but we would also probably not worry about bringing up our dev and staging servers and would just bring up mission-critical stuff. Alternatively, after testing, we could add a 2nd ESX host at our DR site.

We have a 50MB connection to the DR hotsite. We are thinking about adding a 2nd 100MB connection for redundancy and using that faster one 24/7. Is this what you see most people having? Does this seem lacking or too much?

We would replicate every hour or sooner depending on how it works out and what we can do. We planned to test and tweak this.

Post by **tsightler** » Feb 26, 2012 9:06 pm

davidb1234 wrote: So even if we were using higher-performance storage, and maybe even putting it on a separate SAN enclosure attached to the same fabric, we still shouldn't plan on running our mission-critical VMs using Instant Recovery all at the same time? (Right now we are just using cheap SATA disks in RAID 6.) I just want to be clear on this point so I can relay it to management: we would need to fully fail over to our replicas at the DR hotsite even for a major SAN enclosure failure, not only for a severe event like a cut fiber or a flood. The feature sounded amazing, but I did not realize it would not be recommended for running all backed-up VMs at once. It just sounded too good to be true to have your backup SAN attached to the same fabric and be able to spin these up and vMotion them back over to prod when the main SAN enclosure was repaired. Luckily we have never had an enclosure issue or two controllers fail at once, but we have come close.

As James stated, instant recovery is definitely not a full DR solution, and it's not intended to be. It's a great option for recovering a non-functioning VM quickly, and it works well for the situations it was designed for, but you have to remember that you are running from backup storage, and from compressed and deduplicated backup files. The performance is not going to be comparable to a "real" VM, especially with regard to I/O, and it gets worse the more VMs you run this way.

The best way to think of Instant Restore is as a "spare tire", perhaps like that little "donut" spare common on most cars. It's designed to get you out of a bad situation, but it's not a long-term solution, and if you have four flat tires, you'll need more help than a spare tire can provide.

That doesn't mean that Instant Restore can't be a part of your total DR solution. I've been unlucky enough to experience a major SAN failure and lost 20+ VMs pretty much instantly. We were able to fail over the critical VMs and restore many of the smaller ones, but our large fileservers were going to take too long, and we didn't even have the space right away as we were waiting on the SAN vendor to repair the catastrophically failed SAN. We were able to leverage Veeam instant restore for a couple of VMs and get through the business day with minimal impact. Instant restore performed reasonably well, but it would have been unrealistic to use it for 20+ servers.
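For a feel of why running everything from the backup array is a stretch, here is a rough back-of-the-envelope sketch of the spindle budget on a 12-disk RAID 6 set. The per-disk IOPS figure, the RAID 6 write penalty of 6, and the 70/30 read/write mix are rule-of-thumb assumptions, not measurements from this thread, and the function name is made up for illustration.

```python
def usable_iops(disks, iops_per_disk, write_penalty, read_fraction):
    """Estimate front-end IOPS a RAID set can sustain for a given read/write mix."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each front-end write costs `write_penalty` back-end operations.
    return raw / (read_fraction + write_fraction * write_penalty)


# Assumptions (not from the thread): ~75 IOPS per 7.2K SATA disk,
# RAID 6 write penalty of 6, 70% reads.
backup_array = usable_iops(disks=12, iops_per_disk=75, write_penalty=6, read_fraction=0.7)
per_vm = backup_array / 30  # all 30 VMs running from the backup array at once

print(round(backup_array), round(per_vm))
```

Under those assumptions the array delivers roughly 360 IOPS in total, or about 12 per VM if all 30 run at once, and that is before any overhead from reading compressed and deduplicated backup files. A handful of VMs is a very different picture from the whole environment.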

Post by **Jfmoots** » Feb 26, 2012 9:21 pm

You're very welcome!

Instant VM Recovery is a fantastic feature and it has a place and purpose. It's a life saver, but not a replacement for a true DR plan. The backups are usually kept on slower storage, they're stored compressed, and they're deduped. The VM is also being presented and "rehydrated" through your limited Veeam backup server and that's where the BIG bottleneck will be. Your setup will allow you to be able to run quite a few at once, but I couldn't recommend that you plan to run them all.

Replicas, on the other hand, are stored in their native format (not deduped and compressed), and while they don't have to live on storage nearly as nice as your production site's storage, it's usually faster than your backup storage. There's no bottleneck between the replicas and the ESX host other than the speed of the storage. Your DR ESX host will do the job just fine and that pipe is going to be great for your job traffic. Without knowing what all your servers do, I can't comment on its ability to run your operations across that pipe.

Your connection to your DR site is admirable. I think it's going to do well. I see GB pipes now and then. I think you're on your way to a reliable and failure resistant setup.
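As a rough sanity check on the pipe, the sketch below estimates how much changed data a link can carry per hour. It assumes the "50MB" and "100MB" connections mentioned earlier are 50 and 100 Mbit/s, assumes 80% usable throughput, and ignores any compression of replication traffic, so treat the numbers as a conservative illustration rather than a sizing result. The function names are hypothetical.

```python
def gb_per_hour(link_mbit_per_s, efficiency=0.8):
    """Data volume (decimal GB) a link can move in one hour."""
    bytes_per_s = link_mbit_per_s * 1_000_000 / 8 * efficiency
    return bytes_per_s * 3600 / 1e9


def max_hourly_change_rate(link_mbit_per_s, used_tb, efficiency=0.8):
    """Largest fraction of the used data that can change per hour and still fit."""
    return gb_per_hour(link_mbit_per_s, efficiency) / (used_tb * 1000)


for link in (50, 100):  # assumed Mbit/s
    print(link, round(gb_per_hour(link), 1), round(max_hourly_change_rate(link, 2.5) * 100, 2))
```

With these assumptions a 50 Mbit/s link moves about 18 GB per hour, which is roughly 0.7% of the estimated 2.5TB of used space. Whether hourly replication fits therefore depends mostly on the real change rate of the busiest VMs, which is worth measuring during testing.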

Post by **tsightler** » Feb 26, 2012 9:42 pm

The biggest issue with replicating hourly is the impact that it has on your source VMs. Veeam leverages VMware snapshots for replication, so if you replicate 30 VMs every hour, that's 720 snapshots and, more importantly, 720 snapshot removals every day. This will have some impact on source VM performance, and it's the issue I see most commonly overlooked. In most cases it's all about I/O, and you've got some pretty decent hardware so you'll probably be good, but I've seen people attempt to implement replication without any consideration of the impact it will have on their source storage. With lightly loaded VMs you can get by with this, but for transactional VMs this can be a bad thing, especially if the underlying storage is already near 50% of its IOPS capacity. The snapshot creation/deletion process will increase the load significantly and can push a borderline system over the edge from a performance perspective.

Once again, I'm not saying you're in this state. It appears you have good, high-speed storage and a reasonable spindle count for the number of VMs, but these are things to think about when determining how often to replicate. The more often you replicate, the more I/O load is introduced on the source.
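The snapshot arithmetic above (30 VMs x 24 runs = 720 per day) is easy to replay for other schedules. The short sketch below does that; the function name and the tiered split of 20 and 10 VMs are purely hypothetical examples, not a recommendation.

```python
def snapshot_cycles_per_day(vm_count, interval_minutes):
    """Snapshot create/remove cycles per day for VMs replicated at a fixed interval."""
    runs_per_day = (24 * 60) // interval_minutes
    return vm_count * runs_per_day


# All 30 VMs on the same schedule.
for interval in (15, 60, 240, 1440):
    print(interval, snapshot_cycles_per_day(30, interval))
# 15 -> 2880, 60 -> 720, 240 -> 180, 1440 -> 30

# Hypothetical tiering: 20 lightly loaded VMs hourly, 10 transactional VMs every 4 hours.
tiered_total = snapshot_cycles_per_day(20, 60) + snapshot_cycles_per_day(10, 240)
print(tiered_total)  # 540
```

Every cycle is both a snapshot creation and a removal on the source datastore, so stretching the interval for the busiest VMs cuts the added I/O load where it matters most.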

Post by **davidb1234** » Feb 27, 2012 7:09 pm

Thank you very much for all of the information here. This has given me a bunch of things to think about. We will plan to start slow and test as we turn things up and start replicating. We may have to replicate less often for our more highly transactional servers.

Post by **tsightler** » Feb 27, 2012 9:57 pm

Yes, this is the best approach. So many times I've seen customers test on a small scale, a few servers, and it works well enough that they immediately jump to full deployment, only to find that it does not scale in a linear fashion. Start slow, monitor and understand the impact, and grow slowly, and you will be likely to have great success.
