Host-based backup of VMware vSphere VMs.
YouGotServered
Service Provider
Posts: 171
Liked: 51 times
Joined: Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Compute and Storage Ideas for DR Environment

Post by YouGotServered » 1 person likes this post

Hello,
We are looking for a storage solution to support a target replication environment. This is the typical "Here's what I want, and I want it for not enough money" post, so I'm not expecting to tick off all of my checkboxes here, but just want some thoughts.

TL;DR - we have about $40,000 and would like a scalable DR replication target, starting today with 512GB RAM, 75TB of storage, and x2 12+ core Intel Gold / equivalent AMD CPUs. I have found a couple of options but think I'm missing some.

Yes - I have done quite a bit of research into this myself before coming here. The problem is that the more research I do, the more options I find, and the less sure I am of which ones will work best for us. I'm hoping someone has been in a similar situation and can add their thoughts.

Here's our current situation:
• We currently have a non-scalable DR solution
> ○ Dell PowerEdge R430
> ○ x2 E5-2660 v4 14 core CPUs
> ○ 256GB RAM
> ○ The server has a bit of internal SSD storage that we use for a couple of utility VMs. Only 1.5TB of space there, so not much for incoming DR VMs.
> ○ One direct attached Dell MD3460, stuffed full of big drives in a RAID 10, giving us about 95TB usable storage (we currently use about 50TB).
• The MD3460 DAS sits around 2000 I/O, 1-2ms latency just taking incoming replication from all of our sites. It can spike up to about 4000 I/O, 3-4ms latency once a minute or so.
• Our production sites each average 3000-4000 I/O, 1-2ms latency on their SANs during the work day for regular use.
• When we perform test failovers, the servers are S L O W. I'm talking 30 minutes to boot up slow. This will never be acceptable performance.
> ○ Latency on the storage device (according to VMware) spikes up to 10ms during failover tests.
> ○ I/O seems to cap out at 4000 - I've never seen it go higher, which leads me to think that's as much as we can pull from it. I've seen that you can expect roughly 50-100 I/O from each additional disk in a RAID 10 array. Of course, there are TONS of factors that go into that, but it would make sense here: 4000 I/O over 60 disks = about 67 I/O apiece (quick sanity check below, after this list).
• Due to COVID-19, our RDS farm footprint has grown heavily as people work remotely, so CPU and RAM resources are no longer sufficient. We can add more RAM to the host, but that doesn't address the CPU and storage issues.
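
For anyone who wants to check my spindle math, here's the back-of-the-napkin version (the 50-100 IOPS-per-disk figure is a rule of thumb I'm assuming, not something I measured):

Code:
# Rule-of-thumb spindle math for the MD3460 DAS (60 NL-SAS disks in RAID 10).
# The 50-100 IOPS-per-disk range is an assumed rule of thumb, not a measurement.
disks = 60
observed_peak_iops = 4000

print(f"Observed per-disk average: {observed_peak_iops / disks:.0f} IOPS")  # ~67
print(f"Expected array ceiling: {disks * 50} - {disks * 100} IOPS")         # 3000 - 6000
# The ~4000 IOPS cap we see sits right inside that expected range.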

Ideally we would like something:
• VMware-based (pretty obvious, haha!).
• Simple to scale compute and / or storage individually.
• Capable of about 15,000 I/O, 1-2ms latency.
> ○ This way we can reasonably expect to fail over one of our 3000-4000 I/O environments and still accept 2000-4000 I/O of incoming replication from other sites - with room to grow (rough math below, after this list).
• Starting with 512GB RAM.
• Starting with x2 12+ core Intel Gold or equivalent AMD CPUs.
• Starting with 70TB storage.
• Mostly off the shelf - we don't have a ton of resources in-house to custom-build something from the ground up. Linux and custom hardware just aren't our thing. We'd rather pay a bit more to get a "ready to go" solution with support.
• Partial to Dell, but open to other brands.
• And the kicker - around $40k to start.
> ○ Refurbished is fine - we buy tons of hardware from xByte, and it arrives like new and works flawlessly.
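
For what it's worth, here's how I arrived at the 15,000 figure - just the numbers above plus some headroom:

Code:
# Rough target-IOPS math from the figures above.
failed_over_site_iops = (3000, 4000)   # one production site running at DR
incoming_repl_iops = (2000, 4000)      # replication still arriving from the other sites

worst_case = failed_over_site_iops[1] + incoming_repl_iops[1]
target = 15000
print(f"Worst case today: {worst_case} IOPS")                  # 8000
print(f"Headroom at the target: {target / worst_case:.1f}x")   # ~1.9x room to grow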

My current ideas and research:
• PowerEdge R740XD from Dell. Stuffed with 32 4TB SSDs, I could expect about 110TB or so usable (x3 RAID 5 groups - capacity math sketched after this list). 512GB RAM, x2 Intel CPUs - around $40,000.
> ○ Pros:
>> • Affordable.
>> • Local SSD-based storage, so I/O and latency are non-issues.
>> • Simple and easy to deploy / manage.
>> • Plenty of space to grow into.
> ○ Cons:
>> • Not easily scalable. Storage and compute must be deployed together unless I use vSAN, which would incur additional licensing costs. This would require me to strategically monitor and carve up incoming replication storage and resources among nodes, which sounds like a management nightmare.
>> • No native disk tiering in ESXi that I'm aware of, so we essentially have to use all SSD to get the performance / capacity we need.
• We have also looked at a similar "all in one" option like the above from Lenovo, roughly the same price.
• Dell's new PowerVault ME4 series looks very appealing. I like the scalability and the ability to use it as SAN or DAS. Anyone have input on this one in particular?
• I've thought about an HCI solution with vSAN, like starting with a 2-node Dell cluster, but I'm not sure I can get my minimum required compute and starting storage in that budget (probably not).
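
Rough capacity math behind the R740XD option - note the 11/11/10 split across the three RAID 5 groups is just my guess at how I'd carve it up:

Code:
# R740XD option: 32 x 4TB SSDs in three RAID 5 groups (11/11/10 split assumed).
groups = [11, 11, 10]                          # disks per RAID 5 group
usable_tb = sum((g - 1) * 4 for g in groups)   # one disk's worth of parity per group
usable_tib = usable_tb * 1000**4 / 1024**4     # binary units, as ESXi would report them
print(f"{usable_tb} TB raw usable, ~{usable_tib:.0f} TiB as reported")  # 116 TB / ~105 TiB
# Roughly in line with the ~110TB estimate above, before filesystem overhead.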

So, any ideas or thoughts? Thanks!
YouGotServered
Service Provider
Posts: 171
Liked: 51 times
Joined: Mar 11, 2016 7:41 pm
Full Name: Cory Wallace

Re: Compute and Storage Ideas for DR Environment

Post by YouGotServered »

Hey all, just wanted to post what we eventually got here - maybe I can save someone else some time.

We ended up getting a Dell PowerVault ME4024 stuffed with x23 3.84TB SSDs in a RAID 5, plus x1 hot spare. Usually large RAID 5 groups are a bad idea due to URE chances, but one of the good things about modern SSDs is that URE risks are significantly lower. Plus, it's DR - we'll accept the risk. We also opted to go the "DAS" route instead of FC or iSCSI for connectivity. This limits us to 4 concurrent host connections (about double what we currently need, so plenty of room for growth), but offers a performance benefit over the other protocols.

This gives us about 73TB of usable SSD storage with a whopping estimated 46000 IOPS.
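
If you want to see where that figure comes from, here's the math for the single 23-disk RAID 5 group (the remaining gap down to ~73TB is presumably the array's own formatting/metadata overhead):

Code:
# ME4024: 23 x 3.84TB SSDs in a single RAID 5 group, plus 1 hot spare.
drives, size_tb = 23, 3.84
usable_tb = (drives - 1) * size_tb             # one drive's worth of capacity lost to parity
usable_tib = usable_tb * 1000**4 / 1024**4     # binary units, as the array reports them
print(f"{usable_tb:.1f} TB / {usable_tib:.1f} TiB")   # 84.5 TB / ~76.8 TiB
# The array shows roughly 73TB usable once metadata/formatting overhead is taken out.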

When we run out of space, we can scale up with additional units - we are planning to add some ME412 expansion units, stuff them with x12 12TB or 16TB drives (whichever is most cost effective), and tier them as a "capacity" tier alongside the SSDs, letting the PowerVault ME4024 do some of its auto-tiering performance magic. Each ME412 expansion unit with x12 drives in a RAID 10 would give us 65TB or 87TB of storage depending on the drive size and add about 1250 IOPS of performance to the array. Current pricing for 12TB drives is about $22,000; 16TB is currently almost double that (yikes).
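
If the 65TB / 87TB figures look odd, it's because that's 6 mirrored pairs quoted in binary units (TiB), the way the array reports them:

Code:
# ME412 expansion shelf: 12 NL-SAS drives in RAID 10 = 6 mirrored pairs of usable capacity.
for drive_tb in (12, 16):
    usable_tb = 6 * drive_tb
    usable_tib = usable_tb * 1000**4 / 1024**4
    print(f"{drive_tb}TB drives: {usable_tb} TB raw = ~{usable_tib:.0f} TiB usable")
# 12TB drives: 72 TB = ~65 TiB;  16TB drives: 96 TB = ~87 TiB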

We decided to reuse our existing PowerEdge R430 host and add an R630 that we currently have sitting unused, so we didn't have to spend much more on compute.

We got this from our vendor xByte (WONDERFUL if you're in the US) for about $33,000, including Dell ProDeploy so that we can have a certified Dell ME4 engineer double check our config prior to deployment to make sure we're in line with best practices.

If anyone is looking for high capacity right now, I strongly recommend looking into SSD storage. Counter-intuitively, it can often be cheaper. Large spinning disks should really be put in a RAID 10, which halves your capacity, while SSDs can often use RAID 5, which saves you tons of that overhead. It doesn't fit every scenario, but it's definitely worth considering (see the numbers below). Additionally, the energy savings can add up.
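
To put rough numbers on the overhead point (using 12-drive groups purely as an example):

Code:
# Usable fraction of raw capacity: RAID 10 vs RAID 5, 12-drive groups as an example.
drives = 12
raid10_usable = drives / 2      # mirrored pairs
raid5_usable = drives - 1       # one drive's worth of parity
print(f"RAID 10: {raid10_usable / drives:.0%} of raw capacity is usable")  # 50%
print(f"RAID 5:  {raid5_usable / drives:.0%} of raw capacity is usable")   # ~92%
# For the same usable space, SSDs in RAID 5 need only ~55% of the raw capacity
# that spinning disks in RAID 10 do, which narrows the price-per-TB gap a lot.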

Thanks!
