Host-based backup of VMware vSphere VMs.
dimaslan
Service Provider
Posts: 114
Liked: 9 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis
Contact:

Backup storage consolidation challenges

Post by dimaslan »

We are an MSP and we've been trying to consolidate our backup storage onto one large Synology NAS: about 240 TB across 24 drives.
So far, the backups for our 6 BDRs (one physical, 5 VMs) serving our hosted cloud customers were distributed among local storage, storage on the hosts running the VM BDRs, some NAS, an MSA SAN, and the new Synology we are looking to use as the single storage target.
Normally, all of our backups finished by 3-4 am. Once we assigned space on the new NAS and moved most of the backups there, we started having severe issues: delays, backups starting but not proceeding, merges taking several times longer to complete, etc.
Not all the storage on the NAS is assigned to a single LUN; we created several LUNs of 6 TB, 15 TB, 25 TB, 40 TB, etc. per our needs. Some are connected via iSCSI to the ESXi hosts that run the VM BDRs, and some via iSCSI to the OS of the BDRs.
We investigated and concluded that because multiple servers hit the same storage with nothing regulating the requests, all jobs try to write at the same time, and that is the issue. We have started consolidating the BDRs and will try to limit the ones writing to the NAS to only one or two; however, there are currently about 20-25 jobs writing there.
Has anyone faced the same situation? We are trying to figure out the best way to scale this and, if possible, future-proof it. As an MSP, we may add 10-15 new customers, and some may leave as well. Also, because we will be replicating LUNs from this NAS to an identical NAS at a remote location, we absolutely need to keep LUN sizes below 20 TB where possible.
We have considered flash storage but were told it is not actually necessary. Currently the bottleneck is storage IOPS, but that's due to the number of servers, not the amount of data or anything else.
Mildur
Product Manager
Posts: 9849
Liked: 2606 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Backup storage consolidation challenges

Post by Mildur » 1 person likes this post

Hi Dimitris

We don't recommend using NAS storage as a backup repository.
Such NAS devices are normally not the best choice for repositories, and pointing many VBR servers at the same device can lead to exactly the performance issues you are facing.
LUN replication brings its own risks on top, e.g. corrupted blocks will be replicated as well.

If you want a future-proof design, go with a standalone VBR server with local disks (block storage), plus object storage for a second, immutable copy of the data.

Thanks
Fabian
Product Management Analyst @ Veeam Software
micoolpaul
Veeam Software
Posts: 219
Liked: 111 times
Joined: Jun 29, 2015 9:21 am
Full Name: Michael Paul
Contact:

Re: Backup storage consolidation challenges

Post by micoolpaul » 2 people like this post

Hi Dimitris,

Here to second what Fabian has said.

You’re likely seeing multiple issues combining to reduce your throughput:

- Simultaneous IO streams: you have to use multiple IO streams to drive CIFS/NFS effectively, and you’ll want high bandwidth, ideally at least 10 Gbps, between the gateway and the NAS. If you’re using SAS disks, they’re likely 6/12 Gbps; if your NAS has 1 Gbps networking, for example, you’ve got an immediate reduction in throughput.
- Spindles: if you’ve consolidated that many servers, you’ve likely also reduced the number of disks, so, especially depending on RAID type, you’ve likely starved your read and write IO.
- Task concurrency: every server had its own CPU & RAM, so after consolidating, to get similar task concurrency you need CPU & RAM totals similar to everything you’ve replaced (assuming nothing was dramatically oversized before and everything was right-sized).
- Functionality loss: without ReFS or XFS as your primary storage file system you can’t use fast clone, so any processes that would’ve been accelerated by it no longer are.
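The spindle point can be put into rough numbers. A back-of-envelope sketch (all figures are illustrative assumptions: ~80 random IOPS per NL-SAS spindle and the textbook RAID write penalties, not measurements of any specific Synology):

```python
# Rough estimate of the random-IO capability of a spindle pool.
# Illustrative figures only; real arrays vary with cache, stripe size, etc.

RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(disks, iops_per_disk, raid, read_fraction):
    """Blend read IOPS (no penalty) with write IOPS (divided by the RAID penalty)."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw * read_fraction + (raw * write_fraction) / RAID_WRITE_PENALTY[raid]

# 24 NL-SAS disks at ~80 IOPS each, RAID 6, a write-heavy backup workload
print(round(effective_iops(24, 80, "raid6", 0.1)))  # → 480
```

A few hundred effective IOPS shared by 20-25 concurrent jobs explains stalled merges on its own, regardless of network speed.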

You’d be better off using your Synology as an iSCSI device to get an XFS/ReFS-based file system and win some of those benefits back, plus you could use MPIO to reduce network bottlenecks where possible. But it’s a band-aid approach to fixing this.
-------------
Michael Paul
Veeam Data Cloud: Microsoft 365 Solution Engineer
dimaslan
Service Provider
Posts: 114
Liked: 9 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis
Contact:

Re: Backup storage consolidation challenges

Post by dimaslan »

Thank you both for your replies.
To clarify, the NAS only has LUNs, no shared folders, and all the LUNs are ReFS with 64 KB clusters. We have not yet consolidated all the jobs to one server; I believe 4 BDRs are currently sending to that NAS via iSCSI, and we were going to consolidate to two.
The only thing I'm not sure about is MPIO; I will check. The network is currently 1 Gbit, but we have checked, and also opened a case with Synology, and the network is not the bottleneck. If it becomes apparent that it is, we will add 10 Gbit networking.

Thanks.
micoolpaul
Veeam Software
Posts: 219
Liked: 111 times
Joined: Jun 29, 2015 9:21 am
Full Name: Michael Paul
Contact:

Re: Backup storage consolidation challenges

Post by micoolpaul »

What metrics are you seeing as the bottleneck? Are you seeing any resources on the Synology being overutilised such as CPU/RAM? Is it simply a case of too much IO?
-------------
Michael Paul
Veeam Data Cloud: Microsoft 365 Solution Engineer
jamcool
Enthusiast
Posts: 67
Liked: 11 times
Joined: Feb 02, 2018 7:56 pm
Full Name: Jason Mount
Contact:

Re: Backup storage consolidation challenges

Post by jamcool »

I did not see how much of the 240 TB you have in use or how big your full backups are, but I would be concerned about being able to restore servers in a timely manner after a total disaster with only a 1 Gbps connection. I would look at your options for upgrading the network and iSCSI to 10 Gbps, unless you plan to replace your storage with local disk (or SAN disk). I personally prefer local (SAN) storage with SAS drives running at 7,200 RPM; I find I max out my 10 Gbps connection before maxing out the storage I/O. Having a second copy in a different location on different storage is also a best practice.
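To put the 1 Gbps restore concern into numbers, a quick sketch (the 50 TB full and the ~70% effective link utilisation are illustrative assumptions; plug in your own figures):

```python
def restore_hours(tb, link_gbps, efficiency=0.7):
    """Hours to move `tb` terabytes over a `link_gbps` link at the given utilisation."""
    bits = tb * 8e12                               # TB -> bits (decimal units)
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Restoring a hypothetical 50 TB full over 1 Gbps vs 10 Gbps
print(f"{restore_hours(50, 1):.0f} h vs {restore_hours(50, 10):.0f} h")  # → 159 h vs 16 h
```

Nearly a week versus under a day: even if 1 Gbps is adequate for nightly incrementals, the disaster-recovery case is where it hurts.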
dimaslan
Service Provider
Posts: 114
Liked: 9 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis
Contact:

Re: Backup storage consolidation challenges

Post by dimaslan »

I just saw these replies, sorry for not following up.
We have about 300 TB of backups now and have added a second Synology NAS. On top of the 300 TB of production backups there are about 50 TB of backup copy jobs from customers' on-premises Veeam BDRs going to our cloud.
CPU and RAM do not seem to be a problem. We think it's IOPS and latency.
One NAS has 12 disks plus a drawer with another 12, and the other has just 12 drives; so 24 drives on one and 12 on the other. Adding the second NAS helped greatly, but I still see issues if I try to restore while the backup storage is busy with copy jobs and SOBR offloading.
I am wondering whether putting in an HPE ProLiant running Windows Server, hooking both NAS directly to it, and setting it up as a backup repository server would help.
I have also thought about creating separate arrays for each server; right now it's just two pools of drives and everything uses them.
10 Gbit networking did come to mind, but my boss checked and did not see the network being the issue. I can upgrade to 10 Gbit at any point, of course, if necessary; it just does not appear to be the problem.
Klarmann
Novice
Posts: 9
Liked: 2 times
Joined: Nov 12, 2020 1:58 pm
Full Name: Robert Klarmann
Contact:

Re: Backup storage consolidation challenges

Post by Klarmann » 1 person likes this post

We had an HPE MSA with about 60 TB of backup storage first, but we ran into performance trouble because we had archive-class disks.

We switched to an HPE Apollo 4510 Gen10 with 60x10TB drives (2x RAID 60 with hot spares), which gives us around 440 TB of storage. It runs SUSE Linux Enterprise with an XFS file system with reflink. A single backup writes at about 1.6-1.8 Gbit/s. We will go for an additional Apollo next year with 60x16TB drives for long-term backup.

I don't know what the prices are for your Synology systems, but HPE does some good project prices!
Disclaimer: This posting is provided "AS IS" with no warranties or guarantees, and confers no rights.
"Every once in a while, declare peace. It confuses the hell out of your enemies"
rennerstefan
Veeam Software
Posts: 688
Liked: 150 times
Joined: Jan 22, 2015 2:39 pm
Full Name: Stefan Renner
Location: Germany
Contact:

Re: Backup storage consolidation challenges

Post by rennerstefan »

Most of the issues I have seen in the field related to what is mentioned here come down to the disk layout of the system.
It can be a combination of disk type and size, cache size and type, backend connectivity, and OS behaviour under concurrent IO that impacts performance.
So there will never be a general statement on why solution XYZ is not performing as expected.

Based on my experience, I also think it is the disks: 24 NL-SAS disks will only give you a certain amount of performance.

What does the Veeam interface report as the bottleneck?
Have you tested the performance of the Synology with tools like fio yet, to see what it can deliver compared to your older systems?
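A starting point for such an fio test could be a job file along these lines, run directly against a test file on one of the LUNs (all parameters are illustrative assumptions meant to roughly mimic concurrent backup streams, not a Veeam-published profile):

```ini
# backup-sim.fio - sequential writes from several concurrent "jobs"
# All values are illustrative; adjust filename, size and numjobs to taste.
[backup-sim]
ioengine=libaio
direct=1
rw=write
bs=512k
iodepth=16
numjobs=4
size=20G
filename=/mnt/testlun/fio.dat
group_reporting
```

Running the identical job file against the old storage and the Synology (`fio backup-sim.fio`) gives a like-for-like baseline; a `rw=randwrite` variant approximates merge activity.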

It would be good to get some details, but as mentioned above, it depends on a lot of things.

Thanks
Stefan Renner

Veeam PMA
dimaslan
Service Provider
Posts: 114
Liked: 9 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis
Contact:

Re: Backup storage consolidation challenges

Post by dimaslan »

@rennerstefan I have not visited this thread in a while and just saw your reply. We have largely avoided the big delays by using weekly synthetic fulls, so merging no longer stresses the storage.
Those two enterprise NAS, along with dozens of drives, were only just bought by my company, so I don't think I can move to another solution that easily. We could, however, consider a physical server as the backup repository.
I have further consolidated the BDR servers in our cloud to just 3, plus another for replication jobs and one more for VBO, which will hopefully go away soon.
My next long-term consideration is not only performance but also expandability, security, and longevity.
I was thinking of moving all backups to just one or two large volumes instead of many smaller ones, as moving TBs around is causing us severe delays. We have ReFS, so 100 TB or larger volumes are not an issue. Currently, however, all 3 BDRs are on a single ESXi host, and the two NAS, one with 12 drives plus 12 in an expansion unit and the other with another 7 drives, are connected via dual 1 Gbit Ethernet.
We can add 10 Gbit if needed, but so far it is not clear that the network is the bottleneck.
My concerns, then, when selecting a backup storage solution as an MSP:
- Something that can be expanded (add drives, for example) as needed
- Something that is secure (immutable storage)
- Something that does not lose my backups in case of hardware issues (we have a lot of SOBRs going to Wasabi with monthly and yearly retention, and it would be greatly troublesome to have to connect everything again).