-
- Service Provider
- Posts: 114
- Liked: 9 times
- Joined: Jul 01, 2017 8:02 pm
- Full Name: Dimitris Aslanidis
- Contact:
Backup storage consolidation challenges
We are an MSP and we've been trying to consolidate backup storage to one large Synology NAS. We have about 240 TB on it with 24 drives.
So far, the backups for our 6 BDRs (one physical, five VMs) hosting our cloud customers were distributed among local storage, storage on the hosts running the VM BDRs, some NAS devices, an MSA SAN, and the new Synology we are looking to use as a single storage target.
Previously, all of our backups finished by 3-4 am. Once we assigned space on the new NAS and moved most of the backups there, we started having severe issues: delays, backups starting but not proceeding, merges taking several times longer to complete, etc.
Not all of the storage on the NAS is assigned to a single LUN; we have created several LUNs of 6 TB, 15 TB, 25 TB, 40 TB, etc., per our needs. Some are connected via iSCSI to the ESXi hosts running the VM BDRs, and some are connected via iSCSI to the OS of the BDRs themselves.
We investigated and concluded that the issue is multiple servers hitting the same storage with nothing to regulate the requests: all jobs try to write at the same time. We have started consolidating the BDRs and will try to limit the ones writing to the NAS to only one or two; however, there are currently about 20-25 jobs sending backups there.
Has anyone faced the same situation? We are trying to figure out the best way to scale this and future-proof it if possible. As an MSP, we may add 10-15 new customers, and some may leave as well. Also, because we will be replicating the LUNs on that NAS to an identical NAS at a remote location, we absolutely need to keep LUN sizes below 20 TB where possible.
We have considered flash storage, but we were told it is not actually necessary. Currently the bottleneck is storage IOPS, but it's due to the number of servers rather than the amount of data or anything else.
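To illustrate why job concurrency alone can exhaust a spindle-backed array, here is a back-of-envelope sketch; every figure in it is an illustrative assumption, not a measurement from this setup:

```python
# Rough IOPS budget for a spindle-backed backup target.
# All figures are illustrative assumptions, not measured values.

DISKS = 24                 # drives in the Synology
IOPS_PER_DISK = 80         # typical 7.2k NL-SAS random-IO capability
RAID_WRITE_PENALTY = 6     # RAID 6: each logical write costs ~6 back-end IOs
CONCURRENT_JOBS = 25       # jobs writing at the same time

usable_write_iops = DISKS * IOPS_PER_DISK / RAID_WRITE_PENALTY
iops_per_job = usable_write_iops / CONCURRENT_JOBS

print(f"Usable random-write IOPS: {usable_write_iops:.0f}")
print(f"IOPS available per concurrent job: {iops_per_job:.1f}")
```

Under those assumptions each job gets only about a dozen random-write IOPS, and merge operations are read-modify-write heavy, so multi-hour slowdowns are the expected outcome; the levers are fewer concurrent writers, more spindles, or flash.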
-
- Product Manager
- Posts: 9848
- Liked: 2606 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Backup storage consolidation challenges
Hi Dimitris
We don't recommend using NAS storage as a backup repository.
Such NAS devices are normally not the best choice for repositories, and sharing the same device among many VBR servers can lead to the performance issues you are facing.
LUN replication brings issues of its own: corrupted blocks, for example, will be replicated along with everything else.
If you want a future-proof design, go with a standalone VBR server with local disks (block storage), plus object storage to hold a second, immutable copy of the data.
Thanks
Fabian
Product Management Analyst @ Veeam Software
-
- Veeam Software
- Posts: 219
- Liked: 111 times
- Joined: Jun 29, 2015 9:21 am
- Full Name: Michael Paul
- Contact:
Re: Backup storage consolidation challenges
Hi Dimitris,
Here to second what Fabian has said.
You’re likely seeing several issues combining to reduce your throughput:
- Simultaneous IO streams: You have to use multiple IO streams to leverage CIFS/NFS effectively, and you’ll want high bandwidth, ideally at least 10 Gbps, between the gateway and the NAS. SAS disks are likely 6/12 Gbps; if your NAS has 1 Gbps networking, for example, you’ve got an immediate reduction in throughput.
- Spindles: If you’ve consolidated that many servers, you’ve likely also reduced the number of disks, so, especially depending on RAID type, you’ve likely starved your read and write IO.
- Task concurrency: Every server had its own CPU & RAM, so after consolidating, to get similar task concurrency you need CPU & RAM totals similar to everything you’ve replaced (assuming nothing was dramatically oversized before and was right-sized).
- Functionality loss: Without ReFS or XFS as your primary storage file system, you can’t use Fast Clone, so any processes that would have been accelerated by it no longer are.
You’d be better off using your Synology as an iSCSI device so you get an XFS/ReFS-based file system and some of those benefits back, and you could use MPIO to reduce network bottlenecks where possible; but it’s a band-aid approach to fixing this.
-------------
Michael Paul
Veeam Data Cloud: Microsoft 365 Solution Engineer
-
- Service Provider
- Posts: 114
- Liked: 9 times
- Joined: Jul 01, 2017 8:02 pm
- Full Name: Dimitris Aslanidis
- Contact:
Re: Backup storage consolidation challenges
Thank you both for your replies.
To clarify, the NAS only has LUNs, not shared folders, and all the LUNs are formatted ReFS with 64K clusters. We have not yet consolidated all the jobs to one server; I believe 4 BDRs are currently sending to that NAS via iSCSI, and we were going to consolidate down to two.
The only thing I'm not sure about is MPIO; I will check. Currently the network is gigabit, but we have checked, and also opened a case with Synology, and the network is not the bottleneck. If it becomes apparent that it is, we will add 10 Gbit networking.
Thanks.
-
- Veeam Software
- Posts: 219
- Liked: 111 times
- Joined: Jun 29, 2015 9:21 am
- Full Name: Michael Paul
- Contact:
Re: Backup storage consolidation challenges
What metrics are you seeing as the bottleneck? Are you seeing any resources on the Synology being overutilised such as CPU/RAM? Is it simply a case of too much IO?
-------------
Michael Paul
Veeam Data Cloud: Microsoft 365 Solution Engineer
-
- Enthusiast
- Posts: 67
- Liked: 11 times
- Joined: Feb 02, 2018 7:56 pm
- Full Name: Jason Mount
- Contact:
Re: Backup storage consolidation challenges
I did not see how much of the 240 TB you have in use or how big your full backups are, but I would be concerned about being able to restore servers in a timely manner after a total disaster with only a 1 Gbps connection. I would see what options you have to upgrade the network and iSCSI to 10 Gbps unless you plan to replace your storage with local disk (or SAN disk). I personally prefer local (SAN) storage with SAS drives running at 7,200 RPM; I find I max out my 10 Gbps connection before maxing out the storage I/O. Having a second copy in a different location on different storage is also a best practice.
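To put rough numbers on the restore-time concern, here is a quick arithmetic sketch; the backup sizes and the 80% link efficiency are illustrative assumptions:

```python
# Estimate wall-clock time to restore a backup set over a network link.
# Sizes and the efficiency factor are illustrative assumptions.

def restore_hours(size_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_tb over a link_gbps link at the given efficiency."""
    size_bits = size_tb * 10**12 * 8            # decimal TB -> bits
    usable_bps = link_gbps * 10**9 * efficiency # sustained usable bandwidth
    return size_bits / usable_bps / 3600

for size in (10, 50):
    print(f"{size} TB @  1 Gbps: {restore_hours(size, 1):.1f} h")
    print(f"{size} TB @ 10 Gbps: {restore_hours(size, 10):.1f} h")
```

Even a modest 10 TB restore takes over a day at 1 Gbps under these assumptions, which is the practical argument for 10 Gbps regardless of where the day-to-day bottleneck sits.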
-
- Service Provider
- Posts: 114
- Liked: 9 times
- Joined: Jul 01, 2017 8:02 pm
- Full Name: Dimitris Aslanidis
- Contact:
Re: Backup storage consolidation challenges
I just saw these replies, sorry for not following up.
We have about 300 TB of backups now and have added a second Synology NAS. On top of the 300 TB of production backups, there are about 50 TB of copy jobs from customers' on-premise Veeam BDRs going to our cloud.
CPU and RAM do not seem to be a problem. We think it's IOPS and latency.
Between the two NAS units, one has 12 disks plus a drawer of another 12, and the other has just 12 drives: 24 on one and 12 on the other. Adding the second NAS helped greatly, but I still see issues if I try to restore while the backup storage is busy with copy jobs and SOBR offloading.
I am wondering whether putting in an HPE ProLiant running Windows Server, attaching both NAS units to it directly, and setting it up as a backup repository server would help.
I have also thought about creating separate arrays for each server; right now it's just two pools of drives and everything is using them.
The idea of 10 Gbit networking did come to mind, but my boss checked and did not see the network being the issue. I can upgrade to 10 Gbit at any point if necessary, of course; it just does not appear to be the bottleneck.
-
- Novice
- Posts: 9
- Liked: 2 times
- Joined: Nov 12, 2020 1:58 pm
- Full Name: Robert Klarmann
- Contact:
Re: Backup storage consolidation challenges
We had an HPE MSA with about 60 TB for backup storage first, but we ran into performance trouble because we had archive-class disks.
We switched to an HPE Apollo 4510 Gen10 with 60x10TB drives (2x RAID 60 with hot spares), which gives us around 440 TB of storage. It runs SUSE Linux Enterprise with an XFS file system with reflink. A single backup writes at about 1.6-1.8 Gbit/s. We will go for an additional Apollo next year with 60x16TB drives for long-term backup.
I don't know what the prices are for your Synology systems, but HPE does some good project prices!
Disclaimer: This posting is provided "AS IS" with no warranties or guarantees, and confers no rights.
"Every once in a while, declare peace. It confuses the hell out of your enemies"
-
- Veeam Software
- Posts: 688
- Liked: 150 times
- Joined: Jan 22, 2015 2:39 pm
- Full Name: Stefan Renner
- Location: Germany
- Contact:
Re: Backup storage consolidation challenges
The majority of issues I have seen in the field related to what is described here come down to the disk layout of the system.
It can be a combination of disk type, size, cache size and type, as well as backend connectivity and OS behavior under concurrent IO, that impacts performance.
So there will never be a general statement on why solution XYZ is not performing as expected.
Based on my experience, I also think it is the disks, as 24 NL-SAS disks will only give you a certain amount of performance.
What does the Veeam interface say regarding the bottleneck?
Did you test the performance of the Synology with tools like "fio" yet, to see what it is capable of delivering compared to your older systems?
It would be good to get some details, but as mentioned above, it depends on a lot of things.
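fio is the right tool for this, since it controls queue depth, direct IO, and random-versus-sequential patterns. Where installing tools is not an option, a very rough sequential-write probe can be sketched in a few lines of Python; the block and file sizes here are arbitrary choices, and the result is only a crude upper-bound sanity check, not a substitute for fio:

```python
import os
import tempfile
import time

# Crude sequential-write probe: write a test file in large blocks,
# fsync, and report throughput. Sizes are arbitrary illustrative choices.
BLOCK = 4 * 1024 * 1024          # 4 MiB per write
TOTAL = 256 * 1024 * 1024        # 256 MiB test file
buf = os.urandom(BLOCK)

# Create the test file in the directory being measured (here: cwd).
with tempfile.NamedTemporaryFile(dir=".", delete=True) as f:
    start = time.perf_counter()
    for _ in range(TOTAL // BLOCK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())         # force data to stable storage
    elapsed = time.perf_counter() - start

print(f"Sequential write: {TOTAL / elapsed / 1e6:.0f} MB/s")
```

Run it from a directory on the volume under test, and compare the figure between the old repositories and the Synology-backed ones.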
Thanks
Stefan Renner
Veeam PMA
-
- Service Provider
- Posts: 114
- Liked: 9 times
- Joined: Jul 01, 2017 8:02 pm
- Full Name: Dimitris Aslanidis
- Contact:
Re: Backup storage consolidation challenges
@rennerstefan I have not visited this thread in a while and just saw your reply. We have largely avoided the big delays by switching to weekly synthetic fulls, so merging no longer stresses the storage.
Those two enterprise NAS units, along with dozens of drives, were just bought by my company, so I don't think I can move to another solution that easily. We could, however, consider a physical server as a backup repository.
I have further consolidated the BDR servers in our cloud to just 3, plus another for replication jobs and one more for VBO, which will hopefully go away soon.
My next long-term considerations are not only performance but also expandability, security, and longevity.
I was thinking of moving all backups to just one or two large volumes instead of many smaller ones, as moving TBs around is causing us severe delays. We have ReFS, so volumes of 100 TB or larger are not an issue. Currently, however, all 3 BDRs are on a single ESXi host, and the two NAS units, one with 12 drives plus 12 more in an expansion unit, the other with another 7 drives, are connected via dual 1 Gbit Ethernet.
We can add 10 Gbit if needed, but so far it has not been clear that the network is the bottleneck.
My concerns when selecting a backup storage solution as an MSP are:
- Something that can be expanded (add drives, for example) as needed
- Something that is secure (immutable storage)
- Something that does not lose my backups in case of hardware issues (we have a lot of SOBRs going to Wasabi with monthly and yearly retention, and it would be greatly troublesome to have to connect everything again).