I am currently planning a low-budget setup for our backup needs at our company. To give you a better understanding: we are a small IT company developing VR/3D applications, and our current team consists of 15 people. No, not our IT admin team - the whole company.
We share some of our office and IT infrastructure with another small IT company, so all in all there are about 20 people. We do our IT work mostly ourselves and sometimes get external help on specific issues. Sorry for the long post, but I would really appreciate some insight to help me make a good decision.
Our budget is rather tight, but we try to do things properly, yet reasonably.
Currently, our setup is as follows:
- Main Hyper-V host running our VMs - most importantly the primary DC and our fileserver. This is a self-built server with dual-socket Xeon, 128 GB RAM, an Adaptec SmartRAID controller with BBU, and 8 Intel SSDs in RAID 10.
- Secondary Hyper-V host running the secondary DC and acting as the Hyper-V Replica target for the fileserver, so we can keep working with minimal downtime if the primary host goes down. This system is considerably smaller and runs on a Microsemi board with only Intel onboard mirror RAID, as the box itself is considered replaceable in case of failure.
- Backups are done via Altaro to a Synology DS916+ as the primary target (SMB) and Azure Storage as the offsite target.
- The whole rack is backed by an APC UPS that gives all machines about 20 minutes for a graceful shutdown. The secondary host is in another room, so it is not on a UPS yet.
- The hosts are connected via 10 GbE, the backup target currently only via 1 GbE.
- Being a 3D/VR dev studio, we accumulate data quickly. Even with Windows Server deduplication on the fileserver VM, we are currently at around 11 TB of important data, which makes backups over 1 GbE painfully slow (see the quick arithmetic after this list).
- Our internet connection sucks. Fiber is super expensive in our city, so we are stuck with 50 Mbit/s upload for the whole building; we simply can't afford the roughly 10x price increase that moving to fiber with 100+ Mbit/s would cost right now. A full first-time backup to a cloud offsite location therefore takes about 10 days, which is very problematic whenever we have to create a new one because we switch things up or something breaks.
- Altaro cannot do offsite backup transfers that survive WAN instability. Although we run redundant WAN links, short connection hiccups do happen. Altaro's error reporting is also becoming a bit opaque in places, and the health check is rather slow too.
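To sanity-check the bandwidth pain points above, here is a quick back-of-the-envelope calculation. The link speeds and the 11 TB figure are from our setup; the 70% effective utilisation is just my own assumption for protocol/disk overhead:

```python
# Rough transfer-time arithmetic for ~11 TB of data (decimal units).
# The 0.7 effective-utilisation factor is an assumption, not a measurement.

DATA_BITS = 11e12 * 8        # 11 TB in bits
EFFICIENCY = 0.7             # assumed effective link utilisation

for name, mbit in [("1 GbE backup target", 1_000),
                   ("10 GbE backup target", 10_000),
                   ("50 Mbit/s WAN upload", 50)]:
    seconds = DATA_BITS / (mbit * 1e6 * EFFICIENCY)
    print(f"{name:>22}: {seconds / 3600:6.1f} h ({seconds / 86400:5.1f} days)")

# ~35 h over 1 GbE, ~3.5 h over 10 GbE, ~29 days over the 50 Mbit/s WAN.
# The ~10 days we actually see for the cloud seed suggests the backup chain
# going over the wire is well under 11 TB after dedup/compression.
```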
Our plan:
- Set up a Windows 10 VM on the secondary host for the Veeam backup server, running the free Community Edition. Another Windows Server VM would mean spending extra on another full Windows Server license, and since Windows 10 is listed as supported in the requirements, I'd rather save that money.
- Get a smaller UPS for the secondary host so it can shut down gracefully as well.
- Switch the primary backup target to a vDSM on a Synology RS1221+ with an NFS share. The Synology is already in use by the other company, and we plan to create a VM on it for our main repo, as it is connected via 10 GbE and has 8 Toshiba Enterprise 18 TB HDDs paired with 2 Samsung 1 TB SSDs as cache.
- Repurpose the Synology DS916+ with a 5 GbE adapter as the offsite backup target. The offsite backup would initially be seeded in the office over 5 GbE, significantly reducing the current backup time, and the NAS would then move to my home, connected via VPN (rough numbers in the sketch after this list).
- Move all Hyper-V hosts and backup infrastructure to a separate VLAN, completely isolated from our production VLAN, to protect hosts and shares from ransomware, which would most likely come in via one of the workstations.
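For the seeding step, a rough estimate of what the local 5 GbE seed buys compared to pushing the same data over the WAN. The ~4 TB backup size and the ~50 GB/day increment are assumptions for illustration only, not measured values:

```python
# Rough estimate: seed locally over 5 GbE, then daily increments over the
# 50 Mbit/s VPN. SEED_BYTES and DAILY_BYTES are assumed figures.

def transfer_hours(size_bytes: float, mbit_per_s: float, efficiency: float = 0.7) -> float:
    """Hours to move size_bytes over a link of mbit_per_s at the given utilisation."""
    return size_bytes * 8 / (mbit_per_s * 1e6 * efficiency) / 3600

SEED_BYTES = 4e12    # assumed size of the full backup chain on disk after dedup/compression
DAILY_BYTES = 50e9   # assumed daily incremental size

print(f"Initial seed over 5 GbE : {transfer_hours(SEED_BYTES, 5_000):.1f} h")
print(f"Same seed over the WAN  : {transfer_hours(SEED_BYTES, 50) / 24:.1f} days")
print(f"Daily increment via VPN : {transfer_hours(DAILY_BYTES, 50):.1f} h")
```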
I spent the last few days setting up a Veeam test environment, checking backup performance, etc. I also read through the documentation, the forums, and Reddit. That made me very aware of the heated discussion about whether a Synology NAS is a valid backup target (corruption); the consensus seems to be: run on enterprise hardware.
I just wonder if that is really the only possible answer, since our budget is really tight. We try to invest as much as we can. It's not a case of "management wants to save money for no reason" - we simply have the money we have, and there is no real way around that. Our backup needs also differ from those of a large enterprise. A downtime of 2-3 days after a complete disaster (fire in the office, etc.) is fine, as long as we can recover at all. It is also fine to lose 1-2 days worth of data/work when recovering from backup, because it costs less to redo - say, 4 days of work because a daily increment was faulty - as long as last weekend's synthetic full backup is still okay.
Even if I went and bought a refurbished server for the primary repo, that would still leave the question of what to do with the secondary one. Having a full server repo at my home is not really workable, for many reasons.
If a backup turns out faulty and the software reliably tells me about it, I am fine with recreating it, since things are now a lot faster thanks to the 10 GbE connection and an offsite backup that does not take weeks to rebuild from scratch.
The Synologys run Btrfs, and the shares will stay NFS.
I am not entirely sure, but based on what I have read, I plan to do deduplication, compression, and encryption in Veeam, so the data is not "manipulated"/processed by the Synology. The only thing I would consider is turning on the self-healing option for the shared folder on the Synology - but I am not sure about that either.
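Independent of what I let Veeam or the Synology do, I am also considering a small scheduled spot-check that re-hashes the backup files on the repository and flags silent changes. A minimal sketch of the idea, assuming the usual .vbk/.vib file extensions, Python 3.8+, and placeholder paths on whatever box mounts the share:

```python
# Sketch: flag backup files whose size/mtime did not change since the last run
# but whose hash did - that combination would point at silent corruption on the
# storage rather than a legitimate rewrite by the backup software.

import hashlib
import itertools
import json
from pathlib import Path

REPO = Path("/mnt/veeam-repo")                 # placeholder: mounted NFS repository
STATE = Path("/var/backups/repo-hashes.json")  # placeholder: baseline from the last run

def sha256_of(path: Path, chunk: int = 8 * 1024 * 1024) -> str:
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

old = json.loads(STATE.read_text()) if STATE.exists() else {}
new = {}

# .vbk = full backups, .vib = incrementals (assuming the usual Veeam extensions)
for p in sorted(itertools.chain(REPO.rglob("*.vbk"), REPO.rglob("*.vib"))):
    st = p.stat()
    key = str(p.relative_to(REPO))
    entry = {"size": st.st_size, "mtime": st.st_mtime, "sha256": sha256_of(p)}
    prev = old.get(key)
    if prev and prev["size"] == entry["size"] and prev["mtime"] == entry["mtime"] \
            and prev["sha256"] != entry["sha256"]:
        print(f"HASH MISMATCH on unchanged file (possible bit rot): {key}")
    new[key] = entry

STATE.write_text(json.dumps(new, indent=2))
```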
So, can I roll with that setup? Are there alternatives I am not seeing?
I am not planning to use ReFS on iSCSI targets, as that seems even more prone to silent data corruption.
We have been running our backups to Synology targets for years and even had to do a full restore about 7 years ago, back when we were still on volume-level backups with Acronis. It worked.
Last year there was one occasion where Altaro reported a health error on our primary backup, but honestly, it's hard to tell whether that was the Synology's fault, a network transport error, or something else.