Hello,
We're relatively early into our first production use of Veeam Backup & Replication, about 4 months in. We're a small shop with only one main production Hyper-V server and some other scattered physical servers.
We have a "Veeam in a box" (Windows Server 2019, ReFS) configured with a SOBR consisting of a modest amount of local SSD storage (so we can recover on this box if needed) plus offsite object storage with immutability. We use forward incremental + GFS.
We then use copy jobs to a local Linux immutable repo with a large RAID 60 array of spinners. Longer-term GFS backups go here due to space availability.
All of the above has been working great, with very fast performance and expected behavior.
We have a relatively new in-house process that generates large amounts of live data. Veeam is handling it nicely: the data compresses at about a 2:1 ratio and, judging by the actual space used on the "Veeam in a box" local SSD array, there is a lot of de-duplication too. So the Veeam side of this data handling is going well.
The problem is that the live data sits on a relatively fast, expensive SSD array, the only array on the VM server, and it's eating up the available space quickly.
The generators and users of this data have agreed to a retention period, which will eventually help with size, and they have also agreed that the older live data can be compressed.
Compressing the older live data would cut the space usage on the live array, but I am uncertain how it might reduce Veeam's ability to effectively utilize block cloning and whatever other tech it leverages to optimize repository space usage. Obviously Veeam's ability to compress this already compressed data would be impacted, but it's really the de-duplication that I am concerned about.
In other words, I don't want to "solve the problem" of the large live data set (by compressing the older data) and end up just pushing the problem of large space usage to my "Veeam in a box" SSD repository.
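To illustrate the concern, here is a toy model in Python. It is not Veeam's actual algorithm: it assumes plain fixed-size 4 KiB block hashing (`BLOCK_SIZE` and `already_stored` are made-up names) and uses `zlib` as a stand-in for zipping a folder. It compares how many blocks of the "current" disk state already exist in the "previous" state after a small in-place edit versus after compressing the same data.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumption for this toy model, not Veeam's real block size


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def already_stored(previous: bytes, current: bytes) -> float:
    """Fraction of blocks in `current` that also occur in `previous`."""
    known = set(block_hashes(previous))
    new = block_hashes(current)
    return sum(h in known for h in new) / len(new)


rng = random.Random(0)
# 1 MiB of random hex characters: every block differs, but the data is compressible.
raw = "".join(rng.choices("0123456789abcdef", k=1 << 20)).encode()

# Case A: a small in-place edit to the uncompressed data.
edited = b"X" * 64 + raw[64:]

# Case B: the same data after compression (stand-in for zipping a folder).
compressed = zlib.compress(raw)

edit_reuse = already_stored(raw, edited)
zip_reuse = already_stored(raw, compressed)
print(f"block reuse after small edit:  {edit_reuse:.3f}")
print(f"block reuse after compression: {zip_reuse:.3f}")
print(f"compressed size / raw size:    {len(compressed) / len(raw):.2f}")
```

In this toy model the small edit leaves nearly every block reusable, while the compressed copy shares no blocks with the original, so all of it would count as new data. Whether Veeam behaves the same way on a real repository is exactly the open question.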
The current .vhdx storing this data is sized at 7.5TB with 6.5TB used. Growing at about 0.5TB per month.
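As a back-of-the-envelope check on those numbers, the sketch below computes how many months remain before the .vhdx fills up. The function name and the "compress 4 TB of older data at 2:1" scenario are hypothetical; the 2:1 figure is borrowed from what Veeam achieves on this data and may not match what a zip tool would achieve.

```python
def months_until_full(capacity_tb: float, used_tb: float,
                      growth_tb_per_month: float) -> float:
    """Months of headroom left at a constant growth rate."""
    return (capacity_tb - used_tb) / growth_tb_per_month


# Figures from the post: 7.5 TB disk, 6.5 TB used, growing 0.5 TB/month.
current_runway = months_until_full(7.5, 6.5, 0.5)

# Hypothetical scenario: 4 TB of older data compressed at an assumed 2:1 ratio.
old_data_tb = 4.0          # assumption, not a figure from the post
assumed_ratio = 2.0        # assumption, taken from the Veeam-side ratio
freed_tb = old_data_tb * (1 - 1 / assumed_ratio)
runway_after_compression = months_until_full(7.5, 6.5 - freed_tb, 0.5)

print(current_runway)            # 2.0
print(runway_after_compression)  # 6.0
```

At the current rate the disk has about two months of headroom, which is why the question is urgent.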
So, what I'm asking is:
1) If we compressed the older live data "in place" (on the same .vhdx, compressed one folder at a time) would Veeam still retain the ability to do de-duplication across incremental and full backups?
2) Would it help Veeam manage the compressed data better if we moved the compressed live data to a different .vhdx? I suspect it would not matter but figured it's worth asking.
3) The alternative solution is adding an array of spinners to this VM server. Fortunately the growing data is mostly in the form of large files, so spinners would be adequately performant. This just complicates the setup so I'd like to avoid it if possible.
Any other suggested solutions?
Thanks all.
- Influencer
- Posts: 11
- Liked: 5 times
- Joined: Nov 03, 2020 1:29 pm
- Full Name: Ryan
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
Re: Veeam handling of .zip files with respect to repository storage space efficiency
Hi Ryan
Can you give an example of what your "live data" is? Our products don't have a concept of "live data", so I'm trying to understand what you mean by that.
1. For me, live data would be your production environment, the data which is accessible to everyone. This is the production data.
2. Then there is the backup job, which targets your ReFS SOBR. This is your backed-up data.
3. The SOBR uses object storage to offload the backups. This is your copy of the backed-up data.
4. And the backup copy job copies the backups from the ReFS SOBR to the Linux Hardened Repository. This is a second copy of your backed-up data.
I would have used the Linux Hardened Repository as a SOBR extent, and not the SSD disks.
You would then have only two backup copies, and you wouldn't need the SSD for the backups. Whether two backup copies are good enough depends on your business requirements.
You wrote: "The current .vhdx storing this data is sized at 7.5TB with 6.5TB used. Growing at about 0.5TB per month."
Do you use virtual disks for the backup repository?
Product Management Analyst @ Veeam Software