Scale Out Repository out of space issue

sdeath · Post by **sdeath** » Apr 07, 2016 3:17 pm this post

Hi,

I am a bit confused about what I am reading and what I am being told by support.

- I have a Scale Out repository with 2 extents.
- Extent 1 capacity is 14.4TB and extent 2 is 7TB.
- I have created one Backup Copy job, containing 111 VMs and pointed it to the Scale Out repository.
- The Scale Out repository is configured as Data Locality and Per-VM backup files.

Initially the job appeared to distribute the backup files evenly across the 2 extents, but the 7TB extent has now run out of space and the Backup Copy job is failing.

Veeam support is telling me that is as designed, using the Data Locality policy will write the increment jobs to the same extent as the full backup file until it runs out of space and will then fail the job. I asked what would happen if I added a new extent and was told that only new VMs to the job would be written to that extent. I asked what is the benefit of Scale Out then and I received a very long pause! I believed that Veeam would handle the space issue, am I wrong?

If this is the case then Scale Out is actually wasting disk space as the other extent still had over 3TB free. Apparently there is no way to back out of a Scale Out repository so I will have to delete the data, create 2 Backup Copy jobs and point them to separate repositories. Not too happy about that.

Can anyone advise if this is correct?

Thanks.

DaveWatkins · Post by **DaveWatkins** » Apr 07, 2016 9:58 pm this post

I've seen this manifest in a number of situations myself, so much so that I don't use Scale Out anymore at all. Initially it sounded great, but between its poor placement decisions and jobs failing whenever an extent fills up, I've given up.

The best (and by best I mean worst) situation I saw was a scale out with a 2TB and a 7TB extent and an Exchange backup of a server that was over 3TB. It failed because it put the backup on the 2TB extent and filled it before completing the job. Scale out seems to be missing a LOT of logic to determine the best place to put jobs, and what to do if an extent fills up.

Based on all the marketing, it was presented as a simple way to aggregate all your spare space into one large, usable backup repo, but it's missing so much logic to catch simple and expected scenarios that I've found it completely unusable so far. That's a shame because I love the idea of it; it just needs to make smarter decisions about job placement and about what happens when the policy can't be fulfilled. It also doesn't seem to automatically move existing per-VM chains off an extent to free up space on it so other VM chains on that extent can continue to back up, which is, again, something I expected it to do.

Post by **tsightler** » Apr 08, 2016 3:23 pm this post

Can you please share your support case?

sdeath · Post by **sdeath** » Apr 08, 2016 3:27 pm this post

Case # 01754266

But support closed case as no way forward!

Thanks

Post by **foggy** » Apr 08, 2016 3:32 pm this post

sdeath wrote:Veeam support is telling me that is as designed, using the Data Locality policy will write the increment jobs to the same extent as the full backup file until it runs out of space and will then fail the job.

This is not actually correct: the data locality policy is not that strict and should place the backup on another extent if that extent has available space. There is an issue, however, where the job fails if there's not enough space to even update the metadata file, and chances are you're hitting exactly this one. I recommend re-opening the case and escalating it to a higher tier for a closer investigation.
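To make the "not that strict" behaviour concrete, here is a minimal sketch of a soft data locality rule: prefer the extent that already holds the chain, and fall back to another extent when the preferred one lacks space. It is an illustration only, not Veeam's actual placement code; `Extent` and `pick_extent_for_increment` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extent:
    # Hypothetical model of a repository extent; not a Veeam type.
    name: str
    free_bytes: int


def pick_extent_for_increment(
    extents: List[Extent], chain_extent: str, needed_bytes: int
) -> Optional[Extent]:
    """Soft data locality: stay with the chain if possible, else fall back."""
    by_name = {e.name: e for e in extents}
    preferred = by_name.get(chain_extent)
    # Keep the increment next to its full backup when there is room.
    if preferred is not None and preferred.free_bytes >= needed_bytes:
        return preferred
    # Otherwise break locality and use the extent with the most free space.
    candidates = [e for e in extents if e.free_bytes >= needed_bytes]
    if not candidates:
        return None  # no extent can take the file
    return max(candidates, key=lambda e: e.free_bytes)
```

Note that this sketch only models file placement. The failure discussed in this thread is a separate step: the metadata update on the already full extent.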

sdeath · Post by **sdeath** » Apr 08, 2016 3:35 pm this post

Thanks foggy, I have to say that makes more sense. I will re-open and escalate.

Post by **tsightler** » Apr 08, 2016 4:43 pm this post

Exactly why I was asking for the support case. Can you please update this post with the new case # once it is opened? I know there are several issues with data placement that were fixed in U1, but I would not expect a "job fails" outcome. Of course there can always be cases where an extent is completely exhausted of space, but we should break policy and back up to other extents in most of those cases. I just want to make sure we fully understand all of the issues and failure cases.

Post by **foggy** » Apr 08, 2016 5:14 pm this post

DaveWatkins wrote:It also doesn't seem to automatically move existing per-VM chains off an extent to free up space on it so other VM chains on that extent can continue to back up, which is, again, something I expected it to do.

Dave, could you please elaborate on this scenario?

DaveWatkins · Apr 11, 2016 3:03 am

If you have data locality set as the policy and an extent fills up, my expectation would be that an entire backup chain gets moved to another extent. That frees up space on the full one, so further incremental data can be placed there for the remaining chains and still conform to policy. That didn't seem to happen when I saw it.
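A rough sketch of the kind of logic being asked for here: pick one whole chain to move off the full extent, so the remaining chains can keep writing locally. This describes the expectation, not anything the product is known to do. `plan_chain_move` and its selection rule (the smallest chain that frees enough space and fits on the roomiest other extent) are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def plan_chain_move(
    chains: Dict[str, int],      # chain name -> size in bytes on the full extent
    needed_bytes: int,           # space the remaining chains need freed
    other_free: Dict[str, int],  # other extent name -> free bytes
) -> Optional[Tuple[str, str]]:
    """Return (chain, target_extent) for one whole-chain move, or None."""
    if not other_free:
        return None
    # Target the other extent with the most free space.
    target, target_free = max(other_free.items(), key=lambda kv: kv[1])
    # A chain qualifies if moving it frees enough space and it fits on the target.
    movable = [
        (size, name)
        for name, size in chains.items()
        if needed_bytes <= size <= target_free
    ]
    if not movable:
        return None
    # Move the smallest qualifying chain to limit the data copied.
    _, chain = min(movable)
    return chain, target
```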

sdeath · Post by **sdeath** » Apr 11, 2016 9:23 am this post

The case number is the same, seems it hadn't closed completely. It is now awaiting an escalation response.

By the way, I agree with Dave. My expectation was that an existing VM chain would be moved off in order to free up space.

Just to confirm, we are on Update 1.

Marten_med_e · Post by **Marten_med_e** » Apr 11, 2016 10:41 am this post

Our backup copy jobs also stalled when space was exhausted on one disk in the scale-out repository, even though there was over 4TB free on the other three disks. We are running v. 9.0.0.1491.

Code:

2016-04-07 09:28:58 :: Error: There is not enough space on the disk.
Failed to write data into file [E:\Backups\definition.erm].
--tr:Error code: 0x00000070
--tr:Failed to call DoRpc. CmdName: [FcWriteFileEx] inParam: [<InputArguments><FilePath value="E:\Backups\definition.erm" /><DesiredAccess value="1073741824" /><ShareMode value="0" /><CreationDisposition value="3" /><FlagsAndAttrs value="0" /><Offset value="0" /><BytesToWrite value="5400" /></InputArguments>].
There is not enough space on the disk.
Failed to write data into

I guess that the disk got so full that the backup copy job couldn't update the files it needs in order to continue, or to move the backup to one of the disks with free space.
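For reference, error code 0x00000070 is decimal 112, which Windows defines as ERROR_DISK_FULL, matching the message text in the log. The failing write was tiny (BytesToWrite is 5400), so even a few KB of headroom matters. Below is a minimal sketch of a pre-flight free-space check using the Python standard library. `has_room_for` is a hypothetical helper; the `free_bytes` override exists only so the check can be exercised without a full disk.

```python
import shutil
from typing import Optional

ERROR_DISK_FULL = 0x70  # Windows system error 112: not enough space on the disk


def has_room_for(
    path: str, bytes_to_write: int, free_bytes: Optional[int] = None
) -> bool:
    """Return True if the volume holding `path` can take `bytes_to_write`."""
    if free_bytes is None:
        # shutil.disk_usage returns (total, used, free) in bytes.
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= bytes_to_write


if __name__ == "__main__":
    # Check the current volume for the 5400-byte write seen in the log.
    print(has_room_for(".", 5400))
```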

Case #01759224, just opened due to excessive time-out issues when uploading log files.

Cheers

Post by **foggy** » Apr 11, 2016 11:22 am this post

Mårten, this looks like exactly the case I was talking about. Thanks for contacting support and sharing the case ID.

Post by **foggy** » Apr 11, 2016 11:33 am this post

Dave, thanks for clarifying the use case, got it now.

Marten_med_e · Post by **Marten_med_e** » Apr 12, 2016 2:23 pm this post

foggy wrote:Mårten, this looks like exactly the case I was talking about. Thanks for contacting support and sharing the case ID.

If this is the case, I would suggest that B&R create an "empty" *.vbm/def file that could be used for writing in case the disk gets full. If I remember correctly, MS does/did this with the Exchange transaction log files, but I haven't been messing around with Exchange for a while, so my memory may be way off. Sorry if it is.

Cheers,

Post by **foggy** » Apr 12, 2016 2:25 pm this post

We are looking for possible ways of addressing this behavior.

Marten_med_e · Post by **Marten_med_e** » Apr 12, 2016 3:00 pm this post

I did free up some space on the disk, but B&R just kept writing to the vib/vbk files and ran out of space again. I had to free up enough space to let the jobs finish, and then add a new repository set to incremental backups only, to get the jobs working again.

Cheers,

nunciate · Post by **nunciate** » Apr 14, 2016 2:10 am this post

I have noticed this same issue. I have 5 extents in a scale out repository. One of the extents is somewhat small (about 3TB). The others are large (18TB). The smaller one filled up to zero free space and no jobs that used that drive would run. I removed some backups from disk and reran the active fulls. The active fulls were written to a new extent, but the jobs kept writing incremental backups to the small drive and kept filling it up to zero free space. The only way I was able to fix it was to put the extent into maintenance mode and evacuate it completely. Fortunately I had the space to do this on the other extents, otherwise I would have had a real problem.

There has to be some logic to determine drive is X% full and to stop backing up to that drive. Is there anything like that because if there is it isn't working.

One other big problem I have noticed is that the jobs don't always pick the best extent that matches the size of the VM. I have a large 10TB VM I am trying to back up to this scale out repository for the first time. I have 12TB free on 1 extent, 7TB free on another and less than 2 on the others. I run the job and it tries to put the backup on the extent with 7TB free. I kill the job, put that extent into maintenance mode and rerun. Now it tries to back up to an extent with less than 2TB free, all the while ignoring the one extent it can actually be successful at using, with 12TB free. Ridiculous. I literally had to put all extents into maintenance mode to force it to the last one and then re-enabled them all.

My scale out is set to performance mode. I do not have per-vm backups enabled on any of the extents as of yet but plan on doing that soon.

I have to agree with what was already said. The scale-out repository is a great idea but I am not sure it is ready for prime time. Not sure how these issues could not have been identified in beta and RTM.

Post by **foggy** » Apr 14, 2016 1:58 pm this post

nunciate wrote:There has to be some logic to determine drive is X% full and to stop backing up to that drive. Is there anything like that because if there is it isn't working.

There's a heuristic mechanism that estimates the backup size and doesn't allow the extent to be used if there's not enough free space.
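As a rough illustration of how such a heuristic could work (the real estimation logic is not described in this thread), the sketch below estimates the full backup size from the VM size and an assumed data reduction ratio, then keeps only the extents that can hold it. `estimate_full_backup_bytes`, `eligible_extents` and the 0.5 default ratio are all hypothetical.

```python
from typing import Dict, List

TB = 10**12


def estimate_full_backup_bytes(vm_bytes: int, reduction_ratio: float) -> int:
    # Assumed model: backup size = source size x compression/dedup ratio.
    return int(vm_bytes * reduction_ratio)


def eligible_extents(
    free_by_extent: Dict[str, int], vm_bytes: int, reduction_ratio: float = 0.5
) -> List[str]:
    """Names of extents with enough free space, roomiest first."""
    needed = estimate_full_backup_bytes(vm_bytes, reduction_ratio)
    fits = [name for name, free in free_by_extent.items() if free >= needed]
    return sorted(fits, key=lambda name: free_by_extent[name], reverse=True)


if __name__ == "__main__":
    # Free space loosely modelled on the 12TB / 7TB / under 2TB example in this thread.
    free = {"ext-a": 12 * TB, "ext-b": 7 * TB, "ext-c": int(1.5 * TB)}
    print(eligible_extents(free, 10 * TB, reduction_ratio=1.0))
```

The outcome depends heavily on the assumed ratio: with no data reduction only the 12TB extent qualifies for a 10TB VM, while an optimistic ratio also admits the 7TB extent.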

nunciate wrote:One other big problem I have noticed is that the jobs don't always pick the best extent that matches the size of the VM. I have a large 10TB VM I am trying to back up to this scale out repository for the first time. I have 12TB free on 1 extent, 7TB free on another and less than 2 on the others. I run the job and it tries to put the backup on the extent with 7TB free. I kill the job, put that extent into maintenance mode and rerun. Now it tries to back up to an extent with less than 2TB free, all the while ignoring the one extent it can actually be successful at using, with 12TB free. Ridiculous. I literally had to put all extents into maintenance mode to force it to the last one and then re-enabled them all.

Alan, do you have a case opened for this one? Any chance the 12TB one is specified for increments only (Performance policy) or does not have free repository slots?

sdeath · Post by **sdeath** » Apr 14, 2016 1:59 pm this post

I have a response from tier 2 support.

"As of now the current SOBR design still requires extents to have some space for VBM files.
VBM is written to every extent where we have backup files related to the job.

However, when backup job has backup files only on one extent, and that extent doesn't have free space at all (not even couple of MB for VBM file) then job will fail and won't switch to another extent.
The workaround is to remove a couple of MB so VBM could be written.

When job has backups files on 2 extents, ext1 and ext2, you won't experience any issue: VBM file will be written to both ext1 and ext2, and if ext1 ran out of space completely, then we will fail to write VBM to ext1 but we will be able to write VBM to ext2 and in this scenario job won't fail - it will detect that it could write at least one copy of VBM.

We have already discussed this limitation with RND, and now RND is thinking about ways to improve our workflow/logic here, probably in one of the upcoming patches for v9, but as of now - unfortunately, this is current design, limitation."

Now, we have one Backup Copy job writing to two extents so just confirming if it should be working as suggested in the third paragraph....

nunciate · Post by **nunciate** » Apr 14, 2016 2:18 pm this post

My feature request would be to add the ability to set a limit on the extents. Have an option in there that says don't write to this extent if it is X% full. That way people can set their own value, and when the volume gets down to 10% free space, for example, it will automatically stop backing up to that extent. That might be much harder to implement, so even a static value that the job checks for would be a good start.
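The requested setting could look something like the sketch below: a per-repository minimum free percentage, with any extent below it skipped as a write target. This illustrates the feature request only; `Extent`, `writable_extents` and `min_free_percent` are hypothetical names, not existing product options.

```python
from dataclasses import dataclass
from typing import List

TB = 10**12


@dataclass
class Extent:
    # Hypothetical model of an extent; capacity_bytes is assumed to be > 0.
    name: str
    capacity_bytes: int
    free_bytes: int


def writable_extents(
    extents: List[Extent], min_free_percent: float = 10.0
) -> List[Extent]:
    """Keep only extents whose free space is at or above the threshold."""
    keep = []
    for e in extents:
        free_percent = 100.0 * e.free_bytes / e.capacity_bytes
        if free_percent >= min_free_percent:
            keep.append(e)
    return keep


if __name__ == "__main__":
    extents = [Extent("big", 18 * TB, 9 * TB), Extent("small", 3 * TB, 0)]
    print([e.name for e in writable_extents(extents)])
```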

Post by **foggy** » Apr 14, 2016 3:54 pm this post

sdeath wrote:When job has backups files on 2 extents, ext1 and ext2, you won't experience any issue: VBM file will be written to both ext1 and ext2, and if ext1 ran out of space completely, then we will fail to write VBM to ext1 but we will be able to write VBM to ext2 and in this scenario job won't fail - it will detect that it could write at least one copy of VBM.

Now, we have one Backup Copy job writing to two extents so just confirming if it should be working as suggested in the third paragraph....

That's correct: if the VBM is stored on both extents, the job shouldn't fail when there's not enough space to update it on one of them.
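The rule described by support can be modelled in a few lines: try to update the VBM copy on every extent that holds files for the job, and let the job continue if at least one copy could be written. This is a toy model of the stated behaviour, not product code; `update_vbm_copies` and `job_can_continue` are hypothetical names, and free space stands in for real writes.

```python
from typing import Dict, List


def update_vbm_copies(free_by_extent: Dict[str, int], vbm_bytes: int) -> List[str]:
    """Return the extents where the VBM copy could be updated."""
    updated = []
    for name, free in free_by_extent.items():
        # A completely full extent cannot take even a small metadata file.
        if free >= vbm_bytes:
            updated.append(name)
    return updated


def job_can_continue(free_by_extent: Dict[str, int], vbm_bytes: int) -> bool:
    # Per the support answer: one successfully written copy is enough.
    return len(update_vbm_copies(free_by_extent, vbm_bytes)) > 0
```

With files on two extents and one of them full, the model lets the job continue. With files on a single full extent it fails, which is the limitation support described.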

sdeath · Post by **sdeath** » Apr 15, 2016 8:43 am this post

Our backup copy job has created a VBM on both extents so we shouldn't be seeing this issue then?

Post by **foggy** » Apr 15, 2016 10:26 am this post

In case Veeam B&R is able to update at least one copy of metadata, the job shouldn't fail.

sdeath · Post by **sdeath** » Apr 15, 2016 10:42 am this post

Support have confirmed a bug and are compiling a cumulative hotfix.

Post by **A.J.** » Dec 08, 2016 10:26 am this post

We have a similar problem with our Scale Out Repositories when using Windows Dedup on those repositories, especially with Backup Copy jobs. (SOBR with per-VM backup files and data locality policy.)
Our latest investigations showed that the copy jobs will fill up a repository extent until 0 bytes free. The next incremental file will be placed on another extent but the merge process will fail because the primary extent of the job has 0 bytes free.
The only explanation we have found so far is that the dedup algorithm causes a "pumping" effect on the extent. That is, merge processes expand the deduped files in a way that Veeam can't detect or take into account when placing backup files on the extents. Maybe Veeam thinks there must be enough space on the extent, but there isn't because of the dedup behaviour. Maybe this happens while the jobs on the extents are running, and the free space isn't rechecked during job processing. Who knows...
As mentioned in the first posts, there should be a way for the Veeam admin to reserve spare space on each extent to prevent this out-of-space situation. There should also be an automated process that evacuates the full backup file to an extent with enough space, to keep the jobs running and the backup chain intact.
For us, the only solution at the moment seems to be to disable deduplication on all repositories and clean up all the deduped files over time.
We are also planning to upgrade our repositories to Windows 2016 with ReFS (dedup is not supported) to reduce the time for merge processes. We hope that this will solve our problems in the near future, knowing that we will need more extents on our SOBRs.

Post by **Gostev** » Dec 09, 2016 1:26 am this post

A.J. wrote:Our latest investigations showed that the copy jobs will fill up a repository extent until 0 bytes free. The next incremental file will be placed on another extent but the merge process will fail because the primary extent of the job has 0 bytes free.

Hi, I recommend you upgrade to 9.5 since it has additional new logic to prevent this from happening. Thanks!

Post by **A.J.** » Dec 12, 2016 7:39 am this post

Hi, we are already on 9.5 because of this problem. We see no recognizable change in logic, but that could be because of the dedup effect.
I read in another forum entry that Veeam in v9.5 reserves 1% of storage space to prevent the repository from running out of space. We will see if the problem occurs on our repository that never had dedup enabled.

Jeff M · Post by **Jeff M** » Aug 18, 2017 6:35 am this post

Veeam Support - Case # 02282755. I can confirm that whatever was done to resolve this issue in Ver. 9.5 did not work. I still have all of the previously posted issues with SOBR.

Post by **frankive** » Mar 13, 2018 7:42 am this post

Same issue here. One extent fills up, and backup copy jobs from customers fail. We are on 9.5 U3.
Case # 02673368.

Post by **foggy** » Mar 14, 2018 4:47 pm this post

Switching to per-VM backup chains helped in Jeff's case, as far as I can see from the case notes.
