vipthomps
Service Provider
Posts: 60
Liked: 6 times
Joined: Dec 06, 2010 7:51 pm
Full Name: Eric Thompson
Location: Boston, MA
Contact:

Linux Repo - XFS fills up then goes offline

Post by vipthomps »

Has anyone else run into this? We have insider protection enabled for our customers. On one of our hardened repos, a SoBR extent filled up completely. Once that happened, there was no space left for the disk to finish its writes, the file system went offline, and we couldn't run any commands against the volume.

Here is how we fixed it:
  • commented out the volume's line in fstab
  • disabled the veeamtransport service
  • rebooted
  • tried to run xfs_repair, which reported that the log contained valuable metadata changes that needed to be replayed, so I could either mount and unmount the file system or use the -L option
  • uncommented the line in fstab
  • mounted and unmounted the file system
  • ran xfs_repair
  • mounted the file system and checked it out
  • enabled the veeamtransport service
  • rebooted
  • logged in and confirmed that the file system was still fine
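
The steps above can be sketched as an ordered plan. This is only an illustration: the device and mount point are hypothetical arguments, the failed first xfs_repair attempt is left out, and nothing is executed, so the commands can be reviewed before anyone runs them as root.

```python
def xfs_recovery_plan(device, mount_point, service="veeamtransport"):
    """Return the recovery steps as (description, command) pairs.

    command is None for steps that are done by hand in an editor.
    device and mount_point are placeholders supplied by the caller.
    Nothing is executed here; the plan is meant to be read first.
    """
    return [
        ("comment out the volume's line in /etc/fstab", None),
        ("keep the transport service from starting", ["systemctl", "disable", service]),
        ("reboot with the volume left unmounted", ["reboot"]),
        ("uncomment the volume's line in /etc/fstab", None),
        ("mount once so XFS replays its log", ["mount", mount_point]),
        ("unmount again before repairing", ["umount", mount_point]),
        ("check and repair the file system", ["xfs_repair", device]),
        ("mount the repaired volume and inspect it", ["mount", mount_point]),
        ("re-enable the transport service", ["systemctl", "enable", service]),
        ("reboot and confirm the volume is still healthy", ["reboot"]),
    ]


def format_plan(plan):
    """Render the plan as numbered, human-readable lines."""
    lines = []
    for number, (description, command) in enumerate(plan, start=1):
        shown = " ".join(command) if command else "(manual edit)"
        lines.append(f"{number}. {description}: {shown}")
    return lines
```

Calling format_plan(xfs_recovery_plan("/dev/sdb1", "/mnt/extent1")) with your own device and mount point gives a checklist. Mounting and unmounting before xfs_repair is what replays the log, which avoids having to use the -L option.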

I opened Case 05464625. The final recommendation was to manage the space so the repo doesn't fill up, and to put in a feature request for a threshold that stops writing to an extent.
HannesK
Product Manager
Posts: 15598
Liked: 3445 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Linux Repo - XFS fills up then goes offline

Post by HannesK »

Hello,
I moved the request to the general forums, because the free space discussion is a general one and the XFS solution might help others. Thanks for posting it!

The "monitor free space" and "what to do if the disk is full" questions are a classic conversation. I agree with support that free space monitoring is the job of whoever runs the computer. That includes service providers and administrators, but also the end user with a smartphone.

Reactions to a full disk can be automated (e.g. stopping services or shutting down) or manual.
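
As a minimal sketch of the automated side, a check like the following could run from cron or a monitoring agent. The 10% default threshold and the function names are assumptions for illustration, not anything Veeam ships; what to do when the check fires (stop a service, send an alert) is left to the caller.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the file system holding path that is free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero; treat as full.
        return 0.0
    return usage.free / usage.total


def needs_action(path, min_free_fraction=0.10):
    """True when free space has dropped below the chosen threshold."""
    return free_fraction(path) < min_free_fraction
```

For example, needs_action("/mnt/extent1", 0.15) would report True once less than 15% of that (hypothetical) mount point is free.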
vipthomps wrote: "feature request to implement a threshold to stop writing to an extent."
That problem has "always" had the same solution for any software: place a file of X size on every disk and delete it in an emergency. For me, that's "system administration", but I agree that a built-in threshold would simplify life for people who don't have countermeasures in place. I will talk to my colleagues about how this could be done.
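
A minimal sketch of that ballast-file idea, assuming a Linux host: the file has to be really allocated, because a sparse file would reserve nothing. os.posix_fallocate asks the file system to allocate the blocks up front. The path and size are whatever the administrator picks.

```python
import os


def create_ballast(path, size_bytes):
    """Create a fully allocated placeholder file of size_bytes.

    O_EXCL makes the call fail rather than overwrite an existing file.
    posix_fallocate reserves real blocks, unlike a sparse truncate.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


def release_ballast(path):
    """Delete the placeholder to hand its space back in an emergency."""
    os.remove(path)
```

Several smaller ballast files instead of one large one would allow space to be released in stages.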

In general, deciding what to do is complicated. Imagine the setting is "keep 1 GB of free space". A backup is running and needs only 1 MB more to finish. Now the software stops the job, and later a restore is needed from that broken restore point. The person responsible for the 1 GB setting will be asked, "Why did you leave so much space unused?"

Best regards,
Hannes
vipthomps

Re: Linux Repo - XFS fills up then goes offline

Post by vipthomps »

Hannes,

I agree with you on most points. A tunable registry setting would let the backup admin choose whether to use the feature. In my experience, a Windows repo survives being written completely full. We run into this as a service provider with Scale-Out repos, where we need to keep block cloning intact and therefore use data locality. Managing free space manually when other extents still have space seems counterproductive to the main point of SoBRs. The general awareness I want to raise is that an XFS volume can fill up so completely that it is knocked offline. When that happens, ALL restore points on it become unavailable, not just the in-flight jobs, which also fail once the extent's file system becomes unaddressable.

The large empty file that could be deleted is a good idea and we'll drop one on each extent going forward in case we run into this again.
tsightler
VP, Product Management
Posts: 6040
Liked: 2867 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Linux Repo - XFS fills up then goes offline

Post by tsightler » 5 people like this post

This is one feature of ext3/4 that I miss a little with XFS. The ext-based file systems always reserved a percentage of space that only the root user could use, and that allocation could be set at format time or afterwards (usually just 1% was enough for large file systems). XFS only has reserved blocks, which provide enough space to recover without corruption but not much space to help you move anything, so a full volume can sometimes go offline and require manual steps like those above.

However, Linux and XFS do provide powerful methods to help prevent total space exhaustion via quota management. You can set disk quotas by user or group, but my favorite feature of XFS is the ability to set quotas for directory hierarchies via projects. Using this, you can easily define a quota for your repository directory that is less than the total available space, and you can change it at runtime if, for example, you need to temporarily allow more space. It's more flexible, IMO, than dropping a large empty file, which wouldn't keep the volume from being completely exhausted and makes granular pressure release a little more difficult, although you could create multiple large files to achieve this.

For example, you could set a quota at 85% of total disk space and monitor against that. If you get near the limit and need a little more room while you move things around, you can raise it to 90 or 95% on the fly and then drop it back to 85% once things are moved, all while staying protected from total space exhaustion.
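
As a rough sketch of that workflow, the helpers below build the xfs_quota command lines for a given percentage. The project ID 42 and the paths are hypothetical, and the sketch assumes the volume is mounted with project quotas enabled (the prjquota mount option). The commands are only constructed, not run.

```python
MIB = 1024 * 1024


def project_setup_command(repo_dir, mount_point, project_id=42):
    """One-time step: tag the repository directory tree with a project ID."""
    return ["xfs_quota", "-x", "-c",
            f"project -s -p {repo_dir} {project_id}", mount_point]


def project_limit_command(total_bytes, percent, mount_point, project_id=42):
    """Set the hard block limit to percent of total_bytes, in MiB.

    Re-running this with a different percent changes the limit on the fly.
    """
    limit_mib = total_bytes * percent // 100 // MIB
    return ["xfs_quota", "-x", "-c",
            f"limit -p bhard={limit_mib}m {project_id}", mount_point]
```

Raising the limit temporarily is then just project_limit_command(total, 95, mount_point), followed later by the same call with 85. The total could come from shutil.disk_usage(mount_point).total.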

Here's a link to one resource on XFS quotas, but there are plenty of others out there.

https://www.thegeekdiary.com/how-to-ena ... le-system/
vipthomps

Re: Linux Repo - XFS fills up then goes offline

Post by vipthomps »

That is great info! Thank you! We'll add this to our standard setup doc.
vipthomps

Re: Linux Repo - XFS fills up then goes offline

Post by vipthomps » 1 person likes this post

We went down this path, but quotas don't seem to account for reflink block cloning, so we'll need to keep looking for a solution.
tsightler

Re: Linux Repo - XFS fills up then goes offline

Post by tsightler »

Out of curiosity, what specifically do you mean by "don't seem to account for reflink"? Do you mean that quotas apply to logical used space vs actual used space, so, for example, if you have 4 fulls with block clone the quota accounts for the entire space of all 4 full backups? I didn't notice this in my own testing, but I must admit that my testing was mostly with forever forward and, in that case, reflink is mostly about performance, not space savings.
vipthomps

Re: Linux Repo - XFS fills up then goes offline

Post by vipthomps »

Yes, exactly. It's the logical size of the written files that counts toward the quota, so in our test the quota did not reflect the storage actually available on the mount point.
tsightler

Re: Linux Repo - XFS fills up then goes offline

Post by tsightler » 1 person likes this post

I think it's pretty normal for quotas to apply to the logical size in any solution that shares blocks. I'm pretty sure quotas work the same way on DataDomain and dedupe-enabled NetApp systems. Doing anything else makes it possible to exceed the quota without writing any additional data, because simply changing data in existing files can increase the back-end used space. You would have to account for every write to a file rather than just new allocations, which is significant additional overhead.

I do understand this could be quite difficult to manage, since it will be hard to determine a reasonable ratio for the quota, especially at the start. I can see a potential solution with a small script that does some dynamic quota management. At that point, however, I'm not sure it's really easier than creating some empty files as Hannes suggested.
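
One way such a script might pick the limit, purely as an assumption-laden sketch: periodically reset the logical quota to the current logical usage plus whatever physical free space remains above a chosen reserve. The rule itself is an illustration, not something described in this thread, and it is not airtight, because rewriting shared blocks in existing files can consume physical space without any logical growth.

```python
def dynamic_quota_bytes(logical_used, physical_free, reserve_bytes):
    """Return a logical quota that leaves reserve_bytes physically free.

    logical_used:  bytes the quota currently counts (reflinked data counted
                   in full), e.g. taken from a quota report.
    physical_free: bytes actually free on the volume.
    reserve_bytes: physical space to keep unused as a safety margin.

    New logical writes are allowed only up to the physical headroom left
    above the reserve; with no headroom the quota equals current usage.
    """
    headroom = max(physical_free - reserve_bytes, 0)
    return logical_used + headroom
```

The result would then be applied with xfs_quota on a schedule, which is roughly the same amount of moving parts as maintaining a few ballast files.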