Experiences with virtualized GlusterFS Distributed mode backup and restore

Post by **DonZoomik** » Jul 30, 2020 1:12 pm

Background: a large, constantly growing data set of files, organized as a filesystem of semi-random (hash-based) paths that we don't control. It is currently stored on hardware but is planned to be virtualized. That would make for a pretty inconvenient VM size (20TB+), so we're looking for alternatives. GlusterFS in Distributed mode looks pretty good, as you can present it as a file system with the native client and add reasonably sized nodes as required. Loss of resiliency is not really a problem, as the virtualized storage is on a SAN.

However, there is little information about backup and recovery of such a system. What if we need to restore a node: how does the system react to a rollback? If we start replicas in a DR event, they will have metadata mismatches (backups and node starts will be quite randomly distributed). Will it self-heal?

Does anybody have experience with such a system?

Post by **HannesK** » Jul 31, 2020 5:20 am

Hello,
I found some earlier conversations, but with no final feedback from customers.

1) consistency seems to be possible with snapshots https://docs.gluster.org/en/latest/Admi ... Snapshots/
2) as you mention that you can present it as a file system, I would go with NAS backup. That way you can easily restore the data.

Best regards,
Hannes

Post by **DonZoomik** » Aug 03, 2020 11:00 am

How would snapshots help? The only scenario I can think of: keep all snapshots taken between Veeam backups and, when restoring, revert the Gluster snapshot to the earliest common snap. (For example, if nodes were backed up at 00:10, 00:50 and 01:20, revert to 00:00 or similar.) Or just create snaps with integration scripts on each node and revert to the one with the earliest snap.
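The "earliest common snap" idea above can be sketched roughly as follows. This is only an illustration in Python, not anything Gluster or Veeam provides: the function name `common_revert_point` and the snapshot times 00:30 and 01:00 are made up for the example, and it assumes you already have the snapshot and per-node backup timestamps in hand.

```python
from datetime import datetime

def common_revert_point(snapshot_times, node_backup_times):
    """Return the newest snapshot taken at or before the earliest node backup.

    Hypothetical helper: every restored node then contains at least the
    state captured by the chosen snapshot.
    """
    earliest_backup = min(node_backup_times)
    candidates = [t for t in snapshot_times if t <= earliest_backup]
    if not candidates:
        raise ValueError("no snapshot predates the earliest node backup")
    return max(candidates)

fmt = "%Y-%m-%d %H:%M"
day = "2020-08-03 "  # arbitrary date, for illustration only

# Node backup times from the example in the post.
backups = [datetime.strptime(day + t, fmt) for t in ("00:10", "00:50", "01:20")]
# Assumed snapshot schedule (every 30 minutes); not from the thread.
snaps = [datetime.strptime(day + t, fmt) for t in ("00:00", "00:30", "01:00")]

revert_to = common_revert_point(snaps, backups)
print(revert_to.strftime("%H:%M"))  # prints 00:00
```

With these inputs the 00:30 and 01:00 snaps are newer than the first node backup (00:10), so only the 00:00 snap is common to all restored nodes.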

NAS backup would seem to add unneeded cost (if the nodes are already virtualized) and very long RTOs (needing to push a lot, or everything, back to GFS instead of using Instant Recovery or ready-to-start replicas).

Apr 30, 2021 2:07 pm

Hi @DonZoomik ,
I believe @HannesK was thinking of using a snapshot for consistency, and of offloading the backup from the production filesystem by reading data from that snapshot, which is possible with Veeam NAS backup.

Bo.

Post by **DonZoomik** » Apr 30, 2021 4:43 pm

Exhuming old threads I see!

Anyway, this project went live with just VM-based backups. We played around a lot, and distributed GlusterFS is tolerant to rollbacks. Files missing from the filesystem (due to rollback/restore) just disappear from the namespace. It's true that NAS backup with Gluster snaps would have been more consistent, but the result was deemed good enough as-is, with no extra costs.
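For anyone wanting to see which files dropped out of the namespace after such a rollback, a minimal sketch of a check could look like this. It assumes you keep your own list of expected relative paths (a manifest); the thread does not describe such a manifest, and the function name and the mount path in the usage note are hypothetical.

```python
import os

def missing_after_rollback(manifest_paths, mount_root):
    """Return manifest entries that no longer exist under the mounted volume.

    manifest_paths: iterable of paths relative to the volume root (assumed
    to come from an application-side record of stored files).
    mount_root: where the volume is mounted on the client.
    """
    missing = []
    for rel in manifest_paths:
        # lexists also counts dangling symlinks as present.
        if not os.path.lexists(os.path.join(mount_root, rel)):
            missing.append(rel)
    return missing
```

Usage would be something like `missing_after_rollback(paths_from_app_db, "/mnt/gluster")`, where both arguments are placeholders for your own environment.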
