DonZoomik
Expert
Posts: 216
Liked: 53 times
Joined: Nov 25, 2016 1:56 pm

Experiences with virtualized GlusterFS Distributed mode backup and restore

Post by DonZoomik »

Background: a large, constantly growing data set of files, organized as a filesystem of semi-random (hash-based) paths that we don't control. It is currently stored on hardware but planned to be virtualized. That would make for a pretty inconvenient VM size (20TB+), so we're looking for alternatives. GlusterFS in Distributed mode looks pretty good, as you can present it as a file system with the native client and add reasonably sized nodes as required. Loss of resiliency is not really a problem, as the virtualized storage is on a SAN.

However, there is little information about backup and recovery of such a system. What if we need to restore a node: how does the system react to a rollback? If we start replicas in a DR event, they will have metadata mismatches (backups and node starts will be quite randomly distributed). Will it self-heal?

Does anybody have experience with such a system?

HannesK
Veeam Software
Posts: 5882
Liked: 809 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Experiences with virtualized GlusterFS Distributed mode backup and restore

Post by HannesK »

Hello,
I found some earlier conversations, but with no final feedback from customers.

1) Consistency seems to be achievable with snapshots: https://docs.gluster.org/en/latest/Admi ... Snapshots/
2) Since you mention that you can present it as a file system, I would go with NAS backup, so that you can easily restore the data.


Best regards,
Hannes

DonZoomik

Re: Experiences with virtualized GlusterFS Distributed mode backup and restore

Post by DonZoomik »

How would snapshots help? The only scenario I can think of: keep all snapshots between Veeam backups and, when restoring, revert Gluster to the earliest common snap (for example, if nodes were backed up at 00:10, 00:50 and 01:20, revert to 00:00 or similar). Or just create snaps with integration scripts on each node and revert to the one with the earliest snap.
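To make the "earliest common snap" idea concrete, here is a rough sketch in plain Python. It does not call any Gluster or Veeam API; the function name `pick_revert_snapshot`, the helper `t`, the calendar date and the 30-minute snapshot schedule are all made up for illustration. Only the three backup times come from the example above. The assumption is that the right snapshot is the newest one taken at or before the earliest node backup, so that every restored node already contains it.

```python
from datetime import datetime


def pick_revert_snapshot(snapshot_times, node_backup_times):
    """Return the newest snapshot taken at or before the earliest node backup.

    Returns None when no snapshot is old enough to exist on every node.
    (Hypothetical helper, not part of any Gluster or Veeam tooling.)
    """
    earliest_backup = min(node_backup_times)
    candidates = [s for s in snapshot_times if s <= earliest_backup]
    return max(candidates) if candidates else None


def t(hhmm):
    # The date is arbitrary; only the time of day matters in this example.
    return datetime.strptime("2024-01-01 " + hhmm, "%Y-%m-%d %H:%M")


# Node backup times from the example above.
backups = [t("00:10"), t("00:50"), t("01:20")]
# Assumed snapshot schedule (every 30 minutes), purely illustrative.
snapshots = [t("00:00"), t("00:30"), t("01:00")]

chosen = pick_revert_snapshot(snapshots, backups)
print(chosen.strftime("%H:%M"))  # prints 00:00
```

Whether Gluster would actually accept such a revert after the nodes were restored from different points in time is exactly the open question here; the sketch only covers picking the timestamp.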

NAS backup would seem to add unneeded cost (if the nodes are already virtualized) and very long RTOs (needing to push a lot, or everything, back to GFS instead of using Instant Recovery or ready-to-start replicas).
