Discussions related to using object storage as a backup target.
ianbutton1
Enthusiast
Posts: 58
Liked: 18 times
Joined: Oct 14, 2016 3:54 pm
Full Name: Ian Button
Contact:

SOBR issues

Post by ianbutton1 »

Hi, we're a bit new to SOBR so maybe somebody with more experience can tell whether our arrangements are wrong, as we have been having some issues getting going.
Just 1 SOBR - performance extent on HPE D2600 array (19x 4TB SATA 7.2k drives giving 63TB available). Capacity tier in AWS S3-compatible cloud storage.
Test setup (tiny backup & offload) - OK. Initial production setup - 5TB SQL backups (Win Agent job on VBR server) and a 1TB Solaris Agent job - worked OK & offloaded.
Added one of the two big backup jobs - 12TB of users' Office docs etc (Win Agent job on HP cluster attached to 3PAR) - backed up OK, but offload stopped with "Failed to establish connection" (TLS 1.2 error) on Friday evening 21st May; it cleared itself Monday afternoon 24th May (though we didn't change anything). So we lost a whole weekend of unthrottled offloading.
For the rest of last week, backups seemed OK & minor offloads OK, though backups began to eat up space on the array.
Jobs are set to retain 7 days and run Synthetic Fulls on Saturdays. The Synthetic Full of the big 12TB job had nearly finished by Friday evening, but last Friday evening 28th May the TLS error happened again and killed all the jobs - though the interruption cleared (itself) after a few hours.
Since then, offload of the big 12TB job has been very variable - down to 2MB/s for 36hrs of the weekend - and its synthetic full is taking a long time (64% after 2.5 days). Resmon on the landing-zone server (also the VBR server) shows 100MB/s reading from last week's vbk and 100MB/s writing to this week's vbk, and at other times writing up to 400MB/s to ArchiveIndex.
Array space is holding up (bouncing around 15TB free), but we have another 10TB on the HP/3PAR cluster waiting in the wings and obviously there isn't space for it yet.
My main question is - are such performance issues to be expected when getting a SOBR started? After the first offload, I hope traffic should decrease. I have now separated the synthetic fulls across the week so they don't all run on Saturdays - should that help? Alternatively, should we run Active Fulls every week (i.e. only disk writes), instead of Synthetic Fulls (reads & writes)?
Any suggestions gratefully received,
Thanks
Ian
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: SOBR issues

Post by HannesK »

Hello,
for the TLS issue... do you have a case number for that? Because I have seen the same in case #04812851.

If your synthetic full takes that long, then I assume that you are using a file system without block cloning. ReFS / XFS solves that issue: https://www.veeam.com/blog/advanced-ref ... suite.html

Best regards,
Hannes
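(To illustrate Hannes's point: a toy Python model of the data I/O a synthetic full costs with and without block cloning. The block counts are assumptions roughly matching Ian's 12TB job; this is not Veeam's actual implementation.)

```python
BLOCK = 1024 * 1024  # model the backup chain as 1 MiB blocks

def synthetic_full_io(full_blocks, incremental_blocks, block_cloning):
    """Return (bytes_read, bytes_written) needed to build a new synthetic full."""
    total = full_blocks + incremental_blocks
    if block_cloning:
        # ReFS/XFS: the new full just references existing blocks in place,
        # so data I/O is negligible (metadata updates only).
        return (0, 0)
    # NTFS: every block of the old chain is physically read and rewritten
    # into the new full file.
    return (total * BLOCK, total * BLOCK)

# Assumed sizes: ~12 TiB full plus ~1 TiB of incrementals.
r, w = synthetic_full_io(12_000_000, 1_000_000, block_cloning=False)
print(f"NTFS: read {r / 1024**4:.1f} TiB, wrote {w / 1024**4:.1f} TiB")
r, w = synthetic_full_io(12_000_000, 1_000_000, block_cloning=True)
print(f"ReFS: read {r / 1024**4:.1f} TiB, wrote {w / 1024**4:.1f} TiB")
```

This is why the synthetic full was reading from last week's vbk and writing to this week's at the same time: on NTFS the whole chain is re-copied.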

Re: SOBR issues

Post by ianbutton1 »

Hello Hannes,
The TLS issue was case 04824153. I abandoned the case as the problem cleared itself on 24/5 (we hadn't changed anything, and didn't know why it happened, though it happened again just a week later!). Our VBR/gateway server has TLS 1.2, and the cloud object storage provider also supports TLS 1.2. Our VBR server is up-to-date with Veeam and with Windows updates.

Regarding the filesystem (REFS/XFS) I'll have to look at that - thanks for the link.
The synthetic full of the big backup finished overnight, and offload throughput increased after that (from about 2MB/s to about 70MB/s). I have switched off all synthetic fulls, in order to let the offload complete. At 25MB/sec when throttled (07:00-22:00 Mon-Fri) and 100MB/s at other times the offload will take 3 or 4 days. After the initial "seeding" of the object storage in the cloud, should transfers become much smaller (and therefore quicker)? Or does it try to transfer a full vbk file each time?
Thanks again
Ian
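(Ian's 3-4 day estimate checks out arithmetically. A quick sketch using his quoted rates; the ~18TB total left to move is an assumption.)

```python
# Offload budget per weekday, using Ian's figures:
# 25 MB/s while throttled (07:00-22:00 Mon-Fri), 100 MB/s otherwise.
THROTTLED_MB = 25 * 15 * 3600     # 15 throttled hours per weekday
UNTHROTTLED_MB = 100 * 9 * 3600   # 9 unthrottled hours per weekday

def weekdays_to_offload(total_tb):
    """Weekdays needed to move total_tb terabytes (decimal TB)."""
    per_day_mb = THROTTLED_MB + UNTHROTTLED_MB
    return total_tb * 1_000_000 / per_day_mb

print(f"{weekdays_to_offload(18):.1f} weekdays")  # just under 4 weekdays
```

Weekends at a constant 100 MB/s move data roughly twice as fast per day, so a run spanning a weekend finishes sooner than the weekday-only figure suggests.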
sfirmes
Veeam Software
Posts: 245
Liked: 122 times
Joined: Jul 24, 2018 8:38 pm
Full Name: Stephen Firmes
Contact:

Re: SOBR issues

Post by sfirmes »

@ianbutton1 The SOBR offload uses a forever forward incremental methodology to transfer data to the capacity tier. So once your initial seeding is finished, only unique blocks will be transferred going forward.

This link provides some great information regarding how we transfer data to the capacity tier: https://helpcenter.veeam.com/docs/backu ... ml?ver=110
Senior Solutions Architect, Product Management - Alliances @ Veeam Software
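(A rough sketch of the unique-block idea sfirmes describes: hash each block and upload only hashes not already present in the capacity tier. Purely illustrative — Veeam's actual block size, hashing, and on-wire format differ.)

```python
import hashlib

def offload(restore_point: bytes, uploaded: set, block_size: int = 4):
    """Upload only blocks whose hash isn't already in object storage.
    Returns the number of blocks actually transferred."""
    sent = 0
    for i in range(0, len(restore_point), block_size):
        digest = hashlib.sha256(restore_point[i:i + block_size]).hexdigest()
        if digest not in uploaded:
            uploaded.add(digest)
            sent += 1
    return sent

seen = set()
first = offload(b"AAAABBBBCCCCDDDD", seen)   # initial seeding: all 4 blocks go up
second = offload(b"AAAABBBBCCCCEEEE", seen)  # next point: only the changed block
print(first, second)  # 4 1
```

So after seeding, offload traffic is proportional to the daily change rate, not the size of the vbk.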

Re: SOBR issues

Post by ianbutton1 »

Thank you @sfirmes !

Re: SOBR issues

Post by ianbutton1 »

The synthetic full eventually finished, the disk-thrashing stopped, and offload picked up again - we're now 5 days in, with the whole job 38% done but this big backup showing 68% done (surely one figure must be wrong, as all other backups are already offloaded). I have stopped all synthetic fulls until the offload is complete. Will it be safe to enable them again afterwards?

ReFS looks good, though I read that it introduces fragmentation of backup files, possibly slowing down restores. Is that still a problem? Can it be resolved with a Compact/defrag job or another way?

ReFS migration means trashing our 64TB NTFS landing-zone with the SOBR performance extents, plus an OS upgrade (2016 will do for now). We aren't too far down the line, the OS upgrade is due anyway, and this might be useful DR training. The backups residing on the repos there are all copied to other locations (cloud object storage & StoreOnce archives), so losing the local copies sounds OK. Keeping the database in sync with reality sounds more complicated, though; and we would prefer to minimise deleting/recreating repos & jobs if possible.

So, after stopping jobs, should our procedure be as follows?
1. Disconnect but leave in place (simply Remove From Configuration?) the Capacity Tier backups in cloud object storage and the backup copies on StoreOnce. Same process for both?
2. Do we need to remove/delete the repos on the 64TB array (there's nowhere big enough to evacuate them to)? Remove the performance extents? Delete the SOBR? Delete the jobs?
3. Reformat the 64TB array as ReFS (64K cluster size) and do an in-place upgrade of the server OS. Is VBR likely to be happy with an IPU?
4. Create new repos on the array - same names OK? Recreate jobs if deleted earlier?
5. Create new SOBR - same name OK?
6. Rebuild backup chains by reconnecting the backups in cloud object storage and the backup copies on StoreOnce - is that just a simple Import? Or is it advisable to start new chains - Active Fulls for all the jobs?
7. Recreate Agent for Solaris job (new repo will get a new Id, so old job will no longer connect)

Is there a handy "cook-book" for this kind of task?
Thanks
Ian

Re: SOBR issues

Post by HannesK »

Hello,
restores were improved in v10 and v11. The alternative to ReFS / XFS is throwing more I/O performance at the repository. Whichever you prefer, but from a cost / value perspective most customers choose ReFS / XFS.

I would go for Server 2019, because Server 2022 is already "around the corner". I don't trust Windows major version upgrades before 2019. Your choice :-) Upgrading the operating system and changing the file system are two separate tasks; I suggest treating them separately.

My idea for a migration was to use "seal extent" and let the software do the rest: https://helpcenter.veeam.com/docs/backu ... ml?ver=110

@veremin: do we have a supported way of migrating the performance tier of SOBR on the same machine without re-uploading everything again?

Best regards,
Hannes
gummett
Veteran
Posts: 404
Liked: 106 times
Joined: Jan 30, 2017 9:23 am
Full Name: Ed Gummett
Location: Manchester, United Kingdom
Contact:

Re: SOBR issues

Post by gummett »

Hi @ianbutton1,
hopefully @veremin can confirm, but the process I've tested to replace the (single) performance tier of a SOBR is as follows:

0. Disable jobs.
1. Put the old performance tier repository into maintenance mode (this stops VBR trying to access it).
2. Reformat the volume (or prepare a replacement volume).
3. Add a new repository on that volume (at least the folder name needs to be different, as the path cannot match the previous performance tier's).
4. Edit the SOBR and replace the performance tier extent with the new repository (without evacuating backups). Veeam will automatically re-sync and download VBK and VIB metadata stubs to the performance tier.
5. Ideally, run an active full on each job (the object offload will be incremental). Alternatively, ‘copy to performance tier’ the current backup chain for each VM. A regular (incremental) backup will fail until one of these is done.
6. Remove the legacy repository and re-enable jobs.
Ed Gummett (VMCA)
Senior Specialist Solutions Architect, Storage Technologies, AWS
(Senior Systems Engineer, Veeam Software, 2018-2021)

Re: SOBR issues

Post by ianbutton1 »

@HannesK @gummett Thank you both very much for your information & suggestions.
I have opened a case (04849271) and we'll see if Support can add anything. An extra issue that has arisen is that the synthetic full vbk file (18TB) is 50% bigger than the source data (12TB), and while Veeam is building one vbk from the previous one, that takes up to 36TB of storage!
Our new Windows Server licence is being ordered, so we'll be getting ReFS set up asap.
Thanks again
Ian
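(The 36TB peak Ian reports follows from the synthetic-full mechanics on a file system without block cloning: the old chain has to stay on disk until the new full is complete. A trivial sketch:)

```python
def peak_space_tb(full_tb, incrementals_tb):
    """Peak repository usage while a synthetic full is being built
    without block cloning: the old chain remains on disk until the
    new full file is finished, so old and new coexist briefly."""
    old_chain = full_tb + incrementals_tb
    new_full = full_tb
    return old_chain + new_full

print(peak_space_tb(18, 0))  # 36 (TB), matching the observation above
```

With ReFS/XFS block cloning the new full shares blocks with the old chain, so this doubling largely disappears.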

Re: SOBR issues

Post by gummett »

You're welcome @ianbutton1
While you're reformatting, you might also want to note the guidance here with regard to stripe size and caching:
https://bp.veeam.com/vbr/VBP/3_Build_st ... block.html