Discussions related to using object storage as a backup target.
AlexHeylin
Veeam Legend
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

[Enhancement request] Make SOBR rescan of S3 work better, and offload more reliable generally

Post by AlexHeylin »

Case: 05796167
Due to a catastrophic local storage failure, all performance extents suffered "unplanned removal" from the SOBR. We added a temporary extent and relied on the capacity tier while the local storage was rebuilt. Once rebuilt, the three local performance extents were re-added to the SOBR and rescanned, the temp extent was removed, and the SOBR was rescanned again. At this point the DB / indexes etc. should all have been in sync - that's the purpose of a rescan: "go find what's there and make it work".

We've been plagued by ongoing "Local index is not synchronized with object storage, please rescan the scale-out backup repository" errors, and offloads have had high failure rates.
Support has asked us to make DB changes such as

Code:

UPDATE [dbo].[Backup.Model.BackupArchiveIndices] SET version = '210' WHERE archive_index_id = '2a1aa577-b9d7-4043-9662-fd4fd933674a'

If a person could figure this out, shouldn't that logic be in the resync code so that a resync deals with it?

The issues persist. Today I noticed this near the bottom of a (mostly failing) offload job:

Code:

Index has been recently resynchronized and can only be accessed in 4 hours 45 minutes

Does that mean "I see the need for a resync, but one was done recently so I'm going to ignore the new results for 4.75 hours"? Why?

Generally - SOBR offloads generate errors in normal operation, and they should not. If there is a "normal" level of errors, perhaps there should also be a threshold above which VBR does something MUCH more vocal than logging each error to the Windows event log and putting the results in a semi-hidden job that no-one looks at?
Please remember that for many use cases of SOBR + Object, the object is "the" offsite copy. It's important it happens in a timely manner and real failures are reported loudly.
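
To illustrate the threshold idea above, here is a minimal Python sketch. It is not anything VBR provides: the `OffloadResult` record, the function names and the 5% / 20% thresholds are all hypothetical, and the results would have to be collected from the offload sessions by some other means.

```python
from dataclasses import dataclass


@dataclass
class OffloadResult:
    """One offloaded backup within a single offload session (hypothetical record)."""
    backup_name: str
    succeeded: bool


def failure_rate(results):
    """Return the fraction of failed offloads, or 0.0 for an empty session."""
    if not results:
        return 0.0
    failed = sum(1 for r in results if not r.succeeded)
    return failed / len(results)


def classify_session(results, warn_at=0.05, alert_at=0.20):
    """Map a session's failure rate to 'ok', 'warn' or 'alert'.

    The thresholds are illustrative assumptions, not Veeam defaults.
    """
    rate = failure_rate(results)
    if rate >= alert_at:
        return "alert"
    if rate >= warn_at:
        return "warn"
    return "ok"
```

A session classified as "alert" would be the point at which to send an email or raise a ticket, rather than relying on someone opening the offload job.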

Without wanting to be unreasonable here - SOBR offload, index sync etc. just need to be more resilient and more reliable. Errors should not be "expected" in normal production operation. I see there are changes to object storage in v12, but I've not yet had a chance to catch up on whether they affect SOBR operation, or on what the migration path to "the new world" is, especially with immutability configured in S3.

Thanks guys
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Post by Gostev » 5 people like this post

V12 literally cures this headache with a guillotine by removing said "local index" completely :D This index and its synchronization were pretty much the only consistent source of support cases around our object storage integration... while the benefit it was designed to bring turned out to be non-existent with real-world data sets.

Post by AlexHeylin »

THANKS Gostev!

Any chance there are good migration paths to the new world from v11-style SOBRs, especially where immutability is in use? Please can we have docs on this at / soon after release? We'd REALLY like not to have to work this out for ourselves, or contact support and get told "we're not sure" or "create a new one and delete the old one" (incurring additional costs over months or years) etc.
Thanks

Post by Gostev » 1 person likes this post

AFAIR it's just a quick automated in-place upgrade of existing backups' metadata...

Post by AlexHeylin »

Image
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria
Contact:

Post by mcz »

Am I the only one who does not see that image?

Post by Gostev »

It's a possibility because I can see it.
seirui
Lurker
Posts: 2
Liked: never
Joined: Dec 13, 2022 4:45 pm
Full Name: Gary Zupo
Contact:

Post by seirui »

Just saw this thread in the forum email digest. Quick question: is the index going to be removed in v12 as an option in all backup jobs, due to its lack of benefit (performance-wise), or is this just for local copies in SOBR repositories?

Post by Gostev » 1 person likes this post

The index will not be used anywhere any longer. Consider it a new (V2) format of storing our image-level backups on object storage, used everywhere in VBR and even outside VBR (in Veeam Backup for AWS/Azure/GCP). There's no other format going forward.
talltim
Enthusiast
Posts: 45
Liked: 17 times
Joined: Mar 29, 2021 11:52 am
Full Name: Tim David
Contact:

Post by talltim »

oooh, I have a feeling that this will solve a long-running issue I have had, where running an SOBR rescan fixes one group's worth of servers while breaking another's. It seems that the rescan rebuilds the index from scratch, rather than just fixing the broken bits.
This can be seen in this chart of Errors vs Success. Originally all the server groups were working, with the occasional individual server failing (normally a certificate error). Then after a rescan (marked yellow in the time/date at the top) some groups started failing with a "please rescan the scale-out backup repository" error. Further rescans have just moved the problems around.
Image
I've had a call with support open about this (Case #05753412: Issue with SOBR Offload job), but so far we've only managed to make it worse!

Post by AlexHeylin »

TallTim - what's that report from?
Thanks
veremin
Product Manager
Posts: 20284
Liked: 2258 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Post by veremin »

talltim wrote: I've had a call with support open about this (Case #05753412: Issue with SOBR Offload job), but so far we've only managed to make it worse!
It seems that you are about to have a remote session with our engineer today. The engineer will collect the additional details necessary for further escalation. Thanks!

Post by talltim » 2 people like this post

AlexHeylin wrote: Jan 23, 2023 10:11 am TallTim - what's that report from?
Thanks
It's manually created from the HTML reports of each job. There may be a better way of doing it*, but I needed something that showed me the pattern over time. It helped a lot to see how individual servers are retried (usually successfully) after a fail, and how running a rescan was fixing some groups while breaking others.

* let me know if there is!
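
For anyone wanting to build a similar view, the pivot step can be sketched in a few lines of Python. This is only an illustration under assumptions: it presumes the per-job results have already been extracted into `(date, server, status)` tuples (parsing the HTML reports is not shown, since their layout isn't given here), and the function names and the "Success" / "Failed" status strings are hypothetical.

```python
from collections import defaultdict


def build_matrix(records):
    """Pivot (date, server, status) tuples into {server: {date: status}}.

    If a server appears more than once on the same date (for example
    after a retry), the last record wins.
    """
    matrix = defaultdict(dict)
    for date, server, status in records:
        matrix[server][date] = status
    return {server: dict(days) for server, days in matrix.items()}


def render(matrix):
    """Render the matrix as plain text: one row per server, one column per date."""
    # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    dates = sorted({d for days in matrix.values() for d in days})
    width = max((len(s) for s in matrix), default=0)
    lines = [" " * width + "  " + " ".join(dates)]
    for server in sorted(matrix):
        cells = []
        for d in dates:
            status = matrix[server].get(d)
            # "--" marks dates with no record for this server.
            mark = {"Success": "OK", "Failed": "XX"}.get(status, "--")
            cells.append(mark.center(len(d)))
        lines.append(server.ljust(width) + "  " + " ".join(cells))
    return "\n".join(lines)
```

Because the last record for a given day wins, a failure that is later retried successfully shows as OK; keeping every attempt per cell instead would make the retry pattern visible.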

Post by AlexHeylin »

Thanks TallTim. Our method of doing something similar is probably more complex and too specific to our systems and use case to be any use to anyone else if I shared it.

Post by AlexHeylin »

Veremin,

Please can you get someone serious to look at Case #06045964 and its history before I lose my cool about this ongoing / recurring issue? Support has been ignoring P2 tickets for days, then fumbling the responses I do get, and apparently not bothering to read the previous tickets I've told them exist for this issue.

Case #04800922
Case #04832410
Case #05930792
Case #06033309
Case #06045964

Thanks!

Alex

Post by AlexHeylin »

Case #06207611
Yet more SOBR offload errors :-(

Post by veremin » 1 person likes this post

Sorry for the unpleasant experience, we will review the new ticket internally this week and see what might be the root cause of the given issues.

I will keep the thread updated.

Thanks!

Post by veremin »

The current logs seem to be missing. Based on the provided subset, it seems that our data mover is locking the process, but we need the log bundle to say more precisely what the reasons are.

Thanks!

Post by AlexHeylin » 1 person likes this post

Log now uploaded thanks. :-)

Post by veremin » 1 person likes this post

Passed to the QA team - however, it might take a couple of days to analyze them further due to their current load. Thanks, Alex.

Post by AlexHeylin »

Thanks Veremin :-)

Post by Gostev »

AlexHeylin wrote: May 17, 2023 4:38 pm Please can you get someone serious to look at Case # 06045964 and the history before I lose my cool about this ongoing / recurring issue

Case #04800922
Case #04832410
Case #05930792
Case #06033309
Case #06045964
Since this list looks too scary indeed and may leave an impression of Veeam offload being completely dysfunctional, I decided to personally review these cases with our support engineers.

Case #04800922. Cause: customer entered incorrect email in the notifications settings.
Case #04832410. Cause: API problem on the Wasabi side. Fix: patched on the Wasabi side.
Case #05930792. Cause: object storage occasionally taking too long to answer API calls. Fix: improved error/timeout handling on the VBR side in P20230412.
Case #06033309. Cause: support case opened with invalid Support ID/account and was not processed at all.
Case #06207611. Cause: backup file locked by some process on the VBR server (such as antivirus). Not a known issue and still under investigation, but chances are it's environment-specific, as there are no other similar cases.
Case #06045964. Cause: some existing configuration database problems that the migration to V12 is not prepared for, such as invalid entries left after multiple in-place upgrades. Known issue observed at a few customers, always for the same reason.

Bottom line: the list does not look as bad as it seems :) at least fresh new V12 installs should not be running into any of these problems.

Post by AlexHeylin »

Hi Gostev,

Thanks for taking the time to review all these. I agree it looks scary, and I'm not saying it is completely dysfunctional. However for us it's not been completely functional either. While I agree there have been varied causes, we've not had fully reliable offload that doesn't generate errors which cause tickets and toil in at least two years. I think you'll understand that we have not been able to either safely ignore the errors or resolve them. This has been a significant ongoing pain point for our backup engineers and support desk. To say nothing of the time wasted every week. Time we're under pressure from management to minimise.

I see a number of improvements were made in the most recent release, including to offloading. I expect some of those arose from the cases I created - so while it might look like I'm just having a moan, I hope you found them useful in resolving the issues.

Just to pick up that Case #04800922 covered many issues / improvements. I don't see anything about an email address for notifications. I see the primary issue as a required port missing from / unclear in the documentation and unhelpful ("incorrect") messages in the logs. QA also found misbehaviour in the product and have since fixed it.

Hopefully the latest version will now work better and we can all move on to doing more useful things.

Thanks

Alex

Post by AlexHeylin » 1 person likes this post

I spoke too soon. Different parts of VBR are still fighting for file locks during offload. Case #06207611

FYI Case #06207611 - after a lot of hunting we traced this to MS Defender. We were not aware that when 3rd party AV is installed on Windows Server, that AV is prohibited from disabling or uninstalling Defender, whereas on Windows workstation it does disable it. Only one of our AV vendors mentions this in their docs, and almost in passing rather than as a "you should do this" instruction. This led to numerous file locking issues on multiple VBR repos, both in our SP infrastructure and tenant-side. To resolve this it's necessary to completely uninstall Defender. It would have been helpful if Veeam support had mentioned this in any of the several cases we opened for this, or if it were in the docs as a "gotcha!" to watch out for. Just following the docs won't work if there's another AV running which you think is inactive.
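
One way to check whether Defender is still active on a repository server is to query its status from a script. The Python sketch below is a hedged example rather than a Veeam or Microsoft recommendation: it assumes a Windows host where the `Get-MpComputerStatus` PowerShell cmdlet is available, and the two Python function names are hypothetical. If Defender has been fully removed, the cmdlet itself may be absent, in which case the query will fail instead of returning a status.

```python
import json
import subprocess


def query_defender_status():
    """Ask PowerShell for Defender's status and return it as a dict.

    Only works on Windows hosts where the Defender cmdlets are installed.
    """
    command = (
        "Get-MpComputerStatus | "
        "Select-Object AntivirusEnabled, RealTimeProtectionEnabled | "
        "ConvertTo-Json"
    )
    completed = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if PowerShell reports a failure
    )
    return json.loads(completed.stdout)


def defender_is_active(status):
    """Return True if either reported flag says Defender is enabled."""
    return bool(status.get("AntivirusEnabled")) or bool(
        status.get("RealTimeProtectionEnabled")
    )
```

Running `defender_is_active(query_defender_status())` on each repository server would flag hosts where Defender may still be scanning backup files alongside the third-party AV.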

Post by AlexHeylin »

Just to add on Case #06045964 - some of this was due to changes made on the tenant side which effectively and permanently broke the chains on the SP side. That left the SP side with lots of offload errors, and the only solution was to get VBR on the SP side to ignore the chains and re-upload all the backup points from the tenant side. It looks like some features were implemented without thinking about or testing what would happen if the target repo was an SP with a SOBR offloading to object storage. That issue has taken three months to resolve. I don't even want to look and see how many hours of work this caused.
Unfortunately telling us that issues we're facing don't affect new users doesn't help us. It's good for them that they're not facing these issues, but it's crippling us.
Thanks
Alex

Post by AlexHeylin »

Latest case for offload headaches is #06262278.
Any sign of a durable fix to completely stabilise and normalise SOBR capacity tiers which were upgraded to v12 instead of fresh in v12?
We're getting rather fed up of this game of whack-a-mole and fixes / workarounds that seem to "resolve" one issue only for another similar one to appear later.
Thanks

Post by Gostev » 1 person likes this post

Apologies Alex for this experience. I'm sure you realize that upgrades involving architecture/format changes are always painful, especially when the upgraded data spans multiple previous versions, and has unexpected deviations accumulated from those versions which cannot be all reproduced in the upgrade testing. This is not some unique issue, and is the reason why most people always prefer fresh Windows installs vs. in-place upgrades, for example.

I'm afraid there's no option here other than diligently working through all those issues caused by unexpected deviations in the upgraded backups.

Post by AlexHeylin »

Thanks Gostev. As it's often weeks before we get any real progress on them, is there any way to expedite investigation and resolution of each issue?
Each issue is seen by our management and business owner every day in daily reports, and the time wasted is seen in our daily / weekly / monthly time summaries. It's damaging Veeam's reputation within our business and we're already under pressure from management to change to a system that doesn't take so much time to manage. At least if we could get speedy resolution that would mitigate the damage and give us (the tech team) something to push back against management with. If these issues just need a really senior engineer to get on a screenshare with us and work through it - maybe that's what needs to happen. The current case has now been open seven business days with no real progress other than the expected "upgrade and reproduce it then send us logs" step.

Post by Gostev »

AlexHeylin wrote: Sep 06, 2023 3:40 pm As it's often weeks before we get any real progress on them, is there any way to expedite investigation and resolution of each issue?
Sure. If you feel the issue is with the support engineer assigned to the case, you can use the Talk to a Manager functionality in the Customer Portal.

Post by AlexHeylin »

Thanks Gostev