Discussions related to using object storage as a backup target.
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

Currently we're monitoring for SOBR upload errors logged to the Windows Event log. This doesn't work well because some errors are "expected", so it's hard to tell "expected" errors from "I'm broken and you need to fix me" errors. For example, we had 22 offload failures on one of our SOBRs in the last 24 hours... that's more than we usually see, but it doesn't tell us whether human intervention is required.

What's "the Veeam way" for a monitoring system to confirm SOBR offload has completed, or is still in progress, or has real errors which need manual intervention to resolve?
We've got VSPC if that helps.

Thanks
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by Gostev » 3 people like this post

The daily SOBR status email report provides a good summary.

@Egor Yakovlev please also check if some of those Windows Event log events should really be warnings, or not logged at all. Temporary connection issues might be better not mentioned at all, unless of course they are already logged only after lots of fighting and retries? It is just that 22 failures would indicate we're too spammy, unless there were actual major backup infrastructure or Internet access or object storage issues during those 24 hours.

Please include @veremin in review to understand what can be optimized.
Egor Yakovlev
Product Manager
Posts: 2597
Liked: 715 times
Joined: Jun 14, 2013 9:30 am
Full Name: Egor Yakovlev
Location: Prague, Czech Republic

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by Egor Yakovlev »

Sounds good, queued for investigation.
/Cheers!
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by veremin »

Sure, we will have a call with Egor this week to review the current situation with offloading errors, warnings and reporting. Thanks!
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

Wow - what a response guys - thanks!! :D

While this thread isn't specifically about this case, you might find helpful background in Case #05930792 and Case #04800922. We don't normally open cases for this, but it's suboptimal to live with it as we have been.
Gostev wrote: Mar 29, 2023 1:27 pm The daily SOBR status email report provides a good summary.
That looks like a good place to start for us to use as "OK / go look at it" indication - thanks.

If you want me to open a case with some logs, including the Windows event logs, let me know.

Thanks

Alex
sykerzner
Service Provider
Posts: 47
Liked: 2 times
Joined: Jul 27, 2020 1:16 pm
Full Name: SYK

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by sykerzner » 1 person likes this post

Hi

This may seem like overkill, but here is something we cobbled together based on similar conversations and suggestions on the forums.
"%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe" -noprofile -command "import-module Veeam.Backup.PowerShell; $sobrOffload = [Veeam.Backup.Model.EDbJobType]::ArchiveBackup; $sessions =[Veeam.Backup.Core.CBackupSession]::GetByTypeAndTimeInterval($sobrOffload,'9/1/2022', (Get-Date).adddays(1)) ; $taskgroups = $sessions.gettasksessions() | where {($_.progress.TransferedSize -gt '0') -and($_.status -eq 'Success')} |group-object -property Name; $lastSuccessTasks = foreach ($Task in $Taskgroups) {$task.group | sort -property {$_.progress.stoptimelocal} | select -last 1 -Property JobName, Name, Status, @{l='EndTime';e={$_.progress.StopTimeLocal}}, @{l='Duration'; e={$_.progress.duration}}, @{l='TransferedSize (GB)'; e={$_.progress.TransferedSize/1GB}} }; 'Task Count:' ; ($lastsuccessTasks | measure-object).count ; $lastsuccessTasks| sort jobname | convertto-csv"
This gives you information for each offload "task" (one per backup "Job"): name, ID, and the last time the task succeeded and actually sent data. The task name will sometimes change to "name of the SOBR Offload" depending on whether the task ran independently or not, but the ID stays the same (I don't know of a great way to deal with that).

Forgive the ugly look. The simplest way to run it daily via our RMM was as a one-line CMD command.
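If you'd rather post-process the CSV output than read the one-liner, the selection logic can be sketched in Python. This is a minimal sketch, not Veeam code: the `TaskSession` record and its field names are hypothetical stand-ins for the exported columns, and the 48-hour staleness threshold is an arbitrary example of turning "last successful upload" into an OK / go-look-at-it signal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskSession:
    # Hypothetical record shape; the field names are illustrative, not Veeam's schema.
    name: str
    status: str
    transferred_bytes: int
    end_time: datetime


def last_successful_offloads(sessions):
    """Return the latest session per task name that succeeded AND sent data."""
    latest = {}
    for s in sessions:
        # Same filter as the one-liner: status Success and transferred size > 0.
        if s.status != "Success" or s.transferred_bytes <= 0:
            continue
        current = latest.get(s.name)
        if current is None or s.end_time > current.end_time:
            latest[s.name] = s
    return latest


def stale_tasks(latest, now, max_age=timedelta(hours=48)):
    """Task names whose last real upload is older than max_age (example threshold)."""
    return sorted(name for name, s in latest.items() if now - s.end_time > max_age)


if __name__ == "__main__":
    now = datetime(2023, 4, 19, 12, 0)
    sessions = [
        TaskSession("JobA", "Success", 5 * 2**30, datetime(2023, 4, 19, 1, 0)),
        TaskSession("JobA", "Success", 0, datetime(2023, 4, 19, 6, 0)),  # sent nothing: ignored
        TaskSession("JobB", "Success", 2**30, datetime(2023, 4, 15, 3, 0)),
        TaskSession("JobB", "Failed", 2**30, datetime(2023, 4, 19, 3, 0)),  # failed: ignored
    ]
    print(stale_tasks(last_successful_offloads(sessions), now))  # ['JobB']
```

As with the one-liner, a task with no successful upload since the start date does not appear at all, so it is worth comparing the number of tasks returned with the number of jobs you expect.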
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

Thanks very much sykerzner! I'll give that a go - certainly a great place to start :D
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by veremin »

Hey, Alex, we discussed the issue further, and in order to change the behavior or suggest something further we'd like to get the exact failure that got logged 22 times. This should help us to re-verify the logic behind the particular event. Thanks!
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

Hi,
I've uploaded both the Veeam logs and the Veeam Backup Windows event log (which is what we've been looking at) to Case #05930792.
Thanks!
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by veremin »

Thanks for the reference, we will review the provided information and post back. Thanks!
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by veremin » 1 person likes this post

We've contacted your support engineer recently.

Next week we will review the event logs and check whether the given error is logged with the right severity (error instead of a warning) and the right number of times. This will help us understand whether there is room for improvement.

I will update the topic once I have more information.

Thanks!
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

Just to share today's alerts from our monitoring based on the event logs:
13 SOBR offload failures on OUR-SP-SOBR1 in last 24 hours. Offsite backups may be incomplete! Most recent 2023-04-18 07:43:27
2 SOBR offload failures on TENANT1-SOBR1 in last 24 hours. Offsite backups may be incomplete! Most recent 2023-04-17 22:39:21
2 SOBR offload failures on TENANT2-SOBR1 in last 24 hours. Offsite backups may be incomplete! Most recent 2023-04-17 09:01:52
2 SOBR offload failures on TENANT3-SOBR1 in last 24 hours. Offsite backups may be incomplete! Most recent 2023-04-18 04:59:46
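Alerts like these boil down to a per-SOBR count over a sliding window. A minimal Python sketch, assuming the failure events have already been extracted from the Veeam Backup event log as (timestamp, SOBR name) pairs; the extraction is not shown and the function name is hypothetical. It also makes the limitation raised at the start of the thread concrete: the count alone says nothing about whether the failures were transient or need a human.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def offload_failure_alerts(events, now, window=timedelta(hours=24)):
    """Build one alert line per SOBR from (timestamp, sobr_name) failure events."""
    counts = defaultdict(int)
    latest = {}
    for ts, sobr in events:
        if now - window <= ts <= now:  # only events inside the sliding window
            counts[sobr] += 1
            if sobr not in latest or ts > latest[sobr]:
                latest[sobr] = ts
    hours = int(window.total_seconds() // 3600)
    lines = []
    # Busiest SOBR first; ties broken by name so the output order is stable.
    for sobr in sorted(counts, key=lambda name: (-counts[name], name)):
        lines.append(
            f"{counts[sobr]} SOBR offload failures on {sobr} in last {hours} hours. "
            f"Offsite backups may be incomplete! Most recent {latest[sobr]:%Y-%m-%d %H:%M:%S}"
        )
    return lines
```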

We're looking to move over to the VSPC alerts, though integrating those into our systems / process is rather challenging.
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

It looks like there are various contributors to these message counts:

Object storage cleanup failed: Timed out waiting for the backup files to be released, cancelling the job
This warning seems to be due to an internal Veeam design / scaling issue.
Resource not ready: object storage repository S3-EXT-NAME for SOBR-NAME Timed out waiting for backup infrastructure resources to become available (14400 sec)
This seems to be due to VBR trying to run too many offload jobs at the same time. In this case there appear to have been three instances of "SOBR Offload" plus a "SOBR-NAME Offload" for each SOBR running simultaneously (five in total). Because of the required (but undocumented) concurrent task limit on the S3 repo (to avoid rate-limit errors from the S3 vendor), this pushes the bottleneck back onto the object storage repository S3-EXT-NAME, which then shows as "unavailable". At least, that's my interpretation.
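The arithmetic behind that interpretation can be illustrated with a toy queue model. This is a sketch under deliberately simplified assumptions, not VBR's scheduler: every offload task is queued at t=0, tasks get a repository slot in FIFO order, each holds one slot for its whole (hypothetical) duration, and a task that has not started within the 14400 s timeout gives up.

```python
import heapq


def simulate_offload_queue(durations, slots, timeout_s=14400):
    """Toy FIFO model of offload tasks competing for repository task slots.

    Returns a list of (task_index, wait_seconds), with None in place of the
    wait for a task that would hit the timeout before getting a slot.
    """
    free_at = [0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    results = []
    for i, duration in enumerate(durations):
        start = free_at[0]  # earliest moment any slot is free
        if start > timeout_s:
            results.append((i, None))  # timed out; it never occupies a slot
            continue
        heapq.heapreplace(free_at, start + duration)  # occupy that slot
        results.append((i, start))
    return results
```

With five hypothetical 3-hour tasks and 2 slots, the first four start at 0 s or 10800 s and the fifth would not get a slot until 21600 s, so in this model it hits the timeout: `simulate_offload_queue([10800] * 5, slots=2)` returns `[(0, 0), (1, 0), (2, 10800), (3, 10800), (4, None)]`.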

18/04/2023 23:59:01 :: Removing checkpoint d4d19b02-323f-41cd-81fe-1bd7b354b1a2 from Capacity Tier...
19/04/2023 01:20:25 :: Checkpoint cleanup failed Details: HTTP exception: WinHttpQueryDataAvaliable: 12002: The operation timed out, error code: 12002
REST API error: 'S3 error: We encountered an internal error. Please retry the operation again later. Code: InternalError', error code: 500 Other: Detail: 'Could not find pool number 2269 in extent B-643390/O-f5db5c2dbe717ca6/S-1',

18/04/2023 23:40:30 :: Checkpoint cleanup failed Details: HTTP exception: WinHttpQueryDataAvaliable: 12002: The operation timed out, error code: 12002
18/04/2023 23:40:32 :: Object storage cleanup failed: HTTP exception: WinHttpQueryDataAvaliable: 12002: The operation timed out, error code: 12002
Shared memory connection was closed.
18/04/2023 23:40:32 :: Object storage cleanup failed: HTTP exception: WinHttpQueryDataAvaliable: 12002: The operation timed out, error code: 12002
Exception from server: HTTP exception: WinHttpQueryDataAvaliable: 12002: The operation timed out, error code: 12002
18/04/2023 23:40:47 :: Offload finished with warning at 18/04/2023 23:40:47

And other related transient errors from the S3 provider. Up to a point, VBR should just accept these as normal, retry, and only report them as errors if they fail repeatedly.
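The "retry, and only escalate on repeated failure" policy can be sketched on the monitoring side. A minimal Python sketch; the list of transient markers is an assumption drawn only from the messages quoted in this thread (WinHTTP 12002 is a timeout and 12030 a dropped connection; S3 `InternalError` is HTTP 500 and `SlowDown` is HTTP 503), not an official Veeam list, and the threshold of 3 consecutive failures is arbitrary.

```python
# Assumed transient markers, taken from log lines quoted in this thread only.
TRANSIENT_MARKERS = (
    "error code: 12002",    # WinHTTP: the operation timed out
    "error code: 12030",    # WinHTTP: connection terminated abnormally
    "Code: SlowDown",       # S3 503: reduce request rate
    "Code: InternalError",  # S3 500: retry later
)


def is_transient(message: str) -> bool:
    return any(marker in message for marker in TRANSIENT_MARKERS)


class EscalationTracker:
    """Escalate a task only after `threshold` consecutive transient failures.

    Any non-transient failure escalates immediately; a success resets the streak.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = {}

    def record(self, task, ok, message=""):
        """Record one outcome; return True if a human should look at it now."""
        if ok:
            self.streak.pop(task, None)
            return False
        if not is_transient(message):
            return True
        self.streak[task] = self.streak.get(task, 0) + 1
        return self.streak[task] >= self.threshold
```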

18/04/2023 07:43:27 :: Failed to offload backup. Error: Failed to call RPC function 'FcRenameFile': The process cannot access the file because it is being used by another process. Failed to rename file from [D:\Veeam\Backups\xxxxxxxxxxxxxxxxxxxxxxx\xxxxxxxxxxxxxxxxxxxxxx\xxxxxxxx.vbm.temp] to [D:\Veeam\Backups\xxxxxxxxxxxxxxxxxxxxxxx\xxxxxxxxxxxxxxxxxxxxxx\xxxxxxxx.vbm].
File 'D:\Veeam\Backups\xxxxxxxxxxxxxxxxxxxxxxx\xxxxxxxxxxxxxxxxxxxxxx\xxxxxxxx.vbm.temp' locked by 0 processes:.
File 'D:\Veeam\Backups\xxxxxxxxxxxxxxxxxxxxxxx\xxxxxxxxxxxxxxxxxxxxxx\xxxxxxxx.vbm' locked by 0 processes:.
18/04/2023 07:43:27 :: Failed to upload meta into master agent.
We're plagued by this occasional error and can't find the cause. All AV exclusions are in place, as aggressive as possible, and applied to both the file paths and the file names. Windows Defender is uninstalled.
We find the "locked by 0 processes" very suspicious. Does that mean it's locked by zero Veeam processes, or by zero processes in total? In the latter case the whole message could be wrong, as it would mean the file is NOT locked open as it claims.
pirx
Veteran
Posts: 613
Liked: 92 times
Joined: Dec 20, 2015 6:24 pm

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by pirx » 1 person likes this post

This is v11; here are a few common errors/warnings:

Error (not sure this has to be an error, as the blackout period was set on purpose)
20.04.2023 06:19:18 :: Processing xxxxx Error: Job was stopped due to backup window setting
Error (should this really be an error?)
08.04.2023 23:14:55 :: Processing xxxx Error: Stopped by job 'xxxx' (Backup)
Warning
19.04.2023 22:27:46 :: Object storage cleanup failed: Failed to retrieve certificate from https://s3.dualstack.ap-southeast-1.amazonaws.com
Error (very common across all our locations with buckets in different regions; not sure why the above is a warning and this is an error)
19.04.2023 17:00:44 :: Processing xxxxx Error: Failed to retrieve certificate from https://s3.dualstack.ap-southeast-1.amazonaws.com
Error (random but very common, I guess it has to be an error, but as this happens only randomly we usually ignore it)
09.04.2023 05:00:35 :: Processing xxxxx Error: HTTP exception: WinHttpSendRequest: 12030: The connection with the server was terminated abnormally
, error code: 12030


Warning (happens a lot; we tweaked some settings in the past but without a 100% solution, so we just ignore it)
08.04.2023 19:45:31 :: Object storage cleanup failed: REST API error: 'S3 error: Please reduce your request rate.
Code: SlowDown', error code: 503
Other: HostId: 'xxxxxx
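For the `SlowDown` case specifically, AWS's guidance is to retry with exponential backoff. The generic full-jitter form looks like this in Python; `SlowDown` here is a hypothetical exception standing in for a 503 response, the base and cap values are arbitrary, and this is not a description of how VBR retries internally.

```python
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an S3 '503 SlowDown' response."""


def call_with_backoff(op, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call op(), retrying on SlowDown with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return op()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error after the last attempt
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))
```

The `sleep` and `rng` parameters are only there so the delays can be observed or faked; in normal use the defaults apply.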
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

I agree that "Job was stopped due to backup window setting" should not be an error. It's an indication that the system is working as designed / configured.
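Until the product changes, one workaround on the monitoring side is a severity override table applied before alerting. A minimal Python sketch; the patterns are copied from the messages quoted above, while the chosen severities (and whether "Stopped by job" belongs here at all) are a local policy assumption.

```python
import re

# Hypothetical monitoring-side overrides: (pattern, severity to report instead).
SEVERITY_OVERRIDES = [
    # Working as configured: the backup window ended the job on purpose.
    (re.compile(r"Job was stopped due to backup window setting"), "info"),
    # Debatable: one job stopped by another; downgrade only if that suits your policy.
    (re.compile(r"Stopped by job '.*' \(Backup\)"), "info"),
]


def effective_severity(reported: str, message: str) -> str:
    """Return the overridden severity for a message, or the reported one."""
    for pattern, severity in SEVERITY_OVERRIDES:
        if pattern.search(message):
            return severity
    return reported
```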
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin » 1 person likes this post

Still causing drama several days a week...

Processing 0c31728d-5c3c-46fa-925f-9edbe89621b7 Error: Timed out waiting for backup infrastructure resources to become available (14400 sec)  
These seem to be routine, and due to other offload jobs running. Bear in mind we've been told by support to limit concurrent jobs on the S3 repo to 2 to deal with another message, very likely this one:

REST API error: 'S3 error: Please reduce your request rate.
This whole design of "run LOADS of offload jobs, often at the same time, have them ignore that other jobs are already running, then log errors when they timeout" just seems "highly suboptimal".

Error: Backup file version mismatch: scale-out backup repository rescan is required.  
Oh!!! :roll: Given this was in sync previously, and nothing other than VBR has touched either the performance or the capacity tier, it is "very disappointing" that this keeps happening, long after the upgrade to v12 was supposed to improve all this.

If a rescan really is required - why doesn't VBR queue one up and suspend all the offload jobs (which will likely fail anyway) until it's completed?
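The behaviour suggested here (queue a single rescan, hold offloads until it completes) can be sketched as a tiny dispatcher. This is a hypothetical Python model of the proposal, not how VBR works today; matching on the string "rescan is required" is an assumption based on the error text quoted above.

```python
from collections import deque


class OffloadDispatcher:
    """Hypothetical model: one queued rescan blocks offloads until it finishes."""

    def __init__(self):
        self.pending = deque()
        self.rescan_state = None  # None, "queued" or "running"

    def submit(self, job):
        self.pending.append(job)

    def report_error(self, message):
        # Repeated "rescan is required" errors collapse into a single rescan.
        if "rescan is required" in message and self.rescan_state is None:
            self.rescan_state = "queued"

    def rescan_finished(self):
        self.rescan_state = None

    def next_job(self):
        """Return the next thing to run, or None if nothing may run right now."""
        if self.rescan_state == "queued":
            self.rescan_state = "running"
            return "rescan"
        if self.rescan_state == "running":
            return None  # offloads are held until the rescan completes
        return self.pending.popleft() if self.pending else None
```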
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin »

New Case #06045964
AlexHeylin
Veteran
Posts: 563
Liked: 173 times
Joined: Nov 15, 2019 4:09 pm
Full Name: Alex Heylin
Contact:

Re: What's the Veeam Way: Confirm SOBR offload either in progress or completed

Post by AlexHeylin » 1 person likes this post

The rescan has spat out a load of warnings like

Failed to import backup Backup Copy xxxxxxxxx\yyyyyyy - zzzzzz Details: The existing index has a different backup id
These are presumably because the SP side is sulking about a tenant having built a new backup server and remapped the new backups to the old chain, having upgraded from "per-machine data single metadata" to "per-machine data per-machine metadata".

SPs need a system that works and is more reliable and less needy than this!

Thanks

Alex