Discussions related to using object storage as a backup target.
m.novelli
Veeam ProPartner
Posts: 590
Liked: 113 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy

Re: Checkpoint removal process info

Post by m.novelli »

Yes, after working with support I also assume it was a transient network error. The error returned by VBR should be updated with clearer text pointing to a probable network connection error, and maybe the Checkpoint Removal Job logic should be updated to keep retrying the connection for longer.

Marco
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin »

Gostev wrote: Oct 01, 2024 1:35 pm @veremin this does raise the question: when did the processing of retention for VeeamZIP/Exported/Abandoned backups start in 12.1? Because I assume it had to be started at some specific time as well even in 12.1?
Orphaned restore points are processed by the background retention process. The background retention process starts automatically every 24 hours at 00:30 and runs in the background.

However, the background retention process does not handle VeeamZIP and Exported restore points. For these types of restore points, a special mechanism called "veeamzipretention" is triggered around midnight, which removes them as intended.

Thanks!
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Checkpoint removal process info

Post by Gostev »

So effectively, the only change in 12.2 when it comes to processing retention of these 3 special backup types is the start time? 03:00 instead of 00:00/00:30?
flomp
Enthusiast
Posts: 49
Liked: 3 times
Joined: Oct 24, 2018 6:15 pm

Re: Checkpoint removal process info

Post by flomp »

I also get the checkpoint removal error every few days. Case ID 07420920

One thing I noticed: All of these emails have a time stamp between 18:59 and 19:01 (This is without the CheckpointRemovalJobStartTimeHours setting)
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin »

Gostev wrote: Oct 02, 2024 2:46 pm So effectively, the only change in 12.2 when it comes to processing retention of these 3 special backup types is the start time? 03:00 instead of 00:00/00:30?
It seems that we are mixing up terminology and conflating two separate concepts: the background retention process and the background checkpoint removal process.

VeeamZIP and exported restore points are removed (and were already removed in version 12.1) around midnight by a special process called "veeamzipretention". Previously, checkpoints for these restore points were also removed within this same process. Now, checkpoint removal occurs immediately after the retention activity completes.

The session initiated at 03:00 mostly removes checkpoints from the backup chains produced by agents that are in the "managed by agent" mode.

Thanks!
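
To make the timing discussed in this thread easier to follow, here is a minimal Python sketch of the three daily start times mentioned above: "veeamzipretention" around midnight, background retention at 00:30, and the 12.2 checkpoint removal session at 03:00. The dictionary labels, the `next_run` and `upcoming` helpers, and the exact 00:00 value used for "around midnight" are illustrative assumptions, not product settings or APIs.

```python
from datetime import datetime, time, timedelta

# Daily start times as described in this thread.
# Labels are illustrative; 00:00 stands in for "around midnight".
SCHEDULE = {
    "veeamzipretention": time(0, 0),
    "background retention": time(0, 30),
    "background checkpoint removal": time(3, 0),
}


def next_run(now: datetime, start: time) -> datetime:
    """Return the next daily occurrence of `start` strictly after `now`."""
    candidate = datetime.combine(now.date(), start)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def upcoming(now: datetime):
    """List (next start, process name) pairs, soonest first."""
    return sorted((next_run(now, t), name) for name, t in SCHEDULE.items())


if __name__ == "__main__":
    for when, name in upcoming(datetime(2024, 10, 2, 1, 15)):
        print(when.strftime("%Y-%m-%d %H:%M"), name)
```

Run at 01:15, the sketch lists the 03:00 checkpoint removal session first, followed by the next night's midnight and 00:30 processes.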
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin »

flomp wrote: Oct 03, 2024 11:29 am I also get the checkpoint removal error every few days. Case ID 07420920
One thing I noticed: All of these emails have a time stamp between 18:59 and 19:01 (This is without the CheckpointRemovalJobStartTimeHours setting)
Similar to the case linked earlier, this appears to be another SSL certificate retrieval failure, which points to network or infrastructure issues. Thanks!
flomp
Enthusiast
Posts: 49
Liked: 3 times
Joined: Oct 24, 2018 6:15 pm

Re: Checkpoint removal process info

Post by flomp »

I was asked by Veeam Support to search on the object storage for the checkpoints.

As I only have the GUIDs from the error-email, I suppose these GUIDs are part of the name of the object versions. Is this correct?
m.novelli
Veeam ProPartner
Posts: 590
Liked: 113 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy

Re: Checkpoint removal process info

Post by m.novelli » 1 person likes this post

I also noticed the "Checkpoint removal error" in this situation: the customer had a firewall misconfiguration and the offload to object storage didn't run for one week. We solved the problem and the offload process started to upload data to Microsoft Azure.
The checkpoint removal process started at 03:00 and was still running at 03:00 the following night because of the huge amount of cleanup. I got the "Checkpoint removal error", but a new checkpoint removal process started immediately at 03:00. This one finished successfully.
So I got a "false positive" error. This should be handled better by the checkpoint removal process logic.

Marco
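
The overrun scenario described in this post can be sketched as a small classifier that separates "the previous session is still cleaning up" from a session that has actually ended. This is only an illustration of the reasoning, assuming one session per 24 hours as described above. The `classify` function and its return labels are hypothetical and do not reflect how VBR is implemented.

```python
from datetime import datetime, timedelta

# One session per night, as described in the post (assumption for this sketch).
RUN_INTERVAL = timedelta(hours=24)


def classify(previous_start, previous_end, now):
    """Classify the previous session as seen at time `now`.

    previous_end is None while the previous session is still running.
    """
    if previous_end is None:
        if now - previous_start >= RUN_INTERVAL:
            # Still busy when the next start time arrives:
            # an overrun, not necessarily a real failure.
            return "overrun"
        return "running"
    return "finished"


if __name__ == "__main__":
    started = datetime(2024, 10, 6, 3, 0)
    print(classify(started, None, datetime(2024, 10, 7, 3, 0)))
```

A notification layer built on such a distinction could report an overrun with different wording than a genuine removal failure.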
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin »

flomp wrote: Oct 04, 2024 5:12 pm I was asked by Veeam Support to search on the object storage for the checkpoints.
As I only have the GUIDs from the error-email, I suppose these GUIDs are part of the name of the object versions. Is this correct?
I would still suggest that the support engineer take a look at the post I'm referring to and carefully review the debug logs.

In the logs, we found the same errors as mentioned in the referenced thread. They point to an issue with the network or infrastructure rather than a problem with the background checkpoint removal process.

Thanks!
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin » 1 person likes this post

m.novelli wrote: Oct 07, 2024 8:26 am I also noticed the "Checkpoint removal error" in this situation: the customer had a firewall misconfiguration and the offload to object storage didn't run for one week. We solved the problem and the offload process started to upload data to Microsoft Azure.
The checkpoint removal process started at 03:00 and was still running at 03:00 the following night because of the huge amount of cleanup. I got the "Checkpoint removal error", but a new checkpoint removal process started immediately at 03:00. This one finished successfully.
So I got a "false positive" error. This should be handled better by the checkpoint removal process logic.
Marco
Glad to hear you've figured out the actual reason for the checkpoint removal failure, and thank you for the feedback. We are already discussing internally how the background checkpoint removal can be improved in terms of issue notifications, retries, and other aspects in upcoming product versions.
MiMaMo
Influencer
Posts: 10
Liked: 5 times
Joined: Feb 27, 2020 3:04 pm
Full Name: Michael

Re: Checkpoint removal process info

Post by MiMaMo »

Hello everyone,

I just wanted to report that we also seem to be affected: since updating VBR to version 12.2, we have been receiving mails with the subject “Failed to remove a checkpoint during a background job on ... “.

I am working with the support team on this issue under the ticket number “Veeam Support - Case # 07430089”.

So far we have been able to improve the situation a little bit with some registry keys for adjusting "S3RequestTimeoutSec", "S3RequestRetryTotalTimeoutSec" and "S3ConcurrentTaskLimit", but the problems are not completely gone.

Unfortunately, it is currently unclear to me how I can see whether a checkpoint removal process is currently running or not.

I also see error messages in some backup logs, which I never noticed before 12.2:


06.10.2024 22:07:22 :: Error: S3 error: You did not provide the number of bytes specified by the Content-Length HTTP header.
Code: IncompleteBody
Agent failed to process method {Cloud.CreateCheckpoint}.
______________________________________________________

06.10.2024 21:37:34 :: Agent: Failed to process method {Transform.Patch}: S3 error: You did not provide the number of bytes specified by the Content-Length HTTP header.
Code: IncompleteBody
______________________________________________________
We monitor the S3 instances with Grafana, but unfortunately I am not able to interpret the values shown there correctly.
For example, there is an “S3 API Request Error Rate (4xx)” that spikes regularly and I don't know if this is normal or not.

I seem to lack a lot of background knowledge about when Veeam talks to the S3 instances and how, and what is queried or written and when.

This is just a reference to the running ticket as our environment is complex.
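
For readers trying to interpret the log excerpt and the Grafana metric mentioned in this post: IncompleteBody is an S3 error returned with HTTP status 400, so each occurrence would count toward a 4xx error rate. The Python sketch below is a deliberately simplified model of that relationship. The `check_body` and `error_rate_4xx` functions are hypothetical helpers written for illustration; real S3-compatible servers and the dashboard in question may compute things differently.

```python
def check_body(headers: dict, body: bytes):
    """Simplified model of the check behind an IncompleteBody response.

    If fewer bytes arrive than the Content-Length header declared
    (for example, because the connection was cut mid-upload),
    the request is rejected with a 400-class error.
    """
    declared = int(headers["Content-Length"])
    if len(body) < declared:
        return 400, "IncompleteBody"
    return 200, "OK"


def error_rate_4xx(status_codes):
    """Fraction of responses whose status code is in the 400-499 range."""
    if not status_codes:
        return 0.0
    return sum(400 <= code < 500 for code in status_codes) / len(status_codes)


if __name__ == "__main__":
    # The upload declared 10 bytes but only 5 arrived.
    print(check_body({"Content-Length": "10"}, b"12345"))
    print(error_rate_4xx([200, 200, 400, 404]))
```

Note that a 4xx rate also includes other client-side errors such as 404 responses, so a spike alone does not show which error is responsible.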
antonio.biasio
Lurker
Posts: 1
Liked: never
Joined: Oct 08, 2024 10:13 am
Full Name: Antonio Biasio

Re: Checkpoint removal process info

Post by antonio.biasio »

Can we rest assured that the backup data is intact? Some customers noticed these logs and became alarmed.
In my recovery tests on random restore points, the backups seemed OK.
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Checkpoint removal process info

Post by Gostev » 1 person likes this post

There are no known bugs in checkpoint removal functionality in 12.2. Besides, the actual removal logic did not change from previous versions. The only difference is that the checkpoint removal process was separated from backup jobs so that it does not conflict with other running backup jobs.

Thus any issues are likely specific to certain on-prem S3-compatible storage devices only, and are most likely caused by their performance limitations. This is because the registry values mentioned above do not change the product logic: they only increase timeouts and reduce task concurrency in order to reduce storage load at the expense of longer run time.
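
The trade-off described above (lower concurrency means less storage load but a longer run) can be illustrated with simple arithmetic. The sketch below is an idealized model that assumes every request takes the same time and that requests run in full "waves". The `estimate` function, its parameter names, and the numbers are made up for illustration and say nothing about how the registry values are actually implemented.

```python
import math


def estimate(tasks: int, seconds_per_task: float, concurrent_limit: int) -> dict:
    """Idealized estimate: tasks run in waves of at most `concurrent_limit`."""
    waves = math.ceil(tasks / concurrent_limit)
    return {
        # Highest number of requests hitting the storage at once.
        "peak_requests": min(tasks, concurrent_limit),
        # Total elapsed time if each wave takes `seconds_per_task`.
        "wall_seconds": waves * seconds_per_task,
    }


if __name__ == "__main__":
    print(estimate(tasks=64, seconds_per_task=2.0, concurrent_limit=16))
    print(estimate(tasks=64, seconds_per_task=2.0, concurrent_limit=4))
```

In this model, dropping the limit from 16 to 4 cuts the peak load on the storage to a quarter while the run takes four times as long.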
MiMaMo
Influencer
Posts: 10
Liked: 5 times
Joined: Feb 27, 2020 3:04 pm
Full Name: Michael

Re: Checkpoint removal process info

Post by MiMaMo »

@antonio.biasio:

Thank you for the very good question.

The affected backups are retried by Veeam 10 minutes later, and then everything seems to be correct according to the logs.
The backup itself runs very fast.

I haven't done a complete check yet, but I will now run a SureBackup job, including a check of the complete backup of one of the affected VMs.

These kinds of messages are very new to me, and I didn't see them in 12.0.
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin » 1 person likes this post

Errors like "You did not provide the number of bytes specified by the Content-Length HTTP header" can potentially indicate issues with the target storage system, which again suggests that the background checkpoint removal activity itself is not the culprit. Kindly continue the investigation with our support engineers to eventually find the true cause of these errors.

As for the recoverability of backup data, the error during checkpoint removal should not cause any problems with backup restore.

Thanks!
MiMaMo
Influencer
Posts: 10
Liked: 5 times
Joined: Feb 27, 2020 3:04 pm
Full Name: Michael

Re: Checkpoint removal process info

Post by MiMaMo » 1 person likes this post

The backup data of one of the affected VMs that I tested is healthy. The SureBackup job finished with "08.10.2024 16:28:55 Summary: 100% of backup files passed validation successfully". :-)

I will work further on the issues with the support engineers as suggested.
Thank you very much for your input.
johnnytan
Service Provider
Posts: 5
Liked: 1 time
Joined: Oct 09, 2023 3:52 pm
Full Name: Johnny Tan

Re: Checkpoint removal process info

Post by johnnytan »

Hi guys, since updating Veeam from 12.1 to 12.2, we are experiencing the same issue as well.

This is case ID #07451250.

Would appreciate a timeline for rolling out the hot-fix for this issue soon.
massimiliano.rizzi
Service Provider
Posts: 223
Liked: 30 times
Joined: Jan 24, 2012 7:56 am
Full Name: Massimiliano Rizzi

Re: Checkpoint removal process info

Post by massimiliano.rizzi »

Hello Community and good day,

first things first, Case # is 07443541.

We are having the same issue with background checkpoint removal errors after upgrading to v12.2:

[Image]

The target backup repository is an Object First Ootbi appliance located in an offsite location.

Unfortunately, there seems to be no progress on the support case yet, even after we set the retention job to start at 3 pm to avoid overlapping with backup tasks, as instructed by the engineer assigned to us.

Any help on this matter will be greatly appreciated.

Thanks and Regards,

Massimiliano
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Checkpoint removal process info

Post by Gostev »

johnnytan wrote: Oct 09, 2024 3:28 am Would appreciate the timeline to roll out the hot-fix for this issue soon
Just want to reiterate that as it stands right now, there are no known bugs that need to be "hot fixed" in Veeam. All issue reports seem to be connected to the usage of certain on-prem S3-compatible storage appliances. There was one report for Azure Blob Storage, but the customer later confirmed this was due to network connectivity issues - once Internet access was restored, the error disappeared.
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin »

johnnytan wrote: Oct 09, 2024 3:28 am This is the case id #07451250
massimiliano.rizzi wrote: Oct 09, 2024 9:05 am Hello Community and good day, first things first, Case # is 07443541.
Based on a brief review of the debug logs, the first case appears to be a problem with the backup file itself. Something seems off with the storage system, which requires additional analysis from the support team.

The second case is once again about network connectivity issues - when attempting to remove the checkpoint, the backup server failed to connect to the gateway server selected for the specific object storage repository.

So kindly continue working with our support team to find the real cause of the issues you are experiencing.

Thanks!
albertgordojr
Lurker
Posts: 1
Liked: never
Joined: Oct 04, 2024 1:48 pm
Full Name: A Gordo

Re: Checkpoint removal process info

Post by albertgordojr »

I started getting the same issue right after my upgrade from 12.1 to 12.2.

I'm using Wasabi S3 object storage with a backup copy job. I'm not sure if SOBR is having the same issue.

I have open tickets with both Veeam and Wasabi and am not getting a clear answer. They keep pointing at each other regarding certificate retrieval.

Veeam Support - Case # 07424593 - Failed to remove a checkpoint during a background job

Thanks.
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Checkpoint removal process info

Post by Gostev »

A certificate retrieval problem is a much higher-level issue. If a certificate cannot be retrieved, then a network connection to object storage cannot be established in principle. This obviously causes any operation that issues an S3 API call to fail (including but not limited to the checkpoint removal operation).
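
One way to check this independently of VBR is to try a TLS handshake against the object storage endpoint from the same machine and see whether a certificate comes back. The Python sketch below uses only the standard library. The `probe_tls` function, its return format, and the injectable `connect` parameter (there so the logic can be exercised without network access) are assumptions made for this illustration. It does not reproduce how VBR retrieves certificates, and a successful probe from one host says nothing about other gateway servers.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=10.0, connect=socket.create_connection):
    """Attempt a TLS handshake and report whether a certificate was obtained."""
    context = ssl.create_default_context()
    try:
        with connect((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return {"ok": True, "not_after": cert.get("notAfter")}
    except ssl.SSLError as exc:
        # Handshake or certificate validation failed (ssl.SSLError is a
        # subclass of OSError, so it must be caught first).
        return {"ok": False, "stage": "tls", "error": str(exc)}
    except OSError as exc:
        # DNS failure, refused connection, timeout, and similar.
        return {"ok": False, "stage": "network", "error": str(exc)}


# Example call (the hostname is a placeholder, replace it with your endpoint):
#   print(probe_tls("s3.example.com"))
```

A result with stage "network" suggests the endpoint was never reached, while stage "tls" suggests something interfered with the handshake or the certificate itself.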
admd
Service Provider
Posts: 21
Liked: 5 times
Joined: Feb 22, 2024 1:37 pm

Re: Checkpoint removal process info

Post by admd »

admd wrote: Sep 23, 2024 4:27 pm Thanks for the registry key!

We have two clients with the same errors. For one it happens every day; for the other it is more random.
Case #02833866 ongoing.
Support asked me to gather logs from the object storage side (Wasabi).
I am waiting for the error to appear again.
The random one resolved itself.
The one where the error appears every day is case # 07443641.
The first option was to change the hour at which the removal starts, using a registry key, but that had no success.
The second option is to wait for a future patch/update.
fozz33
Novice
Posts: 9
Liked: 2 times
Joined: Feb 10, 2017 12:37 pm
Full Name: Ciaran Foster
Location: Ireland

Re: Checkpoint removal process info

Post by fozz33 » 1 person likes this post

Gostev wrote: Oct 09, 2024 10:41 am Just want to reiterate that as it stands right now, there are no known bugs that need to be "hot fixed" in Veeam. All issue reports seem to be connected to the usage of certain on-prem S3-compatible storage appliances. There was one report for Azure Blob Storage, but the customer later confirmed this was due to network connectivity issues - once Internet access was restored, the error disappeared.
This is nonsense - the issue is affecting multiple customers (myself included) and ONLY after they all upgraded to the latest version of Veeam B&R.
And I am using Azure blob storage, for the record.
To suggest we are all having sudden issues relating to our own storage appliances is patently ignoring the obvious cause of our issues - Veeam's latest version of their software.

btw, I've opened a support ticket also...
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Checkpoint removal process info

Post by Gostev » 3 people like this post

You assume the issue did not exist before the upgrade; however, it was simply not reported by Veeam before 12.2.
I believe the plan is to re-classify the event as a warning in a future update, because it is not really a critical issue.
tanyababes
Lurker
Posts: 2
Liked: 1 time
Joined: Oct 22, 2024 3:32 am
Full Name: Tanya Legaspi
Contact:

[MERGED] Background Checkpoint Removal feature future enhancements

Post by tanyababes » 1 person likes this post

After upgrading to Veeam 12.2, we started receiving emails regarding failed checkpoints.
During our troubleshooting session last week with advanced technical support (Veeam Support - Case # 07416671), we suggested a couple of improvements to this feature.

A. Identify in the session/email notification which capacity tier the failed checkpoint is on.
B. Add the ability to send an email confirmation indicating that there are no more checkpoints to process if all offload jobs have been successful.
Dima P.
Product Manager
Posts: 14820
Liked: 1772 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague

Re: Background Checkpoint Removal feature future enhancements

Post by Dima P. » 2 people like this post

Hello Tanya,

Thank you for your feedback, and sorry to hear that you've faced issues with the background checkpoint removal feature. I'll share your post with the corresponding R&D team!
veremin
Product Manager
Posts: 20677
Liked: 2382 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Checkpoint removal process info

Post by veremin » 1 person likes this post

Often, the problem lies not in the background checkpoint removal process, but in infrastructure issues, including network-related ones. You can discuss this with your support engineer as the investigation progresses.

As mentioned earlier, this does not mean that the new product version introduced a previously non-existent problem; rather, it highlighted it.

In the next minor product version, we plan to improve the notification system by including the underlying error (infrastructure or configuration specific one) in the checkpoint removal session and report. Additionally, we will downgrade the severity of errors in the checkpoint removal process from "failure" to "warning".

Thanks!
Gostev
Chief Product Officer
Posts: 32230
Liked: 7592 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Background Retention/Checkpoint Removal interfering with jobs and restores

Post by Gostev » 1 person likes this post

Gostev wrote: Oct 15, 2024 11:55 am You assume the issue did not exist before the upgrade; however, it was simply not reported by Veeam before 12.2.
I believe the plan is to re-classify the event as a warning in a future update, because it is not really a critical issue.
Just wanted to add that a couple of Azure blob storage users provided log packages that included logs from before the upgrade. At my request, QA specifically confirmed from those logs that the infrastructure issue already existed before the upgrade and affected many different VBR operations. In both cases the root cause was continuous issues with obtaining the object storage certificate, which prevents establishing a connection to object storage altogether. Such issues are often caused by "smart" IDP/IPS firewalls.
Bejaminlee
Novice
Posts: 8
Liked: never
Joined: Jun 18, 2024 7:17 pm

[MERGED] Checkpoint removal process info

Post by Bejaminlee »

Since the last patch (v12.2.0.334), we have had an ongoing, erratic issue with normal jobs that write to Backblaze B2 and with restores from it: contention issues, file locks, and timeouts involving the background retention process and the checkpoint removal process. These occur unpredictably and require a Veeam B&R server reboot to kill all running processes and restart everything before the job or restore can complete successfully.

We are getting excellent support from the engineers working on the cases, but we are having trouble replicating the problem for active troubleshooting. I was hoping somebody else might have encountered something similar and found a way to reproduce it, so we could nail down the circumstances causing the issue and get it resolved. On the engineer's recommendation, we have configured a concurrent task limit for the B2 repos to avoid overburdening Backblaze's write cache and similar resources, which might lead to erratic timeouts and process issues. But because the issue is unpredictable, it is hard to say whether this resolved it.

Anybody run into issues with the background retention and related processes interfering with your jobs or restores?

Case #07456489
Case #07464585