persistent offload error to capacity tier (backblaze)

selva · Dec 21, 2020 5:07 am

Case: #04555631

I just found out capacity tier offload of one of my backup jobs has been failing for several days now. The SOBR has one performance tier and one capacity tier (backblaze) and has been working fine for a long time. A month ago I changed the capacity tier to use a bucket with object lock when B2 acquired immutability support. Now this backup chain seems to have reached a stage where it has to delete some files in the object storage, which errors with a "DeleteMultipleObjects" failed message (see below).

Immutability is set to 30 days in the B&R object repository and backups are copied to capacity tier as they are created. "keep only the last version" is enabled on the bucket at backblaze end as was the recommended practice prior to immutability support in B2. Has that changed? The error (below) refers to some invalid version id.

Wonder why no email notification is received for this error. In the B&R console selecting "Jobs" and "HISTORY" shows no error (all jobs successful) but selecting "Storage Management" shows the offload errors, so I failed to notice this for several days.

Error seen in the statistics window:

Starting performance tier offload 	
0 backups will be moved to the capacity tier 	
12 backups will be copied to the capacity tier 	
Processing Alt Backup VMs Error: DeleteMultipleObjects request failed to delete object [Veeam/Archive/Veeam/2e..elided...9778ff/00000000-0000-0000-0000-000000000000/blocks/5d0b...elided...1a4c/24928....elided...9054c.00000000000000000000000000000000.blk] and [23] others, error: NoSuchVersion, message: 'Invalid version id specified'  	01:19
Object storage repository cleanup 	00:12
Job finished with error at 12/20/2020 8:10:43 PM
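
For anyone wanting to count these failures across many log lines, here is a minimal sketch of a parser for the error line above. The field layout is inferred only from that single line, and the function and pattern names are made up for illustration, so treat it as an assumption rather than a description of the log format.

```python
import re

# Pattern for the "DeleteMultipleObjects" failure line shown above.
# Assumption: the layout is inferred from one sample line only.
DELETE_ERROR = re.compile(
    r"DeleteMultipleObjects request failed to delete object \[(?P<first>[^\]]*)\]"
    r"(?: and \[(?P<others>\d+)\] others)?"
    r", error: (?P<code>\w+), message: '(?P<message>[^']*)'"
)


def parse_delete_error(line):
    """Return a dict describing the failure, or None if the line does not match."""
    match = DELETE_ERROR.search(line)
    if match is None:
        return None
    others = int(match.group("others") or 0)
    return {
        "first_object": match.group("first"),
        "failed_objects": 1 + others,  # the named object plus the "others"
        "code": match.group("code"),
        "message": match.group("message"),
    }


sample = (
    "Processing Alt Backup VMs Error: DeleteMultipleObjects request failed to "
    "delete object [path/objectname.blk] and [23] others, error: NoSuchVersion, "
    "message: 'Invalid version id specified'"
)
print(parse_delete_error(sample))
```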

HannesK · Dec 21, 2020 6:33 am

Hello,
just to be sure... your backup job retention is longer than 30 days? What did you configure?

Please remember that the forum does not replace support... posting on the forums a few hours after opening a case keeps case processing at the same speed

Best regards,
Hannes

selva · Dec 21, 2020 4:13 pm

Thanks for asking: retention is 30 restore points (not days) which takes 6 weeks (one point per week day) plus 4 weekly and 12 monthly selected in GFS. It doesn't look like any backup is ready for deletion -- there are a total of only 23 points in the object storage right now including 6 monthlies and 4 weeklies.
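
As a quick check of that arithmetic, assuming exactly five restore points per calendar week (the variable names are just for illustration):

```python
# Figures taken from the posts above; "one point per week day" is read
# as five restore points per calendar week (an assumption).
restore_points = 30
points_per_week = 5
immutability_days = 30

weeks_retained = restore_points / points_per_week   # weeks covered by the daily points
calendar_days_retained = weeks_retained * 7          # same span in calendar days

# Compare the retention span with the immutability period set in B&R.
retention_outlasts_lock = calendar_days_retained >= immutability_days
print(weeks_retained, calendar_days_retained, retention_outlasts_lock)
```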

PS: I know forum does not replace support and I'm not trying to speed up anything. Based on past experience, I think some here may have good insights that may point me in the right direction.

selva · Dec 22, 2020 3:00 am

Does anyone know a way to escalate a support request? My only offsite copy is now falling behind and I'm getting nervous.

Natalia Lupacheva · Dec 22, 2020 6:11 am

Hi @selva,

you can escalate your support case as it's described here.

Thanks!

selva · Dec 22, 2020 10:28 pm

My first time raising a support case, so maybe I'm over-reacting... Support engineer sends me a link to zadara storage and asks me to follow it to "set the ID for S3". That's it -- no clue what "the ID" they are referring to is or what I am supposed to do. My issue is that SOBR offload to backblaze has been failing for a while now for one particular job.

Gostev · Dec 22, 2020 10:57 pm

You can always ask your support engineer directly for clarifications, or even suggest a live session to have them guide you through the required modifications live. They will be happy to do this!

selva · Dec 24, 2020 4:59 am

Let me try again after having slept over my irritation: I reported an SOBR offload error that has stopped copying of several recent restore points to capacity tier (backblaze). After an initial exchange over logs and more details, and a few emails back and forth, this is all I received in a 2 line email:

A link to some generic docs on use of aws cli utility (which I'm well aware of) for manipulating object lock, saying that his Tier 2 team recommended that I follow those docs to "set the ID of S3".

That's it. What ID? I never asked how to set any ID and why would I want to set any ID when I'm trying to get help on how to solve the offload error. Considering the usage "the ID", probably he confused my case for someone else's... Well, it's not easy to keep up the quality of support with covid raging, and on top of that it's holiday season... I get that.

Anyway, I have escalated the support request and am getting someone to look at the problem over a remote session. Hope this gets somewhere.

pirx · Dec 24, 2020 6:32 am

My experience with S3 immutability is that it can take longer than you expect. And if this is the case there is no way to end immutability manually.

selva · Dec 24, 2020 7:36 pm

Some followup:

The remote session with support could not find the root cause of the error nor fix it. I'm told there are many folks facing exactly the same error and the case is now passed on to Tier 2.

In the meantime my offsite copy continues to fall behind --- now 15 restore points behind.

As for the actual immutability period, there is an API to query it, so trying to delete something that is immutable and not knowing why it's not succeeding can't happen. And, the error I reported is not what one would get from an S3 API call trying to delete an immutable object. Instead, the error shows B&R is trying to delete non-existent objects which is worrisome, to say the least.

I repeatedly get asked about whether immutability is set on the bucket, a question that shows a lack of familiarity with backblaze.
Backblaze does not support a default bucket level immutability setting unlike amazon S3, for example. At the web interface, the only immutability related action one can do is to enable the object lock feature during bucket creation. That's it. And, at the API level only three lock related commands are supported currently.

1. GetObjectLockConfiguration <-- returns whether lock is enabled on the bucket -- unlike amazon S3, there is no default retention period in the reply as it's not supported. There is no corresponding "put" call.
2. PutObjectRetention <-- set the immutability period on an "object"
3. GetObjectRetention <-- get immutability period on an "object"

Amazon supports a default retention period for all objects in a bucket which may be the reason why one gets this question. No such thing in backblaze.
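
For reference, the two "get" calls listed above can be sketched with any S3-compatible client. This assumes a boto3-style client object (`get_object_lock_configuration` and `get_object_retention` are boto3's names for those calls); the helper names are made up, and the endpoint pattern, bucket name and key in the usage comment are placeholders. How B2 answers in every case is not verified here.

```python
from datetime import datetime, timezone


def bucket_lock_enabled(s3, bucket):
    """True if GetObjectLockConfiguration reports object lock as enabled."""
    config = s3.get_object_lock_configuration(Bucket=bucket)
    status = config.get("ObjectLockConfiguration", {}).get("ObjectLockEnabled")
    return status == "Enabled"


def retain_until(s3, bucket, key, version_id):
    """Return the RetainUntilDate reported by GetObjectRetention for one version."""
    reply = s3.get_object_retention(Bucket=bucket, Key=key, VersionId=version_id)
    return reply["Retention"]["RetainUntilDate"]


def is_locked(s3, bucket, key, version_id, now=None):
    """True while the object version is still inside its immutability period."""
    now = now or datetime.now(timezone.utc)
    return retain_until(s3, bucket, key, version_id) > now


# Hypothetical usage with boto3 (endpoint, bucket and key are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://s3.<region>.backblazeb2.com")
#   print(bucket_lock_enabled(s3, "my-veeam-bucket"))
#   print(is_locked(s3, "my-veeam-bucket", "Veeam/.../file.blk", "<version id>"))
```

The client is passed in as an argument so the helpers can be tried against a stub before pointing them at a real bucket.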

That means it's hard to do anything inconsistent while setting up a backblaze bucket for use with immutability on Veeam. There is one peculiar feature in backblaze though -- by default it keeps all versions of files so it's important to change the lifecycle setting to "keep only the last version". This should not cause any errors --- only an increased storage cost as space doesn't get released on deletion of objects otherwise.

jdombeck · Dec 26, 2020 11:52 pm

I am having the exact same problem you are with the same error message, and with offsite backups in B2 falling behind for one vm. Curious circumstance for me is that the other three vm's in the same backup job are not exhibiting this problem. Not getting very far with support either, except for a suggestion to disable the existing backup job and create a new one from scratch. I haven't done this yet but it will take a while to see if this works anyway for the retention period to elapse. I hope the many folks who are dealing with this error can be pointed to a solution soon.

selva · Dec 27, 2020 5:19 am

Starting over from scratch is a terrible idea until the root cause is understood. Else, we could very well land in the same spot in future. I think it would be irresponsible to suggest that unless they know what is causing the error. And, if there is no fix, we'll have to conclude that use of backblaze with immutability in Veeam B&R is not a production ready feature.

Anyway, the error may not be related to retention and immutability: at least in my case, some of the files B&R is trying to delete do not seem to exist.

lethallynx · Dec 28, 2020 8:20 am

Yep I am having exactly the same issue. Logged with support 04509174
It’s been escalated to tier 2 support and I’ve been told the devs are looking into it currently.

Dec 28, 2020 8:24 am

We have been working on implementing object storage with immutability since v10 was released in January. We have tried multiple vendors (all of them included on the "Supported with Immutability" list) with overall limited success. We have support cases logged and are interacting with both Veeam R&D and the R&D of object storage vendors. This project has taken up much more of my time than I would like it to, not to mention the insane hardware investments I had to convince my business partners to make for us to perform any meaningful testing of object storage.

Without pointing any fingers, I generally think that the industry has a lot to learn about using the S3 protocol on the application side. Using object storage does not magically solve storage problems such as latency, IOPS requirements, rebuild times, metadata updates, etc.

Object locking support is new for everyone. Because of Veeam's vast market share, their support for immutability has pushed many storage vendors to include the feature overnight. Being first to market has been more important than solving the implementation correctly. Tests seem to have been performed at a tiny scale or for short periods, so DeleteMultipleObjects or PutObjectRetention requests are never reached.

We have worked through multiple issues where requests that return HTTP 200 on AWS S3 return an HTTP 400 response on the vendor's storage, which in turn causes the offload process to fail. Without knowing Backblaze, the error message you have pasted above seems related to such an issue.

Most of the other issues we have experienced only show up for large restore points above approximately 1 TB. This is caused by the number of objects written by Veeam. Large prefixes (folders) in object storage are a challenge for all the vendors we have worked with. It significantly impacts the rebuild time when replacing disk drives or power cycling nodes. The only mitigation against this problem is to increase the block size for the backup job. Unfortunately, this is only practical for greenfield deployments, and it is only a ~4x optimization for something that probably requires >100x improvement.
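
To give a feel for the object counts involved, here is a rough sketch that assumes one object per backup block and ignores deduplication and metadata objects. The 1 MiB and 4 MiB block sizes are illustrative choices to match the ~4x figure, not values taken from any configuration.

```python
import math

MIB = 1024 ** 2
TIB = 1024 ** 4


def estimated_objects(restore_point_bytes, block_bytes):
    """Rough estimate: one object per block, ignoring dedup and metadata objects."""
    return math.ceil(restore_point_bytes / block_bytes)


# 1 MiB and 4 MiB are illustrative block sizes (an assumption).
small = estimated_objects(1 * TIB, 1 * MIB)
large = estimated_objects(1 * TIB, 4 * MIB)
print(small, large, small / large)
```

Under these assumptions a 1 TiB restore point maps to roughly a million objects at the smaller block size, and a 4x larger block only divides that count by four.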

I want to stress again that we are making good progress, but it feels a lot more like a development project than a conventional support case.

Dec 28, 2020 8:26 am

Did you follow this guide exactly?
https://help.backblaze.com/hc/en-us/art ... Cloud-Tier

Did you enable immutability on the same bucket with Veeam data already present? (shouldn't be possible but let's discuss).

Also, how did you switch from non-immutable to immutable? Did you use another SOBR repository and migrate data across?

Dec 28, 2020 8:38 am

Maybe it is the same situation as here: https://github.com/minio/minio/pull/11091
I will discuss with Backblaze...

gtelnet · Dec 28, 2020 2:17 pm

Receiving the same error here, plus a couple more. Veeam case # 04515058 - Backblaze case # 627761

Backblaze had us use Cyberduck utility to browse our bucket (too many files to browse with their web ui).

Using the Cyberduck utility, we've confirmed that the objects do not exist in the bucket for this first error:

REST error: 'S3 error: Bucket name does not have file: path/objectname.blk
Code: InvalidArgument', error code: 400
Other:

For this Invalid version id specified error, we've confirmed the objects do exist:

Processing Error: DeleteMultipleObjects request failed to delete object [path/objectname.blk] and [23] others, error: NoSuchVersion, message: 'Invalid version id specified'

This one seems like a hardcoded limitation in Backblaze that Veeam needs to account for:

Failed to offload backup Error: The name can be no more than 260 characters in length

We have two sobrs using the same bucket, but each in a different folder (Veeam said this was fine when we first set up Backblaze in October).

Immutability for 3 days was enabled as part of the initial deployment. Set to copy immediately and move after 14 days.

We didn't see the instructions to disable versions at first and unchecked that option in the bb web ui a few hours after we started offloading.

We have rescanned the sobr several times. It often says it has updated a few jobs, but it has not reduced the hundreds of error messages.

Case was escalated a week or so ago with both Veeam and Backblaze. Backblaze has offered to join a conference call with Veeam, which may have occurred without my knowledge. I'm confident Veeam will fix the issue but would greatly appreciate better communications on the status. I've run several restores from bb bucket for servers that are in the jobs giving these errors, and all worked correctly, so I'm hopeful that there is not a data integrity issue.

The size of the bucket as reported by bb is 273.5 TB, while Veeam reports only 196.3 TB (the two sobrs combined). I'm hoping that this is because Veeam currently can't remove all of the old objects and that these numbers will be more in line with each other once this issue is resolved, so that my monthly fees with Backblaze aren't higher than what we actually require.

lethallynx · Dec 28, 2020 3:34 pm

Andreas Neufert wrote: ↑Dec 28, 2020 8:26 am
Did you follow this guide exactly?
https://help.backblaze.com/hc/en-us/art ... Cloud-Tier

Did you enable immutability on the same bucket with Veeam data already present? (shouldn't be possible but let's discuss).

Also, how did you switch from non-immutable to immutable? Did you use another SOBR repository and migrate data across?

Yep we followed that guide exactly.

It was our first time ever setting up S3 storage with the SOBR so it was a brand new bucket. So no data already existed and immutability was enabled before we configured it in Veeam.

We actually waited for the digest email from Gostev that said Backblaze had added immutability and configured it a few weeks after.

Dec 28, 2020 4:29 pm

I just wanted to chime in and say I’ve been getting similar issues. I need to get a support ticket going and I’m not in a good spot to lookup specifics right now, but I’d be happy to add more info if that’s helpful for my scenario.

gtelnet · Dec 30, 2020 8:36 pm

Just heard from a Backblaze Senior Engineer - will post any changes we need to make once they confirm but it sounds like a good update!

I think we may have a resolution for this. It turns out when using immutability with Veeam it is expecting the bucket to manage the versioning for the various objects. The good news is you should be able to enable it on the fly and it should resolve future issues.

Dec 31, 2020 8:55 am

I think this one will not help with the above error.

With Immutability enabled, we are fully managing the deletion of objects.

When you do not enable Immutability but have versioning enabled on the bucket, then you do need to enable life cycle rules on the bucket to fully remove the Veeam deleted objects. This is needed as Veeam would delete an object but the bucket would just create a new version of the object, increasing your storage costs.
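
To illustrate the versioning behaviour described above, here is a toy in-memory model. It is a deliberate simplification and not a real S3 or B2 client: the class and method names are made up, and a real lifecycle rule runs asynchronously on the provider's schedule rather than on demand.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket; not a real S3 or B2 client."""

    def __init__(self):
        self.versions = {}   # key -> list of (version_id, size_bytes or None)
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return f"v{self._next_id}"

    def put(self, key, size_bytes):
        version_id = self._new_id()
        self.versions.setdefault(key, []).append((version_id, size_bytes))
        return version_id

    def delete(self, key):
        """Delete without a version id: only a marker (size None) is added."""
        version_id = self._new_id()
        self.versions.setdefault(key, []).append((version_id, None))
        return version_id

    def keep_only_last_version(self):
        """Lifecycle pass: keep the newest entry per key, and drop the key
        entirely when the newest entry is a delete marker."""
        for key in list(self.versions):
            newest = self.versions[key][-1]
            if newest[1] is None:
                del self.versions[key]
            else:
                self.versions[key] = [newest]

    def stored_bytes(self):
        return sum(size for entries in self.versions.values()
                   for _, size in entries if size is not None)


bucket = VersionedBucket()
bucket.put("blocks/a.blk", 1_000_000)
bucket.delete("blocks/a.blk")
print(bucket.stored_bytes())   # the old version still occupies space
bucket.keep_only_last_version()
print(bucket.stored_bytes())   # space is released only by the lifecycle pass
```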

We are working with Backblaze on this issue. It looks like pretty much the same situation as the Minio one shared above.

tgx · Jan 04, 2021 4:11 pm

I just wanted to add that I began receiving this error on January 3. The exception is this is not to a BB bucket but
an internal NAS device. I also have BB buckets that are not exhibiting the problem. These backups have been running
and unchanged for many months.

The errors are:

Error
Object storage cleanup failed: DeleteMultipleObjects request failed to delete object....NoSuchVersion, message:
'Invalid version id specified'

I wouldn't have looked in the logs if it hadn't been for reading a digest of the forums and seeing this issue being discussed.
No alerts this was happening. I thought I would relate the observations since it appears this error may not be limited to BB
buckets.

Jan 04, 2021 4:44 pm

Thanks for keeping an eye on the forum's digest!

Based on what Andreas shared above, it looks like your NAS vendor also did not follow the S3 API specification closely. But let's wait for Andreas to confirm this 100% with Backblaze before you request the same hot fix from your storage vendor.

tgx · Jan 04, 2021 5:01 pm

Thanks for the reply Gostev, however I believe the error was caused by myself.
First, the error was not with the NAS device but was a second part of a job that also backs up
to BB. The error was, in fact, coming from the BB portion. I do recall having read the digest
and having looked at my BB bucket back on the 29th. After looking at it, I had changed the BB bucket lifecycle settings from 'keep all versions'
to 'Keep prior version for this number of days', so, in fact, I caused the problem. I was going to delete my
post but you had already jumped on it. Sorry.

Gostev · Jan 04, 2021 5:06 pm

Ah, I misunderstood your post then: I thought you were using NAS with S3 interface!
Thanks for confirming, I will remove our exchange later.

selva · Jan 04, 2021 5:09 pm

My escalated case is still under investigation but there is some good news: they think this is due to a mismatched API response from backblaze when attempting to delete an object that has been already deleted. And they are working with backblaze to find a good way to fix it.

Hopefully the developers are on the right track and we'll get a fix for this soon. Some of my offloads have fallen too far behind now to be comfortable. Eagerly waiting for a solution.

Jan 05, 2021 10:58 pm

Nilay from Backblaze here.

We have studied various customer reports, some from this forum and others that have been made directly to our support teams. Our theory is the customers reporting this problem have configured CloudTier using Backblaze B2 with immutability enabled AND have also enabled Lifecycle rules to "Keep only the last version of the file." (This was the theory that Andreas floated on Dec. 31, 2020 above, see: object-storage-f52/persistent-offload-e ... ml#p396270)

If a customer wants to tier backups to Backblaze B2 WITHOUT immutability enabled (not a recommended configuration - customers should be enabling immutability to protect against ransomware attacks) - they must configure Lifecycle rules to ensure old object versions are pruned when replaced by Veeam by changing the setting to "Keep only the last version of the file."

If a customer wants to tier backups to Backblaze B2 WITH immutability enabled, they must NOT use Lifecycle rules. Veeam will manage the deletion of object versions as necessary. Lifecycle rules should remain at the default, "Keep all versions of the file."

In the case customers tier backups to Backblaze B2 WITH immutability enabled AND Lifecycle rules to delete object versions, Lifecycle rules are most likely deleting object versions once their object lock date allows. In this case, customers should immediately disable Lifecycle rules by reverting the setting to "Keep all versions of the file." Secondly, because some objects that Veeam is expecting to exist may have been removed by Lifecycle rules, I'm told by Veeam that customers should "Rescan" the Scale-out Repositories, which will restore the previously deleted objects in B2 from the performance tier.
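
The guidance above reduces to a two-way rule, which can be sketched as follows. The function and constant names are hypothetical; only the two setting labels come from the B2 web UI wording quoted in this post.

```python
KEEP_ALL = "Keep all versions of the file"
KEEP_LAST = "Keep only the last version of the file"


def recommended_lifecycle(immutability_enabled):
    """Lifecycle setting suggested above for a bucket used as a Veeam capacity tier."""
    return KEEP_ALL if immutability_enabled else KEEP_LAST


def needs_attention(immutability_enabled, current_setting):
    """True when the bucket's lifecycle setting differs from the suggested one."""
    return current_setting != recommended_lifecycle(immutability_enabled)


print(recommended_lifecycle(True))
print(needs_attention(True, KEEP_LAST))
```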

As for the HTTP errors on subsequent DeleteObjectVersion operations that appear in the Veeam logs, this is caused by an inconsistency between AWS S3's DeleteObjectVersion and the corresponding API in Backblaze B2. An issue has been filed with Backblaze engineering [DEV-6848] to fix this inconsistency.
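
For illustration only, here is one way a client could separate "version already gone" errors from real failures in a multi-object delete reply. The response shape follows the `Deleted`/`Errors` lists that boto3 returns for DeleteObjects; treating `NoSuchVersion` as harmless is an assumption of this sketch and is not a description of the fix in either product.

```python
def split_delete_errors(response, tolerated_codes=("NoSuchVersion",)):
    """Split the Errors list of a DeleteObjects-style response.

    Returns (already_gone, real_failures). Treating NoSuchVersion as
    "already gone" is an assumption made for this sketch.
    """
    already_gone, real_failures = [], []
    for error in response.get("Errors", []):
        target = already_gone if error.get("Code") in tolerated_codes else real_failures
        target.append(error)
    return already_gone, real_failures


# Hand-written sample reply; keys and version ids are placeholders.
response = {
    "Deleted": [{"Key": "blocks/a.blk", "VersionId": "v1"}],
    "Errors": [
        {"Key": "blocks/b.blk", "VersionId": "v2", "Code": "NoSuchVersion",
         "Message": "Invalid version id specified"},
        {"Key": "blocks/c.blk", "VersionId": "v3", "Code": "AccessDenied",
         "Message": "Access Denied"},
    ],
}
gone, failed = split_delete_errors(response)
print(len(gone), len(failed))
```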

-- Nilay Patel
VP of Sales and Solution Engineering
Backblaze

Jan 06, 2021 11:59 am

I'm told by Veeam that customers should "Rescan" the Scale-out Repositories, which will restore the previously deleted objects in B2 from the performance tier.

This can only be the case when the data is still on the performance tier. There are some tests running at the moment with that configuration and let's wait for the feedback there.

sfirmes · Jan 06, 2021 2:53 pm

@Andreas Neufert is correct. The data needs to exist in the Performance Tier in order that a SOBR offload job can synch with the Capacity Tier. But only data that is deleted from the Capacity Tier will be recopied from the Performance Tier.

So if some files/folders are manually deleted from the object storage via a gui, s3 browser, or any other mechanism which isn't part of VBR the SOBR offload will not recopy the data. If you go to the VBR console and remove the backup job from the object storage, then the next SOBR offload will recopy the data to the Capacity Tier.

selva · Jan 06, 2021 4:12 pm

I'm afraid we are getting sidetracked here.

The error and issue under discussion here is not about user deleting files in the bucket. At least in my case that started this thread, the bucket is exclusively used by Veeam B&R and no files have ever been deleted by directly accessing the bucket by any means.

Now backblaze is recommending (as per Nilay Patel's post above) to leave the lifecycle rule at the default of "keep all versions". I will surely try this, but that still does not explain the error we are facing. I do have "keep only latest version" selected in backblaze as was the recommendation at that time, but that should only cause automatic cleanup of "deleted" objects or deleted versions. Why would an object not deleted by Veeam be removed by backblaze? If it's Veeam that is removing files that it should not, why is it doing that? These are the real issues, and not whether Veeam will put back a file that was accidentally deleted by the user by directly accessing the bucket.

Further, I have already done rescan several times starting from the time the error was first noticed weeks ago, and that has not helped anything. Rescan always succeeds with no error or missing files reported.
