Discussions related to using object storage as a backup target.
theviking84
Expert
Posts: 119
Liked: 11 times
Joined: Nov 16, 2020 2:58 pm
Full Name: David Dunworthy

Case number 04640714

Post by theviking84 »

I only opened this case this morning, and I'm sure no one has looked at it yet. I was wondering whether someone here might have information I could use to resolve it faster, or any guidance on what the cause might be.

I have a single scale-out backup repository (SOBR) configured in Veeam (latest version). It is extended with an Amazon S3 bucket with Object Lock enabled. I ran a backup job yesterday; it worked great and the backup was copied to the S3 bucket as expected.

Ever since that initial run, every "SOBR tiering" job, which runs in the background every 4 hours, has failed with "object storage repository clean up error: resolve: error host not found (non-authoritative) try again later". Maybe it's a DNS issue? But the Veeam server has internet access, so it should be able to reach the S3 bucket. Or does the tiering process use the performance extent to reach the bucket? I know it does that for the actual backup offloads. Even if that is the case, that VM has no internet access, but it can resolve S3 through DNS and has full access to copy backups to it over an internal AWS VPC route. That worked fine for the actual backups; only this repository cleanup step fails.
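The error text resembles the standard resolver "try again" condition (EAI_AGAIN from getaddrinfo); that mapping is an assumption about where the message comes from, not something confirmed in this thread. A minimal Python sketch that could be run on both the B&R server and the Linux extent to see whether each one resolves the bucket endpoint, and whether a failure looks temporary or permanent (`classify_dns` is a hypothetical name):

```python
import socket


def classify_dns(hostname: str, port: int = 443) -> str:
    """Resolve hostname and return "ok", "temporary" or "failed".

    "temporary" means getaddrinfo reported EAI_AGAIN ("try again later");
    any other resolver error is reported as "failed".
    """
    try:
        socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
        return "ok"
    except socket.gaierror as exc:
        # EAI_AGAIN is not defined on every platform, hence getattr.
        if exc.errno == getattr(socket, "EAI_AGAIN", None):
            return "temporary"
        return "failed"
```

For example, `classify_dns("s3.us-east-1.amazonaws.com")`; the regional endpoint name is an assumption, so substitute the endpoint your bucket actually uses.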

I've put some log output below; you can see the errors about failing to connect to the target endpoint, the resolve error, etc.

[logs removed by moderator]
Gostev
Chief Product Officer
Posts: 31707
Liked: 7214 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Case number 04640714

Post by Gostev »

Please don't post debug logs in forum posts, as requested when you click New Topic. Thanks!

Re: Case number 04640714

Post by theviking84 »

Sorry, Gostev. I noticed the message asking to include the case number but didn't click through to read the rest of the notices. I'll remember that.

Re: Case number 04640714

Post by theviking84 »

Some more points on this. In testing with the support rep, we found that backup jobs work, and even the SOBR offload to the S3 bucket works. It seems that each time a SOBR tiering job runs the "object storage cleanup" step (or similarly named), that step throws the error. Any ideas? I'm waiting for support to analyze the logs I provided.

I don't see how it could be a DNS or network issue if backups work completely. Why would only this cleanup step have the issue?

Re: Case number 04640714

Post by Gostev »

I would rather suspect an invalid bucket setting. It makes no sense for DNS or network issues to occur only at deletion time, when all that really changes is the API command issued over the very same S3 connection.

Re: Case number 04640714

Post by theviking84 »

The bucket was created with Object Lock enabled and versioning on. Object Lock default retention was left off (the default), since I believe Veeam sets retention itself when uploading.
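To double-check these bucket settings programmatically, the responses of boto3's `get_bucket_versioning` and `get_object_lock_configuration` calls can be inspected. The sketch below works on already-fetched response dictionaries so it runs without AWS credentials; `check_bucket_settings` is a hypothetical helper, and flagging a default retention rule only reflects the choice described above, not a documented Veeam requirement.

```python
from typing import Any, Dict, List


def check_bucket_settings(versioning: Dict[str, Any], lock: Dict[str, Any]) -> List[str]:
    """Return findings about a bucket; an empty list means nothing stood out.

    `versioning` is shaped like a get_bucket_versioning() response and
    `lock` like a get_object_lock_configuration() response.
    """
    findings = []
    if versioning.get("Status") != "Enabled":
        findings.append("versioning is not enabled")
    config = lock.get("ObjectLockConfiguration", {})
    if config.get("ObjectLockEnabled") != "Enabled":
        findings.append("object lock is not enabled")
    # A default retention rule lives under Rule.DefaultRetention when set.
    retention = config.get("Rule", {}).get("DefaultRetention")
    if retention:
        findings.append(f"default retention is set: {retention}")
    return findings
```

In practice the inputs would come from `s3.get_bucket_versioning(Bucket=name)` and `s3.get_object_lock_configuration(Bucket=name)` on a boto3 S3 client.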

I verified that if I try to delete backups from the capacity tier, the deletion fails and correctly reports the date until which they are immutable.
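The refusal described above follows S3 Object Lock semantics: a version under retention cannot be deleted before its retain-until date; GOVERNANCE mode can be bypassed by principals granted `s3:BypassGovernanceRetention` when they explicitly request the bypass, while COMPLIANCE mode cannot be bypassed by anyone. A minimal sketch of that rule, assuming input shaped like the `Retention` field of boto3's `get_object_retention` response; `deletion_allowed` is a hypothetical name, and legal holds are ignored.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def deletion_allowed(
    retention: Optional[Dict[str, Any]],
    now: datetime,
    bypass_governance: bool = False,
) -> bool:
    """Decide whether an object version could be deleted under Object Lock retention.

    `retention` looks like {"Mode": "COMPLIANCE", "RetainUntilDate": datetime}.
    Legal holds are not modeled in this sketch.
    """
    if not retention:
        return True  # no retention on this version
    if now >= retention["RetainUntilDate"]:
        return True  # retention period has already passed
    # Still locked: only GOVERNANCE mode can be overridden, and only on request.
    return retention["Mode"] == "GOVERNANCE" and bypass_governance
```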

I've also noticed that if I rescan the SOBR, the performance tier repository is fine, but the S3 bucket does not synchronize and gives a warning.

So the tiering cleanup fails every time, and the rescan fails on the bucket. It does seem related only to the S3 bucket, but the settings above are pretty simple.

I just replaced the IAM policy I had taken from https://www.veeam.com/kb3151 and instead gave the programmatic user full S3 access to this one bucket, to test whether something was wrong with that policy. Same result, though.
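A bucket-scoped "full S3" policy like the one described can be generated as below. This is a sketch, not the KB3151 policy, and the bucket name is a placeholder. Both ARNs are needed: bucket-level actions (such as `s3:ListBucket`) match `arn:aws:s3:::bucket`, while object-level actions (such as `s3:GetObject`) match `arn:aws:s3:::bucket/*`.

```python
import json


def bucket_full_access_policy(bucket: str) -> str:
    """Build an IAM policy document that allows every S3 action on one bucket only."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Account-level actions such as `s3:ListAllMyBuckets` require a `"*"` resource, so a statement scoped to a single bucket does not grant them.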

Does the cleanup step run entirely from the Veeam B&R server directly to the bucket, or is the performance tier extent used for it (as it seems to be for the actual backup file offload)?

The Veeam B&R server has full internet access, while the Linux EC2 performance extent can reach the S3 DNS names and IPs but not the wider internet. I wondered whether that could be key here, but this action seems to go from the B&R server directly to the bucket. It always happens when the job reports "0 files to move and 0 files to copy", and then the cleanup fails.

I'm not sure what to do, because I now have data in an immutable bucket. I could create another bucket, but if it's just a matter of enabling Object Lock and versioning at creation, I don't see what would be different.

Re: Case number 04640714

Post by theviking84 »

As a test, I created another bucket, this time without Object Lock (immutability), and a repository on the local disk of the Veeam B&R server. I built a new test SOBR from these, and the cleanup step at the end of tiering does not fail there. So either the bucket or the Linux performance extent seems to be at fault. I don't really suspect the bucket, because it is so simple to set up and immutability is the only difference. I think the Linux extent is more likely, since it has no internet access; it does have a route to S3 in my availability zone, so it can use the buckets fine.

It's just so odd that backups work fine, and even offloading backups into the bucket works, but this cleanup fails. The logs don't seem to give enough detail. I'm really stuck, as I need to move all the rest of our backups into this original SOBR but can't proceed until the cleanup works.
veremin
Product Manager
Posts: 20353
Liked: 2285 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Case number 04640714

Post by veremin »

I've asked the QA team to check the case. I'll let you know once I have more information. Thanks!

Re: Case number 04640714

Post by theviking84 »

FYI, I *think* I have this fixed. The environment is somewhat elaborate: it uses an AWS EC2 repository, an S3 bucket, and VMware Cloud on AWS. This appears to have been an AWS security rule issue: only traffic initiated from the Linux repository back to the Veeam server failed.

I would still like to see whether QA or the support engineers can get more detailed logs that explain what was going on, as the error message points to DNS when the issue was not DNS-related at all.

I'll keep monitoring it for a while, but right now it is working.
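AWS security groups silently drop disallowed traffic rather than rejecting it, so a blocked path usually shows up as a timeout. A small reachability probe, run on the Linux repository, can confirm that it can open TCP connections back to the Veeam server (and to the S3 endpoint). This is a sketch: `tcp_reachable` is a hypothetical name, and the hosts and ports to test are placeholders, so take the actual ports from Veeam's port requirements for your version.

```python
import socket
from typing import Optional


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Return None if a TCP connection succeeds, otherwise a short error description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.gaierror as exc:
        return f"name resolution failed: {exc}"
    except socket.timeout:
        # No answer at all: typical of a firewall or security group that drops packets.
        return "timed out"
    except OSError as exc:
        # e.g. connection refused: the host answered but nothing listens on that port.
        return f"connection failed: {exc}"
```

For example, `tcp_reachable("s3.us-east-1.amazonaws.com", 443)` checks the S3 endpoint (region assumed), and the same call with the B&R server's address and each required port checks the path back to the Veeam server.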

Re: Case number 04640714

Post by veremin »

Regarding the DNS-specific error:

- First, we try to establish a connection using the IP address.
- If that attempt fails, we don't raise any warning in the user interface (because the issue might be temporary or caused by IP lease expiration); instead,
- we try to establish a connection using the DNS name.
- If that attempt fails, we raise the corresponding error (ERR |resolve: Host not found).
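The sequence above amounts to an IP-first, DNS-fallback connection in which only the final error reaches the user. Below is a minimal sketch of that pattern, not Veeam's implementation: `connect_ip_then_dns` is a hypothetical name, `connect` and `resolve` are injected callables so the logic can be exercised without a network, and the optional `suppressed` list is a hypothetical addition showing one way to keep the first failure, which here would have pointed at the blocked connection rather than at DNS.

```python
import socket
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Address = Tuple[str, int]


def connect_ip_then_dns(
    cached_ip: Optional[str],
    hostname: str,
    port: int,
    connect: Callable[[Address], T],
    resolve: Callable[[str], str],
    suppressed: Optional[List[Exception]] = None,
) -> T:
    """Try a cached IP first, then fall back to resolving the DNS name.

    A failure of the first attempt is not raised (it is only recorded in
    `suppressed`, if given); only a failure of the DNS-based attempt propagates.
    """
    if cached_ip is not None:
        try:
            return connect((cached_ip, port))
        except OSError as exc:
            if suppressed is not None:
                suppressed.append(exc)
    ip = resolve(hostname)  # may raise socket.gaierror ("Host not found")
    return connect((ip, port))
```

If the cached-IP attempt times out because a security rule drops it, and the name then fails to resolve on that host, the caller sees only the resolve error, which is consistent with the error reported in this thread.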

So, the information provided might not be complete, but it is definitely correct.

Anyway, thanks for raising this; we will think about how the experience might be improved in the future.

Re: Case number 04640714

Post by theviking84 »

Thank you. I will remember this for future set ups.