Discussions related to using object storage as a backup target.
fabio.pasetti
Service Provider
Posts: 8
Liked: 1 time
Joined: Dec 06, 2022 8:39 am
Full Name: Fabio
Contact:

Veeam and Ceph: a real cool story

Post by fabio.pasetti » 1 person likes this post

Hi everyone, I'm Fabio. I've been working with Veeam and Ceph for about 2 weeks and I found some interesting things that I think could interest everyone :D

Let me start from the beginning!

I have a 2.7 PB Ceph cluster (Pacific release) that exposes object storage through the integrated RadosGW, so I can use it as the offload storage for Veeam backup jobs.
As the radosgw frontend I tried both civetweb and beast (with the same results).

I'm using the immutability API (at the very beginning I recompiled Ceph to fix the timestamp format issue myself, but my merge request is not the one the community integrated into the official Ceph release. Same issue, similar solutions, but mine was not as elegant, maybe :roll: ).

Almost everything works fine, even though Ceph's S3 API implementation is not certified by Veeam, apart from one issue that Veeam support could not solve and that I'm trying to solve myself.

Let me explain better:
- offload to S3 works great, so I can put data in buckets with the right expiration date
- listing files in buckets works great, so I can recover data from buckets

The only problem is related to the multiple object delete request that Veeam sends at the end of the job. If I run the job manually, it can delete (or rather, mark for deletion) the oldest files, but when the scheduler runs the same job, it can't delete the oldest files because of an "unknown error". During my investigation I found that the "delete" operation is a single API request containing a list of at most 1000 (Veeam's default) paths to mark as deletable once the immutability period has expired.

I tried with fewer files in the bulk request (playing with Veeam's registry keys) but the error is always the same: unknown error [and the same timeout].
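
For readers following along, here is a minimal Python sketch of how a key list gets split into bulk delete requests. It assumes the standard S3 multi-object delete (DeleteObjects), which accepts at most 1000 keys per request in an XML body. The helper names (`chunk_keys`, `build_delete_body`) and the key names are hypothetical; this is not Veeam's code, and it only builds request bodies without sending anything.

```python
from xml.etree import ElementTree as ET

# S3 accepts at most 1000 keys in one multi-object delete request.
S3_MAX_KEYS_PER_DELETE = 1000


def chunk_keys(keys, batch_size=S3_MAX_KEYS_PER_DELETE):
    """Yield successive lists of at most batch_size keys (hypothetical helper)."""
    if not 1 <= batch_size <= S3_MAX_KEYS_PER_DELETE:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def build_delete_body(keys):
    """Build the XML body for one multi-object delete request (hypothetical helper)."""
    root = ET.Element("Delete", xmlns="http://s3.amazonaws.com/doc/2006-03-01/")
    for key in keys:
        obj = ET.SubElement(root, "Object")
        ET.SubElement(obj, "Key").text = key
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    # Made-up object names, for illustration only.
    keys = [f"backup/obj-{i}" for i in range(2500)]
    for batch in chunk_keys(keys):
        body = build_delete_body(batch)
        print(len(batch), "keys,", len(body), "bytes")
```

Passing a smaller `batch_size` (similar in spirit to lowering the limit through the registry keys mentioned above) simply splits the same key list into more, smaller requests.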

I know that this is a Ceph issue (or rather, a RADOS gateway issue) and I'm pretty sure that someone has hit the same issue and maybe solved it a long time ago. Well, I'm here to ask for your help :D

I can produce plenty of logs and metrics, but I think the issue is related to some beast settings (or civetweb, but I'm using beast at the moment).

Thanks to everyone, and sorry for my bad English.

Fabio :mrgreen:
sfirmes
Veeam Software
Posts: 238
Liked: 120 times
Joined: Jul 24, 2018 8:38 pm
Full Name: Stephen Firmes
Contact:

Re: Veeam and Ceph: a real cool story

Post by sfirmes »

@fabio.pasetti welcome to the forums and thanks for the question.

I have tested Veeam working with Ceph Octopus, Pacific, and now Quincy (stable). I have used the "official" releases of Ceph running on Ubuntu.

We also have several alliance partners who use Ceph as part of their solutions who work with Veeam and have passed the Veeam Ready Object and Veeam Ready Object with Immutability testing. The Veeam Ready Object testing does test the deletion of over 4 million objects from the object storage target. So what you are trying to do should be working without issues.

What version of VBR are you using?

Also you mentioned you contacted Veeam Support. Do you have a case# that we can look at?
Senior Solutions Architect, Product Management - Alliances @ Veeam Software
Andreas Neufert
VP, Product Management
Posts: 6742
Liked: 1407 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Veeam and Ceph: a real cool story

Post by Andreas Neufert »

Hello Fabio.
Do you run Ceph 14.2.22 or later, 15.2.14 or later, or 16.2.5 or later?
There is also a bug related to the wrong date format used by default in Ceph:
https://tracker.ceph.com/issues/51327
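
A quick way to check a running version against the minimums listed above is a small Python sketch like the following. The function names are hypothetical, and the handling of release series outside 14/15/16 is an assumption (newer series are treated as already fixed, older ones as not fixed), not something stated in this thread.

```python
# Minimum point release per series, taken from the question above.
MIN_FIXED = {14: (14, 2, 22), 15: (15, 2, 14), 16: (16, 2, 5)}


def parse_version(text):
    """Turn a string such as '16.2.10' into the tuple (16, 2, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text):
    """Return True if the version is at or above the minimum for its series."""
    version = parse_version(version_text)
    major = version[0]
    if major in MIN_FIXED:
        # Tuples compare numerically, so (16, 2, 10) >= (16, 2, 5) holds.
        return version >= MIN_FIXED[major]
    # Assumption: series newer than 16 already contain the fix; older ones do not.
    return major > max(MIN_FIXED)


if __name__ == "__main__":
    for candidate in ("16.2.10", "16.2.4", "17.2.5"):
        print(candidate, meets_minimum(candidate))
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank "16.2.10" below "16.2.5".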
fabio.pasetti
Service Provider
Posts: 8
Liked: 1 time
Joined: Dec 06, 2022 8:39 am
Full Name: Fabio
Contact:

Re: Veeam and Ceph: a real cool story

Post by fabio.pasetti »

Hi @sfirmes thank you for your reply!

Uh, you wrote something interesting about tests! You're right: if there is a tool to test the storage, I can use it! I don't know how to obtain that test, but I'm writing to my contact at Veeam to ask whether it's possible to test my Ceph with them.

About the case, it's #05630213 (it's closed right now because my colleague preferred to start from scratch in another case... :roll: ) and it contains almost the entire story.

Hi @Andreas Neufert, I'm using Ceph 16.2.10 and the bug related to the wrong date format is actually solved in my environment. I'm sure the issue is related to something else, but thank you very much for the link! :mrgreen:

Thank you!
sfirmes
Veeam Software
Posts: 238
Liked: 120 times
Joined: Jul 24, 2018 8:38 pm
Full Name: Stephen Firmes
Contact:

Re: Veeam and Ceph: a real cool story

Post by sfirmes » 1 person likes this post

@fabio.pasetti the testing I referred to is the Veeam Ready Program, which is a program offered to our Technical Alliance Partners.

I am doing some testing today with v11 and v12 beta3 using the Quincy build (17.2.5) and will let you know if I see the same issue you are encountering. I only have a 4TB cluster, but that will be enough to test the deletion of millions of objects both manually and automatically.

If I have any issues, I will update this thread.
fabio.pasetti
Service Provider
Posts: 8
Liked: 1 time
Joined: Dec 06, 2022 8:39 am
Full Name: Fabio
Contact:

Re: Veeam and Ceph: a real cool story

Post by fabio.pasetti »

Great, thank you very much Steve! If you need to know something else about my setup, I can send you the radosgw config or anything else.

Thanks a lot
fabio.pasetti
Service Provider
Posts: 8
Liked: 1 time
Joined: Dec 06, 2022 8:39 am
Full Name: Fabio
Contact:

Re: Veeam and Ceph: a real cool story

Post by fabio.pasetti »

Hi everyone,
just an update with some considerations:

- I'm looking for errors in the radosgw debug logs, and I can see that the "beast" frontend sometimes crashes and the requests disappear with the process

- the OSDs didn't report errors

I suppose my issue is related to some modifications I made to the radosgw to try to enhance performance, so starting today I'm moving half of my radosgw instances back to the default configuration so that we can use them tonight.

Thank you,
Fabio
sfirmes
Veeam Software
Posts: 238
Liked: 120 times
Joined: Jul 24, 2018 8:38 pm
Full Name: Stephen Firmes
Contact:

Re: Veeam and Ceph: a real cool story

Post by sfirmes »

Thanks for the update @fabio.pasetti. My setups always use the base code that I get from the Linux distros, and so far they have been solid.

Looking forward to your next update.
randyodonnell
Service Provider
Posts: 1
Liked: never
Joined: Nov 27, 2022 5:07 pm
Full Name: Randy O'Donnell
Contact:

Re: Veeam and Ceph: a real cool story

Post by randyodonnell »

Sorry to jump on this thread, but we are having the exact same issue with Veeam 12 (RTM) and Ceph Quincy 17.2.5. Everything seems to go fine until it needs to clean up restore points, and then it throws errors on the multiple object delete. Was anything ever figured out about what causes this?

Thanks,
Randy
Mildur
Product Manager
Posts: 8641
Liked: 2270 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Veeam and Ceph: a real cool story

Post by Mildur »

Hi Randy

Welcome to the forum.

The case mentioned above was closed, so I can't see what the solution was.
Please open your own case if you want to have it analyzed.

Best,
Fabian
Product Management Analyst @ Veeam Software