-
- Service Provider
- Posts: 8
- Liked: 1 time
- Joined: Dec 06, 2022 8:39 am
- Full Name: Fabio
- Contact:
Veeam and Ceph: a real cool story
Hi everyone, I'm Fabio. I've been working with Veeam and Ceph for about 2 weeks and I found something interesting that I think could interest everyone.
Let me start from the beginning!
I have a 2.7PB Ceph cluster (Pacific release) which exposes object storage through the integrated RadosGW, so I can use it as the offload storage for Veeam backup jobs.
As the radosgw frontend I tried both civetweb and beast (with the same results).
I'm using the immutability API (at the very beginning I recompiled Ceph to fix the timestamp format issue myself... but my merge request is not the one the community integrated into the official Ceph release. Same issue, similar solutions, but mine was maybe not so elegant).
Almost everything works fine, even though Ceph's S3 API implementation is not certified by Veeam, apart from one issue that Veeam support could not solve and that I'm trying to solve myself.
Let me explain better:
- offload to S3 works great, so I can put data in buckets with the right expiration date
- listing files in buckets works great, so I can recover data from buckets
The only problem is related to the multiple object delete request that Veeam sends at the end of the job. If I run the job manually, it can delete (better: tag for deletion) the oldest files, but when the scheduler runs the same job, it can't delete the oldest files because of an "unknown error". During my investigation I found that the "delete" operation is an API PUT request which contains a list of at most 1000 (the Veeam default) paths to mark as deletable after the immutability period has expired.
I tried with fewer files in the bulk request (playing with Veeam's registry keys) but the error is always the same: unknown error [and the same timeout].
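For anyone who wants to reproduce the batching outside Veeam, here is a minimal Python sketch of how a list of object keys can be split into bulk requests of at most 1000 entries and turned into a multi-object delete payload. It is only an illustration, not what Veeam does internally. Note that in the standard S3 API a multi-object delete is sent as a POST to the bucket's `?delete` subresource. The helper names (`chunk_keys`, `build_delete_body`) and the object names are hypothetical, and the sketch only builds the request bodies; signing and sending them is left out.

```python
from xml.etree import ElementTree as ET

# Upper bound per bulk delete request (the Veeam default mentioned above).
MAX_KEYS_PER_REQUEST = 1000


def chunk_keys(keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Yield successive batches of at most batch_size keys (hypothetical helper)."""
    if not 1 <= batch_size <= MAX_KEYS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def build_delete_body(keys, quiet=True):
    """Build the XML payload for one multi-object delete request."""
    root = ET.Element("Delete")
    for key in keys:
        obj = ET.SubElement(root, "Object")
        ET.SubElement(obj, "Key").text = key
    # Quiet mode asks the server to report only the keys that failed.
    ET.SubElement(root, "Quiet").text = "true" if quiet else "false"
    return ET.tostring(root, encoding="utf-8")


# Hypothetical object names, just to show the batching.
keys = [f"backup/block-{i:06d}" for i in range(2500)]
batches = list(chunk_keys(keys, batch_size=1000))
bodies = [build_delete_body(batch) for batch in batches]
print(len(bodies), [len(batch) for batch in batches])  # 3 [1000, 1000, 500]
```

Lowering `batch_size` corresponds to the experiment with fewer files per bulk request. If the error stays the same with small batches, that points away from the request size itself.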
I know that this is a Ceph issue (better: a rados gateway issue) and I'm pretty sure that someone has hit the same issue and maybe solved it a long time ago... well, I'm here to ask for your help.
I can produce a lot of logs and metrics, but I think the issue is related to some beast settings (or civetweb, but I'm using beast at the moment).
Thanks to everyone and sorry for my bad English.
Fabio
-
- Veeam Software
- Posts: 291
- Liked: 139 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: Veeam and Ceph: a real cool story
@fabio.pasetti welcome to the forums and thanks for the question.
I have tested Veeam working with Ceph Octopus, Pacific, and now Quincy (stable). I have used the "official" releases of Ceph running on Ubuntu.
We also have several alliance partners who use Ceph as part of their solutions who work with Veeam and have passed the Veeam Ready Object and Veeam Ready Object with Immutability testing. The Veeam Ready Object testing does test the deletion of over 4 million objects from the object storage target. So what you are trying to do should be working without issues.
What version of VBR are you using?
Also you mentioned you contacted Veeam Support. Do you have a case# that we can look at?
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- VP, Product Management
- Posts: 7074
- Liked: 1507 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Veeam and Ceph: a real cool story
Hello Fabio.
Do you run Ceph 14.2.22 or later, 15.2.14 or later, or 16.2.5 or later?
There is also a bug related to the wrong date format used by default in Ceph:
https://tracker.ceph.com/issues/51327
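As a quick way to check a cluster against the version thresholds listed above, here is a minimal Python sketch. The function names are hypothetical. Treating series newer than 16 as already fixed is my assumption and is not stated in this thread.

```python
# Minimum release per series, as listed in the post above.
MIN_FIXED = {14: (14, 2, 22), 15: (15, 2, 14), 16: (16, 2, 5)}


def parse_version(text):
    """Turn a dotted version such as '16.2.10' into a tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return tuple(int(part) for part in parts)


def meets_minimum(version_text):
    """Return True if the version is at or above the listed minimum for its series."""
    version = parse_version(version_text)
    major = version[0]
    if major in MIN_FIXED:
        return version >= MIN_FIXED[major]
    # Assumption: series newer than 16 already include the fix, older ones do not.
    return major > max(MIN_FIXED)


print(meets_minimum("16.2.10"))  # True
print(meets_minimum("16.2.4"))   # False
```

The versions are compared as integer tuples on purpose: compared as plain strings, "16.2.10" would sort before "16.2.5" and be wrongly reported as too old.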
-
- Service Provider
- Posts: 8
- Liked: 1 time
- Joined: Dec 06, 2022 8:39 am
- Full Name: Fabio
- Contact:
Re: Veeam and Ceph: a real cool story
Hi @sfirmes thank you for your reply!
Uh, you wrote an interesting thing about tests! I mean, you're right: if there is a tool to test the storage, I can use it! I don't know how I can obtain that test, but I'm writing to my contact at Veeam to ask if it's possible to test my Ceph with them.
About the case, it's #05630213 (it's closed right now because my colleague preferred to start from scratch in another case...) and it contains almost the entire story.
Hi @Andreas Neufert, I'm using Ceph 16.2.10 and the bug related to the wrong date format is actually solved in my environment. I'm sure that the issue is related to something else, but thank you very much for the link!
Thank you!
-
- Veeam Software
- Posts: 291
- Liked: 139 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: Veeam and Ceph: a real cool story
@fabio.pasetti the testing I referred to is the Veeam Ready Program, which is offered to our Technical Alliance Partners.
I am doing some testing today with v11 and v12 beta3 using the Quincy build (17.2.5) and will let you know if I see the same issue you are encountering. I only have a 4TB cluster, but that will be enough to test the deletion of millions of objects both manually and automatically.
If I have any issues, I will update this thread.
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- Service Provider
- Posts: 8
- Liked: 1 time
- Joined: Dec 06, 2022 8:39 am
- Full Name: Fabio
- Contact:
Re: Veeam and Ceph: a real cool story
Great, thank you very much Steve! If you need to know something else about my setup, I can send you the radosgw config or anything else.
Thank you a lot
-
- Service Provider
- Posts: 8
- Liked: 1 time
- Joined: Dec 06, 2022 8:39 am
- Full Name: Fabio
- Contact:
Re: Veeam and Ceph: a real cool story
Hi everyone,
just an update with some considerations:
- I'm looking for errors in the radosgw debug logs and I can see that the "beast" frontend sometimes crashes, and the in-flight requests disappear with the process
- the OSDs didn't report errors
I suppose that my issue is related to some modifications I made to the radosgw to try to improve performance, so starting today I'm moving half of my radosgw instances back to the default configuration so we can use them tonight.
Thank you,
Fabio
-
- Veeam Software
- Posts: 291
- Liked: 139 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: Veeam and Ceph: a real cool story
Thanks for the update @fabio.pasetti. My setups always use the base code that I get from the Linux distros, and so far they have been solid.
Looking forward to your next update.
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- Service Provider
- Posts: 1
- Liked: never
- Joined: Nov 27, 2022 5:07 pm
- Full Name: Randy O'Donnell
- Contact:
Re: Veeam and Ceph: a real cool story
Sorry to jump on this thread, but we are having the exact same issue with Veeam 12 (RTM) and Ceph Quincy 17.2.5. Everything seems to go fine until it needs to clean up restore points, and then it throws errors on the multiple object delete. Was anything ever figured out about what causes this?
Thanks,
Randy
-
- Product Manager
- Posts: 9821
- Liked: 2597 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Veeam and Ceph: a real cool story
Hi Randy
Welcome to the forum.
The case mentioned above was closed, so I can't see what the solution was.
Please open your own case if you want to have it analyzed.
Best,
Fabian
Product Management Analyst @ Veeam Software