S3 performance (on site)

JRRW · May 23, 2023 1:32 pm

I'm curious what everyone's S3 performance is looking like.

We recently shifted from a small Min.IO stand alone to an all flash CEPH, and at least with Sharepoint/Teams backups, the bottleneck claims to be the target somewhat frequently. Here's a recent run where it caught up on a few days of data:

What doesn't make sense however, is this:

The 2nd image is running a WARP benchmark.

1st section:

Code: Select all

69357 operations loaded... Done!
Mixed operations.
----------------------------------------
Operation: DELETE - total: 6914, 10.0%, Concurrency: 20, Ran 4m59s, starting 2023-05-22 12:32:13.928 -0400 EDT
 * Throughput: 23.11 obj/s

Requests considered: 6915:
 * Avg: 17ms, 50%: 12ms, 90%: 24ms, 99%: 141ms, Fastest: 5ms, Slowest: 581ms, StdDev: 27ms

----------------------------------------
Operation: GET - total: 31103, 45.0%, Size: 10485760 bytes. Concurrency: 20, Ran 4m59s, starting 2023-05-22 12:32:13.837 -0400 EDT
 * Throughput: 1039.54 MiB/s, 103.95 obj/s

Requests considered: 31104:
 * Avg: 144ms, 50%: 143ms, 90%: 195ms, 99%: 275ms, Fastest: 33ms, Slowest: 1.185s, StdDev: 48ms
 * TTFB: Avg: 18ms, Best: 8ms, 25th: 15ms, Median: 17ms, 75th: 20ms, 90th: 24ms, 99th: 41ms, Worst: 1.037s StdDev: 11ms
 * First Access: Avg: 143ms, 50%: 142ms, 90%: 196ms, 99%: 300ms, Fastest: 33ms, Slowest: 1.185s, StdDev: 50ms
 * First Access TTFB: Avg: 19ms, Best: 11ms, 25th: 15ms, Median: 17ms, 75th: 20ms, 90th: 24ms, 99th: 46ms, Worst: 1.037s StdDev: 16ms
 * Last Access: Avg: 144ms, 50%: 142ms, 90%: 196ms, 99%: 323ms, Fastest: 33ms, Slowest: 477ms, StdDev: 48ms
 * Last Access TTFB: Avg: 18ms, Best: 9ms, 25th: 15ms, Median: 17ms, 75th: 20ms, 90th: 24ms, 99th: 43ms, Worst: 393ms StdDev: 10ms

----------------------------------------
Operation: PUT - total: 10368, 15.0%, Size: 10485760 bytes. Concurrency: 20, Ran 4m59s, starting 2023-05-22 12:32:13.877 -0400 EDT
 * Throughput: 346.60 MiB/s, 34.66 obj/s

Requests considered: 10369:
 * Avg: 127ms, 50%: 120ms, 90%: 152ms, 99%: 332ms, Fastest: 81ms, Slowest: 1.152s, StdDev: 46ms

----------------------------------------
Operation: STAT - total: 20766, 30.0%, Concurrency: 20, Ran 4m59s, starting 2023-05-22 12:32:13.863 -0400 EDT
 * Throughput: 69.37 obj/s

Requests considered: 20767:
 * Avg: 4ms, 50%: 2ms, 90%: 5ms, 99%: 14ms, Fastest: 1ms, Slowest: 1.022s, StdDev: 18ms
 * First Access: Avg: 3ms, 50%: 2ms, 90%: 5ms, 99%: 13ms, Fastest: 1ms, Slowest: 218ms, StdDev: 10ms
 * Last Access: Avg: 3ms, 50%: 2ms, 90%: 5ms, 99%: 14ms, Fastest: 1ms, Slowest: 218ms, StdDev: 11ms

Cluster Total: 1385.87 MiB/s, 231.06 obj/s over 4m59s.

2nd section:

Code: Select all

142677 operations loaded... Done!
Mixed operations.
----------------------------------------
Operation: DELETE - total: 14239, 10.0%, Concurrency: 40, Ran 9m58s, starting 2023-05-22 12:45:05.481 -0400 EDT
 * Throughput: 23.78 obj/s

Requests considered: 14240:
 * Avg: 19ms, 50%: 13ms, 90%: 27ms, 99%: 201ms, Fastest: 6ms, Slowest: 1.418s, StdDev: 37ms

----------------------------------------
Operation: GET - total: 64024, 45.0%, Size: 10485760 bytes. Concurrency: 40, Ran 9m59s, starting 2023-05-22 12:45:05.442 -0400 EDT
 * Throughput: 1069.54 MiB/s, 106.95 obj/s

Requests considered: 64025:
 * Avg: 321ms, 50%: 293ms, 90%: 479ms, 99%: 750ms, Fastest: 65ms, Slowest: 2.085s, StdDev: 110ms
 * TTFB: Avg: 20ms, Best: 8ms, 25th: 16ms, Median: 18ms, 75th: 21ms, 90th: 26ms, 99th: 48ms, Worst: 1.044s StdDev: 16ms
 * First Access: Avg: 321ms, 50%: 293ms, 90%: 477ms, 99%: 754ms, Fastest: 65ms, Slowest: 1.844s, StdDev: 112ms
 * First Access TTFB: Avg: 20ms, Best: 11ms, 25th: 16ms, Median: 18ms, 75th: 21ms, 90th: 26ms, 99th: 52ms, Worst: 1.032s StdDev: 18ms
 * Last Access: Avg: 322ms, 50%: 294ms, 90%: 478ms, 99%: 743ms, Fastest: 67ms, Slowest: 1.68s, StdDev: 109ms
 * Last Access TTFB: Avg: 20ms, Best: 9ms, 25th: 16ms, Median: 18ms, 75th: 21ms, 90th: 26ms, 99th: 46ms, Worst: 566ms StdDev: 13ms

----------------------------------------
Operation: PUT - total: 21343, 15.0%, Size: 10485760 bytes. Concurrency: 40, Ran 9m58s, starting 2023-05-22 12:45:05.431 -0400 EDT
 * Throughput: 356.50 MiB/s, 35.65 obj/s

Requests considered: 21344:
 * Avg: 135ms, 50%: 125ms, 90%: 162ms, 99%: 358ms, Fastest: 81ms, Slowest: 1.39s, StdDev: 53ms

----------------------------------------
Operation: STAT - total: 42707, 30.0%, Concurrency: 40, Ran 9m58s, starting 2023-05-22 12:45:05.461 -0400 EDT
 * Throughput: 71.30 obj/s

Requests considered: 42708:
 * Avg: 4ms, 50%: 2ms, 90%: 6ms, 99%: 17ms, Fastest: 1ms, Slowest: 1.008s, StdDev: 18ms
 * First Access: Avg: 5ms, 50%: 2ms, 90%: 6ms, 99%: 18ms, Fastest: 1ms, Slowest: 1.006s, StdDev: 20ms
 * Last Access: Avg: 4ms, 50%: 2ms, 90%: 6ms, 99%: 17ms, Fastest: 1ms, Slowest: 458ms, StdDev: 16ms

Cluster Total: 1425.72 MiB/s, 237.64 obj/s over 9m59s.

Am I just... Expecting more than is fair for object storage? Or is it that VBO (VBM?) isn't accurately identifying the actual bottleneck, and it's all Microsoft that is the bottleneck.

Post by **Mildur** » May 31, 2023 5:53 pm this post

Hi Ryan

If you want to check the bottleneck, I suggest to open a support case.
Our support team is experienced with analyzing the log files for any bottleneck, source or target.

VBO (VBM?)

We use Veeam Backup for Microsoft 365 on all official channels. But if we need the short form, we call it VB365.

Best,
Fabian

Jun 10, 2023 12:35 pm

Questions:
How many objects and TBs are in the bucket(s) you are using for production? If you are already in the millions of objects per bucket, ceph can’t handle it well (with the performance that is needed).

General information
We have many big customer with object storage for M365 backup with onprem solutions (or using our or other cloud solutions) thate are happy and get good performance. "Object storage" is like saying in general "block storage": There are multiple vendors, systems and ways how to build solution.

We have seen and tested many systems (used by our customers for Veeam) and we have evaluated many other systems too. Ceph is in the “don’t use for Veeam” section for multiple reasons.

In short
Ceph is not the best option for Veeam (VB365 and VBR). Responsible for this are the very small object and the total size of objects per bucket that Veeam generates. The technical (background) problem is the Ceph metadata handling. It doesn't matter if allflash or not for the data (the metadata should always be on flash).

More info from the field
We have (in our country) a well known customer that has 20+ years of experience with opensource and 7+ years with CEPH (PBs of Data) with a very high skilled team. Even this customer decided to get another object storage for Veeam, because it was not possible to achieve the performance and get rid of the problems with Ceph (for Veeam with this millions of objects per bucket).

Your test values
Your test already shows the limitation of your setup. I don’t see the parameters you used. But from the throughput and the objects / seconds I would guess, you are using way bigger objects for the test compared what Veeam is using. This explains the different throughput. Ceph is very good and handling bigger objects.

GET 1039.54 MiB/s, 103.95 obj/s = ca. 10MiB object size

PUT 346.60 MiB/s, 34.66 obj/s = ca. 10MiB object size

And the value “delete operations per seconds” already shows that your system will probably have a hard time handling metadata. This is exactly what we see in the field with bulk deletions on Ceph (from Veeam).

If Ceph is set and you must use it….
There is at least one possibility on how to try to improve the performance for the metadata handling temporary - but it has the chance to make it even worse. Our customer tried it with multiple shards and in the end, it was a huge overhead and the performance didn’t really improve, Ceph was unstable and there were more problems than benefits. To be fair: this was 3 years ago. But attention: If you change the shards and you already have too many objects, this operation can take a very long time (days). Therefore: If you are still in a range that your system can handle, you can try this (at your own risk!), but make sure you now the limitations, have the experience and knowhow to operate Ceph.

Short google research: IBM states a maximum of 100k Objects per Index shard
https://www.ibm.com/docs/en/storage-cep ... cket-index

Personal opinion and consideration
If you are not happy with your actual performance, evaluate another system from a vendor that can proof that this use case was done multiple times with this TB and x-MIO objects per bucket without any problems and with the desired performance. If you need some advice of vendors we have tested, just write me a direct message.

Disclaimer
This is all with the focus on Veeam and doesn’t mean to be a writing against Ceph or that Ceph is not a good solution for other use-cases. It just doesn’t fit the strategy on how Veeam uses object storage. And to be fair: There are many more vendors / systems on the market that have a very hard time with so many small objects. And the other way around: there are only a few out there that can handle it very well when it comes to scale.

Jun 11, 2023 7:53 pm

So.. can you drop some names of those that seem to be able “handle it very well” in your experience? Quite valuable to all of us to know where things go well and why, in the same way it’s very good to know where and why things go bad (as you explained regarding ceph).

Post by **bstuiver** » Jun 12, 2023 4:56 am this post

Hi William,
Thanks for your usefull contribution on this matter.
We also have this problem with our on premise object storage cluster. Can you please elaborate on which vendors do a good job with the usecase Veeam ?

TIA for your response
Bertus

Post by **EWMarco** » Jun 12, 2023 7:15 am this post

Disclaimer: This goes for VBR but I'm kinda not expecting M365 codebase to be all that different:

Well, I can tell you that neither NetApp StorageGrid nor, seemingly, Cloudian would fall into the category of "handling it well".

We have 3.3 billion objects in our StorageGrid, all Veeam, and delete operations are atrocious. We're in cleanup mode right now, having completely given up on doing offloading in our largest environment. We manage about 1TB per hour of deletes however we tweak settings. At a fill state of 4PB you can imagine that that is kinda underwhelming. Cleanup jobs have been running for a literal week with no sign of stopping

We've gotten together with other somewhat large customers of Veeam's lately and have been told that buckets beyond 250TB will have issues with deletion and enumeration across different S3 vendors. Funny enough, the option to use S3 as anything other than SOBR capacity tier or having more than one, as far as I have been told, has not existed prior to v12.

So in short, my verdict on the matter is simple: Veeam is very new to S3 and while they got it working in principle, I think they had, and are still having, huge issues with matters of scaling. The fact that nobody will give you a straight answer in terms of sizing and limits leads me to believe that they either don't know their limits or they also might be embarrassingly low.

We are currently trying to switch over to S3 only SOBRs with multiple, small (compared to our 660TB ones) buckets and using copy jobs instead of tier offloading as I think the computational process of offloading has many weaknesses currently.

The concept, I think, has been built on the expectation of using public cloud S3 storages that are built to very different scales than in-house solutions.

Post by **lasseoe** » Jun 12, 2023 8:41 am this post

@ewmarco S3-only SOBR, in terms of storage, how are your object storage nodes configured?

The key to making object storage work (not Ceph, as it seems) is to have as many spindes and nodes as possible, and each node will have direct access to each spindle. MinIO can scale like crazy this way.
Has anyone here built very large MinIO clusters for VBR and/or VBO365? I'm curious about the exact hardware setup.

With the current cost (although dropping rapidly) of electricity, on-prem scale-out object storage clusters aren't necessarily cost effective due to each node having CPUs, memory and so on, but they do give you an interesting scale-out business case.

Post by **JaySt** » Jun 12, 2023 8:52 am this post

it all depends on how the S3 solution handles things. It's a complex situation where Veeam still needs to come back with some experience from the field. For example, the S3 API handling has been changed & optimized by Veeam in v12, but S3 solutions need to adjust on that as well to have affect.
Object Storage can scale pretty well, they've been arround a long time. The Veeam use case is difficult it seems though.

Looking forward to see how Minio holds up, as they have a meta-data handling that's quite performance-optimized (stored with the object, atomically, instead of centralized) , but i'm not sure how it scales in the hundreds of terabytes when used by SOBR or VBO365.

BTW, i'd be looking for some confirmation about the difference between how VBR and VBO365 write to Object Based storage. I suspect those are different beasts in terms of IO patterns, sizes, operations etc. So scaling could differ between the two.

Jun 12, 2023 10:44 am

EWMarco wrote: ↑Jun 12, 2023 7:15 amVeeam is very new to S3 and while they got it working in principle, I think they had, and are still having, huge issues with matters of scaling.

Not really all that new to S3, celebrating 5 years already. Also, it is important to realize that issues with scaling are not in Veeam, but rather on the object storage size. The truth is, most object storage was simply not ready for the backup use case (in terms of the amount of data that needs to be stored). This is improving though.

EWMarco wrote: ↑Jun 12, 2023 7:15 amThe fact that nobody will give you a straight answer in terms of sizing and limits leads me to believe that they either don't know their limits or they also might be embarrassingly low.

Actually, I'm seeing more and more vendors document this, and not being shy about this even when their numbers are low. Probably due to having to do moneybacks on more than one occasion

but this sort of documentation is definitely something to look for when choosing yourself an on-prem object storage!

I can give some recommendations which I personally am confident with:

From on-prem storage perspective, Dell ECS never really had scaling issues with Veeam, not even with V1 of our S3 support. ECS quickly became the top enterprise object storage according to our support big data, with like 9 out of 10 biggest object storage repositories hosted on one at some point (as of a couple of years ago, when I last blogged about this).

From more recent on-prem object storage players, I heard good things about Pure Storage FlashBlade and Object First: specifically, our Performance & Scalability Testing QA team have chosen them for the V12 lab (simply because they were the fastest, which is most important for their tests).

From cloud object storage perspective, we do not really encounter scalability issues in our Support, at least with top three vendors used by Veeam customers (Amazon, Wasabi and Azure). I'm literally only aware of a single issue from all these year that made it to these forums: V11 with multiple large SOBRs all having their Capacity Tier pointing to the same single Amazon S3 bucket (not really a best practice). But even there, issues with delete operations slowing down appeared only as the bucket started approaching 1PB in size. By the way, V12 will likely not have issues in the same scenario due to an updated object storage format that apparently helps Amazon S3 to transparently "chunk" huge buckets under the hood (if I remember the explanation from Amazon engineers correctly).

It's true there're also plenty of object storage offerings that simply were not designed as object storage from ground-up - but are rather more or less S3 protocol bolted on top of a legacy storage architecture. And they did struggle massively in early days with Veeam, but they have also been investing heavily into improving their scalability over the recent years. I don't know if those were incremental improvements or significant architectural changes, but I saw official bucket size recommendations increasing steadily for some vendors.

The other good news is that V12 supports SOBR made of multiple object storage buckets, so even if your object storage has some crazy limits like 50TB per bucket (which I think is the absolute worst I saw in terms of official system requirements from a storage vendor to date), you can nevertheless still use it with Veeam now without any issues by just cutting multiple buckets and joining them in SOBR. Because while it's not uncommon for object storage to struggle with large buckets, I'm yet to see an object storage that does not "like" a large number of smaller buckets. This is presumably because each bucket usually gets its own, dedicated index database.

Post by **EWMarco** » Jun 12, 2023 11:01 am this post

@Gostev: We are currently in the process of switching to multiple buckets as performance tier SOBRs and then using copy jobs instead of relying on offloading. We do this because of feedback we have from other companies our size who struggled with similar issues.

Redesigning the whole setup will take us months so I cannot give any feedback as to whether this will make things better once the full load is running on this. What I lament is that the customer has to make these experiences basically on his own (and carry the risk), even though Veeam and for example Netapp are partners.

alec-at-work · Jun 12, 2023 11:11 am

JaySt wrote: ↑Jun 12, 2023 8:52 am BTW, i'd be looking for some confirmation about the difference between how VBR and VBO365 write to Object Based storage. I suspect those are different beasts in terms of IO patterns, sizes, operations etc. So scaling could differ between the two.

I don't have any data recorded in detail but we see VB365 is a lot heavier on LIST and HEAD requests than VBR on S3. Certainly as a proportion of metadata operations to writes.

I agree with Gostev's point around 'legacy' vendors having issues around Veeam's workload, or rather backup workloads in general. On a mature Veeam bucket that has populated its full backup chain, we see almost equal write operations to deletes. Obvious, right? You write the newest restore point and then delete the oldest. But this is a very unusual behaviour for a storage technology that has been designed around another use case and then is pivoted to be marketed at Veeam customers. It's something that is key to investigate with any potential S3 system, the deletion performance is as important as the write performance.

Edit to add: by deletion performance, I should clarify that it's end-to-end for the full deletion process. One of our S3 systems caches deletion requests in the metadata database before processing them to disk. So the client performance is good, Veeam is happy but the cluster still needs to delete this data off disk and the space doesn't become 'free' until that's completed.

Jun 12, 2023 11:55 am

EWMarco wrote: ↑Jun 12, 2023 11:01 am@Gostev: We are currently in the process of switching to multiple buckets as performance tier SOBRs and then using copy jobs instead of relying on offloading. We do this because of feedback we have from other companies our size who struggled with similar issues.

Off-topic for this thread, however I would definitely recommend against changing from SOBR Capacity Tier offloading to backup copy in larger environments. Remember that Capacity Tier also supports multiple buckets in V12, which solves the discussed single bucket scalability issue. But it is also important to realize that you're not gaining any added reliability with our object storage format being the same regardless whether it is SOBR offload or backup copy, however you WILL lose added functionality and restore performance optimizations that SOBR brings, while also adding management and resource usage overheads of additional copy jobs.

JRRW · Post by **JRRW** » Jun 12, 2023 9:07 pm this post

DE&C wrote: ↑Jun 10, 2023 12:35 pm Questions:
How many objects and TBs are in the bucket(s) you are using for production? If you are already in the millions of objects per bucket, ceph can’t handle it well (with the performance that is needed).

General information
We have many big customer with object storage for M365 backup with onprem solutions (or using our or other cloud solutions) thate are happy and get good performance. "Object storage" is like saying in general "block storage": There are multiple vendors, systems and ways how to build solution.

We have seen and tested many systems (used by our customers for Veeam) and we have evaluated many other systems too. Ceph is in the “don’t use for Veeam” section for multiple reasons.

In short
Ceph is not the best option for Veeam (VB365 and VBR). Responsible for this are the very small object and the total size of objects per bucket that Veeam generates. The technical (background) problem is the Ceph metadata handling. It doesn't matter if allflash or not for the data (the metadata should always be on flash).

More info from the field
We have (in our country) a well known customer that has 20+ years of experience with opensource and 7+ years with CEPH (PBs of Data) with a very high skilled team. Even this customer decided to get another object storage for Veeam, because it was not possible to achieve the performance and get rid of the problems with Ceph (for Veeam with this millions of objects per bucket).

Your test values
Your test already shows the limitation of your setup. I don’t see the parameters you used. But from the throughput and the objects / seconds I would guess, you are using way bigger objects for the test compared what Veeam is using. This explains the different throughput. Ceph is very good and handling bigger objects.

GET 1039.54 MiB/s, 103.95 obj/s = ca. 10MiB object size

PUT 346.60 MiB/s, 34.66 obj/s = ca. 10MiB object size
And the value “delete operations per seconds” already shows that your system will probably have a hard time handling metadata. This is exactly what we see in the field with bulk deletions on Ceph (from Veeam).

If Ceph is set and you must use it….
There is at least one possibility on how to try to improve the performance for the metadata handling temporary - but it has the chance to make it even worse. Our customer tried it with multiple shards and in the end, it was a huge overhead and the performance didn’t really improve, Ceph was unstable and there were more problems than benefits. To be fair: this was 3 years ago. But attention: If you change the shards and you already have too many objects, this operation can take a very long time (days). Therefore: If you are still in a range that your system can handle, you can try this (at your own risk!), but make sure you now the limitations, have the experience and knowhow to operate Ceph.

Short google research: IBM states a maximum of 100k Objects per Index shard
https://www.ibm.com/docs/en/storage-cep ... cket-index

Personal opinion and consideration
If you are not happy with your actual performance, evaluate another system from a vendor that can proof that this use case was done multiple times with this TB and x-MIO objects per bucket without any problems and with the desired performance. If you need some advice of vendors we have tested, just write me a direct message.

Disclaimer
This is all with the focus on Veeam and doesn’t mean to be a writing against Ceph or that Ceph is not a good solution for other use-cases. It just doesn’t fit the strategy on how Veeam uses object storage. And to be fair: There are many more vendors / systems on the market that have a very hard time with so many small objects. And the other way around: there are only a few out there that can handle it very well when it comes to scale.

That's some fascinating information, thank you.

I'm particularly interested in the fact you're using this at such a high scale; I know that CEPH is being used by users of Veeam in the multi-PB range using XFS/Block (not S3 that I know of) and it works well.

JaySt wrote: ↑Jun 11, 2023 7:53 pm So.. can you drop some names of those that seem to be able “handle it very well” in your experience? Quite valuable to all of us to know where things go well and why, in the same way it’s very good to know where and why things go bad (as you explained regarding ceph).

On CEPH
For what it's worth, Ceph with XFS on the gateways it is doing GREAT for traditional VBR. Far less expensive than other vendors for an all flash ( for over 160TB ) setup, with a lot of advantages on the resiliency side of the equation. Though I do wish we could target two gateways within Veeam for a single repository, but that's as much a CEPH limitation as Veeam.

For VBO, even if it's 'slower' I still wouldn't go back to file level for anything. If for no other reason than you don't 'get back' the storage you utilize in the JET databases with retention, S3 is simply the smarter solution, and I applaud Veeam for their work on it.

While I didn't attend Cephalocon I will be at USENIX which has a session on it, so I'll see if I can't poll some users and admins on this topic. I'm not thrilled with the IBM takeover, but it still seems a very performant solution.

lasseoe wrote: ↑Jun 12, 2023 8:41 am @ewmarco S3-only SOBR, in terms of storage, how are your object storage nodes configured?

The key to making object storage work (not Ceph, as it seems) is to have as many spindes and nodes as possible, and each node will have direct access to each spindle. MinIO can scale like crazy this way.
Has anyone here built very large MinIO clusters for VBR and/or VBO365? I'm curious about the exact hardware setup.

With the current cost (although dropping rapidly) of electricity, on-prem scale-out object storage clusters aren't necessarily cost effective due to each node having CPUs, memory and so on, but they do give you an interesting scale-out business case.

I ran MinIO as a single VM on a traditional RAID-50, and while not as performant as CEPH on S3, it handled 30TB without issue, and ran better than JET Databases on the same storage

CEPH is the same way with spindles and nodes, though in my design I only run 60 OSDs (Micron 5300 7.68TB each) and don't even use NVMe as cache - and it runs both VBR XFS and S3 for VBM/VBO - which was my main selling point to enable one storage for two different file/storage types.

I also want to point out that I was getting MUCH better performance with CEPH S3 using an S3 benchmark app than VBM was getting on the same exact system... So to my origional point, either Veeam's coding isn't as good, or (and frankly I still wonder if this is more the case) if Microsoft is just sucking at life when it comes to pulling that much data from M365

Gostev wrote: ↑Jun 12, 2023 10:44 am I can give some recommendations which I personally am confident with:

From on-prem storage perspective, Dell ECS never really had scaling issues with Veeam, not even with V1 of our S3 support. ECS quickly became the top enterprise object storage according to our support big data, with like 9 out of 10 biggest object storage repositories hosted on one at some point (as of a couple of years ago, when I last blogged about this).

From more recent on-prem object storage players, I heard good things about Pure Storage FlashBlade and Object First: specifically, our Performance & Scalability Testing QA team have chosen them for the V12 lab (simply because they were the fastest, which is most important for their tests).

@Gostev as always, thank you for your transparency and more importantly your interaction with the community.
We actually use Pure Flasharray and I gave serious consideration to running a virtualized or even FC Min.IO cluster on it, presenting a bunch of 'small' LUNs as 'disks' to Min.IO as I really like their product; their support gets a little spendy though.

The Flashblade//E might be economical at larger scale, but the //S at least is pretty costly without even offering Dedupe (vs something like VAST Data) - ECS was on our radar as well as HCP. That having been said, for MOST of the M365 backup side (We ARE talking VBO/VBM here) deduplication won't gain you a lot of savings generally speaking...Now, for VBR that I think is something else entirely.

On VBR which as Gostev pointed out wasn't exactly what this post was meant for (though I totally appreciate that there aren't that many of us using S3 at any Enterprise scale on any Veeam product)
I don't - and am not sure we will ever - intend on using S3 for archiving, unless we ditch tapes. At that point, we might go the route of something like Vast to leverage the global deduplication, as opposed to simply storing raw data in the PBs. Our VBR isn't as large as some, but at over 100TB for a 'Full' (we write around 70TB to tape weekly) and keeping the amount of tapes as we do, we have easily 4PB in tape that would be nuts in cost of drives/nodes/power to hold in an S3 archive of that size + a replica for off-site and lagged copy purposes (this is more a hypothesis I'll admit).

Thank you everyone for a great dialog on this, it's good to have open and informed conversations. Keeping it to facts and general experiences rather than attacking any one product or element is helpful.

I might - now that we are on V12 - try a direct to S3 or even test an archive tier on CEPH simply to see the performance. I have space, more or less.

Jun 13, 2023 3:22 pm

When designing a ceph cluster for Veeam it is really important to have the bucket index pool rgw.bucket.index as all-flash storage. When a bucket fills to a point where it needs to increase the size of the bucket's index database it must reshard and that's slow unless you've applied a replicated_ssd rule to it. It's also good to preshard the bucket index database to 100M objects. Last, NVMe configured as the write log (WAL) is also very important for good write performance as the latency with HDD OSDs is high unless you've offloaded the WAL/MDB to flash.

Without knowing the hardware build of materials and the OSD/WAL/MDB CRUSH rule configuration it's hard to make recommendations. If you can, please post that information.

We have a design tool you may find helpful for sizing the right amount of flash storage for your s3/rgw ceph cluster. Note that the performance estimates the tool gives is based on optimal 64M+ object sizes.. ymmv but it'll help you size the right amount of flash for your cluster.

https://www.osnexus.com/ceph-designer

Post by **karsten123** » Jun 13, 2023 6:00 pm this post

What are your experiences with Scality Artesca?

JRRW · Jul 28, 2023 8:34 pm

osnexus wrote: ↑Jun 13, 2023 3:22 pm When designing a ceph cluster for Veeam it is really important to have the bucket index pool rgw.bucket.index as all-flash storage. When a bucket fills to a point where it needs to increase the size of the bucket's index database it must reshard and that's slow unless you've applied a replicated_ssd rule to it. It's also good to preshard the bucket index database to 100M objects. Last, NVMe configured as the write log (WAL) is also very important for good write performance as the latency with HDD OSDs is high unless you've offloaded the WAL/MDB to flash.

Without knowing the hardware build of materials and the OSD/WAL/MDB CRUSH rule configuration it's hard to make recommendations. If you can, please post that information.

We have a design tool you may find helpful for sizing the right amount of flash storage for your s3/rgw ceph cluster. Note that the performance estimates the tool gives is based on optimal 64M+ object sizes.. ymmv but it'll help you size the right amount of flash for your cluster.

https://www.osnexus.com/ceph-designer

Sorry for the delay, the last two months have been crazy busy.

It's an all flash cluster. Only real 'issue' i have with the build is that it was built with 8x32gb on memory instead of 12 - 2nd gen is 6 channel memory so 8 is uneven on a single proc for bandwidth.
Gateways: GW1 & GW2 - 1xSilver 4216 (16c/32t) w/64gb memory, 2xIntel SFP28 for public bond
--2xSATA drives for OS
4 identical nodes: 1xGold 6230R (26c/52t) 256gb Memory 2xIntel SFP28 for public 2xintel SFP28 for private -- Each node has 15 7.68TB Micron 5300 Pro for a total of 60x7.68 with a 4:2 EC

The gateways host via docker the S3, and it does go between them. The pool for the rgw is 256 PGs and runs on CEPH v15.2.17

It's not 'bad' exactly, just not as performant in comparison to our VBR that goes to the RBD Pool on the same storage (presents 4x50TB connections to GW1 which then does a raid-0 using LVM so that CEPH isn't doing a single thread), but we're toying with the idea of using S3 archive for VBR to get rid of tape, and one option would be to use CEPH and adding NL-SAS nodes for 'cheap and deep', vs doing something 'turn key' like an Isilon.

sandsturm · Post by **sandsturm** » Aug 01, 2023 8:20 am this post

Hi all

Thanks a lot for your interesting posts in this thread!
I hope my post is right in this thread, because I have related questions to performance of onpremise objectstorage systems:
I'm planning to switch our current VBM365 local disk repositories to Objectstorage repositories. I have two proxies with a repo disk each due to the fact that we have about 5k users (and increasing). My first question is now: are the proposed maximum configurations the same if using objectstorage? Meaning, that I still have to create two buckets or at least two different folders (one folder per repository) in the same bucket. Each proxy server has then configured the local disk cache for one of these S3-repositories. Or can I do all on one single proxy server? I could not really find an answer to this question in veeam best practice guide if using objectstorage as backuptarget.

Second question: I try to use the script from https://www.veeam.com/kb3067 to migrate backupdata from local repos to the objectstorage system. I started that for the backupjob with onedrive backups for 4600 users with a total of 1.9TB of backupdata. The script is able to migrate about 20-30 users per hour and therefore I need ages to migrate all data. it seems to me very slow, does anyone has experience with this script? (we are using three Netapp StorageGrid SG6060 nodes per site (two sites) as our onpremise S3 storagesystem)
I was thinking about attaching the new Objectstorage repos to the existing jobs and let the old backups on the old repos for the time of the retention and delete them afterwards as an alternative to the data migration. But would I every be able to restore data from such an "unmapped" old repo? Is there a way to do that if the backupjobs have attached the new repos?

thx,
sandsturm

JRRW · Aug 01, 2023 1:03 pm

@Sandsturm I haven't used the script as we did more the 2nd route: The difference being that I just cloned my jobs, and left the 'old' in place disabled for restore purposes for the retention period. It's less ideal but yes, when you're migrating TBs of data, it's not the fastest.

I also know that you don't get as good 'space saving' compression when migrating vs straight from M365 to S3 - which is another reason why I didn't move my retention. Mind you, I have a magnitude of scale larger as we're currently at roughly 150TB on 'disk' VBO... So yeah... Wouldn't want to deal with that migration, ever.

Honestly as to the design, Object Storage is vastly more flexible when it comes to your buckets. There are limits (vendor specific generally) but I'd say you're better view on it would be for organization (i.e. if you have certain business units you want in their own buckets for some reason or simple tracking purposes) or for different retention reasons.

sandsturm · Post by **sandsturm** » Aug 02, 2023 6:00 am this post

@JRRW: Thanks for your answer, cloning the existing jobs could be an option, yes. Because the migration is really far to slow!
Can anyone confirm that data cannot be restored, if the repository was detached/unconfigured from a backupjob? In this case, i don't need to clone all backupjobs.... Is there a way to reattach an existing repo to a backupjob in case a restore is required in the future?

JRRW · Post by **JRRW** » Aug 02, 2023 2:24 pm this post

Supposedly - I've read this in a post elsewhere on the forum - Veeam can import existing data from an S3 repository that it created if the Organization is the same. It seemed convoluted to me, so I'd personally play it safer by having disabled jobs attached to the 'old' repository.

Probably best to open a support case to validate the procedure for re-attaching prior to making that move.

Aug 05, 2023 8:15 pm

Interesting thread.
As the original question was about vb365 I would like to give my 5 cents to it.

Am I just... Expecting more than is fair for object storage? Or is it that VBO (VBM?) isn't accurately identifying the actual bottleneck, and it's all Microsoft that is the bottleneck.

Yes, from my experience, i would say it is always MS which is the bottleneck. At least if you have configured/sized everything properly (split jobs, every job has it own repo, properly sized proxies etc.)
If you see something else than source as a bottleneck in the GUI than check the logs. For us it mostly says MS throttling no matter what you see in the GUI.
Our vb365 backup data is in the PB`s now. Hundreds of concurring jobs. Currently we are using 2 Cloudian Clusters for vb365. We never had any issues or performance problems here. Actually, all Object Storage I used with vb365 was good enough.
In the future we will use all-flash ceph for vb365 as well.

We are also offering our clusters to our customers for offloading their vbr backups. Here we have even more backup data. In most cases with vbr Cloudian is used but we also have some ceph for this. With vbr, I can confirm, that you will probably get some (or many) issues over time. Especially, when you have no influence how your storage is used. Default job config has 1 MB object size which is good for saving capacity but bad when it comes to performance. With this we had many really troubling problems in the past. With Cloudian and also with ceph. It could become a mess when vbr deletes millions of objects at once. But we had also veeam S3 problems (e.g. broken indexes or something like that). Often those cases are very time-consuming and it can take months to solve them.
You should try to use larger object sizes and small buckets. Use more buckets instead.
Large buckets with small objects are usually problematic (with all vendors at some point).

But with vb365 we usually see much larger objects than with vbr. Therefore, I do not consider small objects an issue with vb365 (but with vbr). When you follow vb365 bp guide than you also do not get very large buckets (maybe in some cases). Our largest Exchange bucket is around 25 TB (with a max recommended job/repo configuration of 5000 users and 1 year retention). OneDrive Buckets can grow larger with 5000 users. Our largest one is 80 TB with 1 year retention. SP/Teams buckets are usually much smaller with a bp configuration of 5000 objects per job/repo. Our largest here is 9 TB but most SP/Teams repos are between 1 and 3 TB
Here you can see a somewhat average object size distribution of one of my customers Mail/OD/SP/Teams:
https://pasteboard.co/bDPQVbPC7R1b.png

In mail repos you will find many small objects. But it is still not an issue. because buckets do not grow (too) large and you do not delete many objects per day/at once.
Largest repos are usually OneDrive repos. You will have extra large objects here. Quite good for performance.

Here you can see vbr object sizes:
https://pasteboard.co/0Z4HE4VL6VS7.png

This is a huge difference compared with vb365. Another difference is usually repo size. Because my customers configure their veeam instance by themselves I often see (too) large buckets.
With vbr you can (and sometimes should) use other options than Object storage. But with vb365 you should use Object Storage because of data reduction ratio (up to 50%). Actually, you can use whatever Object Storage you want but should avoid using Azure Blob Storage with vb365. It will work perfectly fine but your backup is in the same cloud as your source is. You will find better options.

Aug 15, 2023 8:07 am

The VBR number looks like a bit off with the very small objects. Was versioning enabled in the bucket without usage of immutability on the Veeam side? My guess is that a lot of deletion stub files are sitting in the bucket and need to get cleaned up.

Feb 02, 2024 10:43 am

karsten123 wrote: ↑Jun 13, 2023 6:00 pm What are your experiences with Scality Artesca?

I can fully recommend it. Since last year we implement Artesca and it works perfectly for small and very big customers. I you have any questions about it, feel free to ask.

Disclaimer: I work for the VASP in Switzerland and we also sell Artesca (just to be transparent). We evaluated many solutions (all that were mentioned in this thread) in the last years for Veeam.

Post by **tyler.jurgens** » Feb 05, 2024 4:07 pm this post

Good point Andreas - Very possibly a bunch of stale delete markers. Easy enough to find out - I used an app called "S3 Browser" which can show delete markers under the 'Version' tab. Also, if you want to clean them up, here's a post: https://explosive.cloud/minio-and-veeam ... e-markers/

FredNass · Post by **FredNass** » Nov 15, 2024 11:54 am this post

DE&C wrote: ↑Jun 10, 2023 12:35 pm Questions:
How many objects and TBs are in the bucket(s) you are using for production? If you are already in the millions of objects per bucket, ceph can’t handle it well (with the performance that is needed).

This statement must be the reflection of a bad experience. Ceph S3 can handle tens and hundreds of millions of objects in the very same S3 bucket without any issues.

DE&C wrote: ↑Jun 10, 2023 12:35 pm General information
We have many big customer with object storage for M365 backup with onprem solutions (or using our or other cloud solutions) thate are happy and get good performance. "Object storage" is like saying in general "block storage": There are multiple vendors, systems and ways how to build solution.

We have seen and tested many systems (used by our customers for Veeam) and we have evaluated many other systems too. Ceph is in the “don’t use for Veeam” section for multiple reasons.

In short
Ceph is not the best option for Veeam (VB365 and VBR). Responsible for this are the very small object and the total size of objects per bucket that Veeam generates. The technical (background) problem is the Ceph metadata handling. It doesn't matter if allflash or not for the data (the metadata should always be on flash).

I disagree with Ceph S3 metadata handling being an issue. Within Ceph, bucket's metadata is automatically split over multiple RADOS objects and the scalability is ensured by the fact that RGW S3 gateways automatically reshard bucket indexes when buckets start to grow to maintain the highest level of performance. Buckets can even be prepared when it's a known fact that they're gonna store tens or hundreds of millions of objects.

There are multiple examples out here of messaging system storing very small objects in Ceph S3 storage for tens of millions mailboxes at cloud providers, universities and research centers.

DE&C wrote: ↑Jun 10, 2023 12:35 pm More info from the field
We have (in our country) a well known customer that has 20+ years of experience with opensource and 7+ years with CEPH (PBs of Data) with a very high skilled team. Even this customer decided to get another object storage for Veeam, because it was not possible to achieve the performance and get rid of the problems with Ceph (for Veeam with this millions of objects per bucket).

I would be very happy to discuss this point with him about the difficulties he encountered.

DE&C wrote: ↑Jun 10, 2023 12:35 pm
Your test values
Your test already shows the limitation of your setup. I don’t see the parameters you used. But from the throughput and the objects / seconds I would guess, you are using way bigger objects for the test compared what Veeam is using. This explains the different throughput. Ceph is very good and handling bigger objects.

GET 1039.54 MiB/s, 103.95 obj/s = ca. 10MiB object size

PUT 346.60 MiB/s, 34.66 obj/s = ca. 10MiB object size
And the value “delete operations per seconds” already shows that your system will probably have a hard time handling metadata. This is exactly what we see in the field with bulk deletions on Ceph (from Veeam).

If Ceph is set and you must use it….
There is at least one possibility on how to try to improve the performance for the metadata handling temporary - but it has the chance to make it even worse. Our customer tried it with multiple shards and in the end, it was a huge overhead and the performance didn’t really improve, Ceph was unstable and there were more problems than benefits. To be fair: this was 3 years ago. But attention: If you change the shards and you already have too many objects, this operation can take a very long time (days). Therefore: If you are still in a range that your system can handle, you can try this (at your own risk!), but make sure you now the limitations, have the experience and knowhow to operate Ceph.

Short google research: IBM states a maximum of 100k Objects per Index shard
https://www.ibm.com/docs/en/storage-cep ... cket-index

This is quite vague and ignores dynamic resharding.

DE&C wrote: ↑Jun 10, 2023 12:35 pm Personal opinion and consideration
If you are not happy with your actual performance, evaluate another system from a vendor that can proof that this use case was done multiple times with this TB and x-MIO objects per bucket without any problems and with the desired performance. If you need some advice of vendors we have tested, just write me a direct message.

Disclaimer
This is all with the focus on Veeam and doesn’t mean to be a writing against Ceph or that Ceph is not a good solution for other use-cases. It just doesn’t fit the strategy on how Veeam uses object storage. And to be fair: There are many more vendors / systems on the market that have a very hard time with so many small objects. And the other way around: there are only a few out there that can handle it very well when it comes to scale.

Ceph is a great solution, including for use cases with millions of small objects. One just need to know how to properly desing it and configure it. We have been using Ceph for 10+ years for different workloads, including email, without any difficulties. There are dozens of examples on the Internet showing that Ceph is perfectly capable of performing well with very small IO operations and buckets containing tens or even hundreds of millions of objects, thanks precisely to its design and scalability.

BTW, I've been using Veeam in the past. Great solution, probably the best backup solution actually. There was no S3 storage by that time and since I'm not using it anymore, I did not evaluate how Veeam works when storing backup data in S3 storage, but maybe I'll get the chance to do so in the future.

Cheers,
Frédéric.

Nov 15, 2024 1:20 pm

I have analyzed how Veeam, in the case of Office 365, uses the S3 Compatible API, which is the same one used by Ceph and Minio.

And it’s not the best way to use it—you can verify this, and I believe it could be improved.

Both Ceph and Minio lack a system we’ll call deferred deletes, which means any DELETE request is processed instantly, blocking subsequent IOPS and increasing I/O wait. There are very effective methods to send multiple objects for deletion simultaneously without generating one request per object, thereby avoiding resource overuse. Additionally, these "deletes" could be scheduled for a time other than when new backup data is being uploaded. This way, the storage system would only need to handle HEAD and PUT requests, rather than HEAD, PUT, and DELETE.

Moreover, the average object size uploaded by VBO is tiny. In contrast, with VBR, we can configure object sizes to 4MB or 8MB, which improves performance. If performance is to be enhanced, Veeam should "invent" something like a synthetic block for VBO to store multiple files or parts of files together.

Minio+ Micron Object Storage Optimial size: https://jp.microncpg.com/content/dam/mi ... _brief.pdf

That said, Veeam should improve how it handles request logic. Not all issues stem from the backend. Furthermore, we have verified through metrics that when Veeam identifies a Bottleneck Target and we analyze the clusters, we see 0.0 IO Wait. This indicates that the issue is not the storage but rather the program's logic—possibly a throttle.

It’s also worth noting that Minio, and I believe Ceph as well, use Direct IO and do not utilize the intermediate caches of operating systems. This can be modified with commands to achieve better performance.

We operate both Ceph and Minio environments. Personally, I find Minio superior to Ceph in terms of administration time. However, Ceph is better for us when implementing GEO-redundancies with disk maps.

With a Minio cluster, we have successfully achieved 14Gbps while migrating data from Wasabi to Minio using rclone and 1,000 processing threads. However, we have not been able to achieve the same with VBO, even though the proxies are capable of running 960 tasks simultaneously.

We should also add that, by default, Veeam purges old data from repositories at 00:00, and if this coincides with your backups, you’ll have quite the situation on your hands.

Jan 20, 2025 7:31 pm

Just wanted to add some general tips on getting great performance with Veeam using Ceph.
1) make sure that your bucket.index pool is all-flash
2) make sure you have WAL+MDB offload to flash storage, generally recommend 2GB write log and 30GB per 1TB of HDD space for MDB offload
3) intelligent data placement is important, you want small objects < 32K in replica=3 and larger objects written to EC storage. Use replica=3 storage for your small objects (STANDARD class), use EC storage (STANDARD_IA and use a minimum m=2 coding blocks, 3 recommended) for your data. We have a free Community Edition you can use to try this strategy out and a video here on how to set it up. https://youtu.be/TDlRqehpMUs?feature=shared&t=1126 covers the intelligent data placement. For help with sizing a cluster and assistance setting it up just grab Trial Edition keys (osnexus.com/downloads).

JRRW · Post by **JRRW** » Feb 14, 2025 2:24 pm this post

Thanks osnexus!

In my case, it's all flash, so there's no real... specific element that applies to my use case at least; also when you're talking CEPH are you talking S3 specifically, or in general?

My VBR backs up to an XFS file system running on one of my 2 Rados gateways, in a large LVM raid-0, with 8 RBDs split between 4 nodes; the nodes have a total of 128 7TB enterprise sata-ssd's

Post by **osnexus** » Feb 17, 2025 2:55 am this post

Am referring to using the S3 protocol via CephRGW instances.. it should be faster than using a RBD with XFS on top.

R&D Forums

S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Re: S3 performance (on site)

Who is online