Feature Request
- V1: increase the standard object size to a minimum of 4MB
- V2: I would prefer a setting in the UI to choose between 1MB, 2MB, 4MB ... 64MB, 128MB.
About the current implementation
Good
- Items are downloaded to the cache and then packed into an object of about 1MB (x mails in one object)
Bad
- 1MB objects are way too small to use object storage in an optimal way
The problem (background info)
- Index Performance
- rescan or resync of buckets (example: when you lose your cache, mount the bucket to a new proxy/server etc.)
- potential problems with your storage / provider
Ceph, Cloudian and many other object storage solutions have limitations on the maximum number of objects in a bucket or in the whole system. The reason is the database behind it, which must maintain the index in which every single object in the buckets is tracked. For "small" environments it's maybe not a big deal. But what is "small"? In my lab I had about 130k mails, a total backup of 4.7GB, and already 5556 objects in this bucket.
---
William
Real-life examples from customers of mine
- Ceph: 100+ million, 200+ million, 300+ million objects in a bucket can lead to problems with the index.
- Cloudian system (software 7.1.x) with about 4PB usable space for the backup: with the current setup the customer can have somewhere between 1 and 1.5 billion objects. About 1-1.5PB of 1MB objects will max out the object count and leave 2-3PB unused. Of course he can upgrade with more SSDs, more RAM and more nodes... but that is not really a solution if it can be changed in the software at little cost.
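The Cloudian numbers are easy to verify. A quick check, assuming ~1.25 billion objects as the midpoint of that 1-1.5 billion limit:

```python
# How much capacity is actually used when the object-count limit
# is reached, for different object sizes.
max_objects = 1.25e9          # midpoint of the 1-1.5 billion object limit
for size_mb in (1, 4, 64):
    used_pb = max_objects * size_mb / 1024**3  # MB -> PB
    print(f"{size_mb:>2}MB objects: ~{used_pb:.1f}PB used at the object limit")
```

With 1MB objects the count limit is hit at roughly 1.2PB, stranding most of the 4PB; already at 4MB objects the full usable capacity can be reached.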
Code:
130k mails = 5556 objects = 4.7GB
Code:
5556 (objects) / 4.7 (GB) * 1024 (GB per TB) * 100 (for 100TB) ≈ 121 million objects at 1MB
121 million objects / 8 (with 8MB objects instead) ≈ 15 million objects
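The extrapolation above as a small script (the 8MB size is just one example of a larger object size):

```python
# Extrapolate the object count from the lab measurement
# (5556 objects for a 4.7GB backup, ~1MB per object).
lab_objects = 5556
lab_size_gb = 4.7

objects_per_gb = lab_objects / lab_size_gb           # ~1182 objects per GB
target_tb = 100
objects_at_1mb = objects_per_gb * 1024 * target_tb   # for 100TB at ~1MB each

print(f"~{objects_at_1mb / 1e6:.0f} million objects at ~1MB")
for size_mb in (4, 8, 64):
    print(f"~{objects_at_1mb / size_mb / 1e6:.1f} million objects at {size_mb}MB")
```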
If you now say "130k mails is a lot", then you haven't seen enterprise installations. In my lab the test was with about 20 users from a real customer. The pilot will be with 100 users, production with about 20k users. And this is not the biggest one. Of course you can now say that this is the hoster's problem. But what if you ARE the hoster, or have a local object storage system?
I know that the best practice is x users per job and you can have multiple repositories. So as a "workaround" you can use multiple buckets. But even then the problem of too many objects per storage system is not solved.
Let's go back to the real problem: 1MB is just too small an object compared to the metadata (RAM/SSD, CPU etc.) that the storage needs to track it.
How to fix this: use bigger objects, that's all. Not really hard to implement:
- V1: increase the standard object size
- V2: I would prefer a setting in the UI to choose between 1MB, 2MB, 4MB ... 64MB, 128MB.
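To show that this is not hard, here is a minimal sketch of the packing step with a configurable target size. Everything in it (the function name, the upload callback) is hypothetical, not Veeam's actual code:

```python
# Hypothetical sketch: pack cached mail blobs into objects of a
# configurable target size before upload. All names are made up.
from typing import Callable, Iterable


def pack_and_upload(mails: Iterable[bytes],
                    upload: Callable[[bytes], None],
                    target_size_mb: int = 4) -> int:
    """Buffer mails until ~target_size_mb is reached, then upload one
    object. Returns the number of objects written."""
    target = target_size_mb * 1024 * 1024
    buf = bytearray()
    objects = 0
    for mail in mails:
        buf.extend(mail)
        if len(buf) >= target:
            upload(bytes(buf))
            objects += 1
            buf.clear()
    if buf:  # flush the remainder as a final, smaller object
        upload(bytes(buf))
        objects += 1
    return objects
```

Raising target_size_mb from 1 to 4 cuts the object count for the same data to roughly a quarter, which is exactly what the bucket index sees.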
I know that the main focus is always on Amazon and the retrieval cost. But a lot of people do not use Amazon. There are providers with an API and a traffic flat rate, and there are customers with on-prem object storage.
Another point of view: Veeam VBR uses small block sizes too (depending on the backup job). If a customer has a system with a maximum object count and has both Veeam VBR and VBO writing to this storage, the counter will hit the maximum number of objects in a very short time.