Feature Request
- V1: increase the standard object size to a minimum of 4MB
- V2: I would prefer a setting in the UI to choose between 1MB, 2MB, 4MB ... 64MB, 128MB.
About the current implementation
Good
- Items are downloaded to the cache and then packed into an object of about 1MB (x mails in one object)
Bad
- 1MB objects are way too small to use object storage in an optimal way
The problem (background info)
- Index Performance
- rescan or resync of buckets (example: when you lose your cache, mount the bucket to a new proxy/server etc.)
- potential problems with your storage / provider
Ceph, Cloudian and many other object storage solutions have limitations on the maximum number of objects in a bucket or in the whole system. The reason is the database behind it, which must maintain the index in which every single object in the buckets is tracked. For "small" environments it's maybe not a big deal. But what is "small"? In my lab I had about 130k mails, a total backup of 4.7GB, and already 5556 objects in this bucket.
---
William
Real-life examples from customers of mine
- Ceph: 100+ million, 200+ million, 300+ million objects in a bucket can lead to problems with the index.
- Cloudian system (software 7.1.x) with about 4PB usable space for the backup: with the current setup the customer can have somewhere between 1 and 1.5 billion objects. About 1-1.5PB of 1MB objects will max out the object count and leave 2-3PB unused. Of course he can upgrade with more SSDs, more RAM and more nodes... but that is not really a solution if it can be changed in the software at little cost.
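The Cloudian numbers are easy to verify. A quick check, assuming ~1.25 billion objects as the midpoint of that 1-1.5 billion limit:

```python
# How much capacity is actually used when the object-count limit
# is reached, for different object sizes.
max_objects = 1.25e9          # midpoint of the 1-1.5 billion object limit
for size_mb in (1, 4, 64):
    used_pb = max_objects * size_mb / 1024**3  # MB -> PB
    print(f"{size_mb:>2}MB objects: ~{used_pb:.1f}PB used at the object limit")
```

With 1MB objects the count limit is hit at roughly 1.2PB, stranding most of the 4PB; already at 4MB objects the full usable capacity can be reached.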
Code:
130k mails = 5556 objects = 4.7GB
Code:
5556 (objects) / 4.7 (GB) * 1024 (GB per TB) * 100 (for 100TB) ≈ 121 million objects at 1MB
121 million objects / 8 (with 8MB objects instead) ≈ 15 million objects
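The extrapolation above as a small script (the 8MB size is just one example of a larger object size):

```python
# Extrapolate the object count from the lab measurement
# (5556 objects for a 4.7GB backup, ~1MB per object).
lab_objects = 5556
lab_size_gb = 4.7

objects_per_gb = lab_objects / lab_size_gb           # ~1182 objects per GB
target_tb = 100
objects_at_1mb = objects_per_gb * 1024 * target_tb   # for 100TB at ~1MB each

print(f"~{objects_at_1mb / 1e6:.0f} million objects at ~1MB")
for size_mb in (4, 8, 64):
    print(f"~{objects_at_1mb / size_mb / 1e6:.1f} million objects at {size_mb}MB")
```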
If you now say "130k mails is a lot", then you haven't seen enterprise installations. In my lab the test was with about 20 users from a real customer. The pilot will be with 100 users, production with about 20k users. And this is not the biggest one. Of course you can now say that this is the hoster's problem. But what if you ARE the hoster, or have a local object storage system?
I know that the best practice is x users per job and you can have multiple repositories. So as a "workaround" you can use multiple buckets. But even then the problem of too many objects per storage system is not solved.
Let's go back to the real problem: 1MB is just too small an object compared to the metadata (RAM/SSD, CPU etc.) that the storage needs to track it.
How to fix this: use bigger objects, that's all. Not really hard to implement:
- V1: increase the standard object size
- V2: I would prefer a setting in the UI to choose between 1MB, 2MB, 4MB ... 64MB, 128MB.
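To show that this is not hard, here is a minimal sketch of the packing step with a configurable target size. Everything in it (the function name, the upload callback) is hypothetical, not Veeam's actual code:

```python
# Hypothetical sketch: pack cached mail blobs into objects of a
# configurable target size before upload. All names are made up.
from typing import Callable, Iterable


def pack_and_upload(mails: Iterable[bytes],
                    upload: Callable[[bytes], None],
                    target_size_mb: int = 4) -> int:
    """Buffer mails until ~target_size_mb is reached, then upload one
    object. Returns the number of objects written."""
    target = target_size_mb * 1024 * 1024
    buf = bytearray()
    objects = 0
    for mail in mails:
        buf.extend(mail)
        if len(buf) >= target:
            upload(bytes(buf))
            objects += 1
            buf.clear()
    if buf:  # flush the remainder as a final, smaller object
        upload(bytes(buf))
        objects += 1
    return objects
```

Raising target_size_mb from 1 to 4 cuts the object count for the same data to roughly a quarter, which is exactly what the bucket index sees.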
I know that the main focus is always on Amazon and the retrieval cost. But a lot of people do not use Amazon. There are providers with an API and a traffic flat rate, and there are customers with on-prem object storage.
Another point of view: Veeam VBR uses small block sizes too (depending on the backup job). If a customer has a system with a maximum object count and has both Veeam VBR and VBO writing to this storage, the counter will hit the maximum number of objects in a very short time.