-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
S3-integrated Storage - Block Size
With S3-integrated storage (e.g. MinIO), the storage tells Veeam what block size to use (in this case, MinIO recommends a 4 MB block).
However, when I check the job settings on a new backup job, using the defaults, the block size is set to 1 MB.
Which block size will be used on the backup job? The size the Veeam job reflects, or the size the storage system recommends?
Assuming the block size used is the block size the storage system recommends and the settings in the Veeam job are ignored: if I wanted to override the block size, how would that be done?
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Software
- Posts: 287
- Liked: 138 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: S3-integrated Storage - Block Size
@tjurgens-s2d The block size you specify in the backup job is what is used when putting objects to object storage. The object storage vendor may have their own best-practice size recommendation, and if you choose to use it, the job's block size setting is where you would select it.
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
In Luca's blog post (https://www.virtualtothecore.com/a-firs ... am-sosapi/) it looks like forcing a default block size might be a planned capability. I'm guessing that's not in place yet?
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Chief Product Officer
- Posts: 31707
- Liked: 7212 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: S3-integrated Storage - Block Size
No, this has not been implemented yet. Also, some people at Veeam question whether it's even a good idea to let storage vendors control this.
For example, changing the default 1MB block to 4MB will approximately double your disk space consumption in the long run. This is obviously great for a storage vendor as they get to sell 2x more nodes, but not so great for your wallet or data center footprint.
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
As with anything, let it be configurable. If the storage vendor's SOSAPI implementation forces 4 MB blocks, let the Veeam client override it. E.g.: if I create a job against an S3-integrated bucket, the block size defaults to the SOSAPI block size; if I change it, it uses the block size I set.
For our MinIO implementation, we'd take 4 MB blocks over hundreds of millions of tiny blocks negatively impacting our performance. We have one client backing up 45 TB using 223 million objects! It has made my leadership team want to avoid Veeam, while I've been trying my best.
The storage size issue only really gets amplified if the clients retain long-term incremental backups. Yes, it takes more space, but it's a tradeoff we (at least in our specific situation) need to make. Given that each GFS point is already effectively free, it's not so bad.
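For a rough sense of the object counts involved, here is a quick back-of-envelope sketch in Python (it ignores compression, incremental runs and Veeam's metadata objects, which is why the real count above is higher):

```python
# Back-of-envelope: roughly one object per stored block, so the job's
# block size drives the object count in the bucket.
TB = 1024 ** 4
MB = 1024 ** 2

source_bytes = 45 * TB  # the 45 TB client mentioned above

for block_mb in (1, 4, 8):
    objects = source_bytes // (block_mb * MB)
    print(f"{block_mb} MB blocks -> ~{objects / 1e6:.0f} million objects per full")

# 1 MB blocks -> ~47 million objects per full
# 4 MB blocks -> ~12 million objects per full
# 8 MB blocks -> ~6 million objects per full
```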
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Software
- Posts: 287
- Liked: 138 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: S3-integrated Storage - Block Size
Also, when SOSAPI starts to set the object size as noted in the blog, it will be at the bucket level, and every job pointed at that bucket will use that object size. As @Gostev mentioned, you can see significant storage growth when increasing the object size from the 1 MB default/best-practice value. Because of that, you should stick to changing the object size at the workload/job level and monitor the effect of the change.
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
That's all well and good, and we've been changing block sizes on individual backup jobs. However, when you're talking about many customers, each with many jobs, changing them all takes a significant amount of time.
I agree that we need to change and monitor, which we have. It just doesn't scale well and is prone to human error.
I'm also concerned with how Capacity/Archive Tier block sizes get set. For a direct-to-S3 backup job, it's straightforward: change the block size on the job, run an active full, and call it a day. Backup Copy jobs or Capacity/Archive Tier offloads are more difficult: one has to change the block size setting on the primary backup job.
Unless SOSAPI knows enough to change the block size on any jobs that either have a backup copy or are part of a SOBR with offloading to S3, forcing the block size on backup jobs alone isn't going to have a major impact, as I suspect most customers will use S3 as a backup copy target or a SOBR offload target. I may be wrong here, though; Veeam may have clearer metrics on how S3 is being used.
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Vanguard
- Posts: 227
- Liked: 55 times
- Joined: Jan 13, 2011 5:42 pm
- Full Name: Jim Jones
- Location: Hurricane, WV
- Contact:
Re: S3-integrated Storage - Block Size
@gostev @sfirmes, I hate to resurrect this topic, but I'm trying to understand the "why" of the storage increase when changing from a 1 MB to a 4 MB block size. Is it just a matter of creating a higher likelihood of "misses" as deduplication is applied through block generations, because 4x as much data has to match?
Jim Jones, Sr. Product Infrastructure Architect @iland / @1111systems, Veeam Vanguard
-
- Chief Product Officer
- Posts: 31707
- Liked: 7212 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: S3-integrated Storage - Block Size
Because incremental backups are block-level: when a given data block has, say, 200 KB of changed data in it, in the first case the incremental backup only needs to store 1 MB of data (which includes the unchanged data surrounding the changed 200 KB), but in the second case four times more data needs to be stored (a 4 MB block).
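Here is a toy illustration of the effect (just a sketch with an assumed pattern of small, scattered changes; real change rates vary by workload):

```python
# Toy model: scatter a few thousand small writes across a 100 GB disk and
# count how much data each block size forces into the incremental backup.
import random

GB = 1024 ** 3
MB = 1024 ** 2

disk_size = 100 * GB
changed_offsets = [random.randrange(disk_size) for _ in range(5_000)]

for block in (1 * MB, 4 * MB):
    touched_blocks = {offset // block for offset in changed_offsets}
    incremental_bytes = len(touched_blocks) * block
    print(f"{block // MB} MB blocks: ~{incremental_bytes / GB:.1f} GB incremental")

# With sparse changes the 4 MB run stores close to 4x as much data, because
# every small change drags a whole (larger) block into the increment.
```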
-
- Veeam Vanguard
- Posts: 394
- Liked: 169 times
- Joined: Nov 17, 2010 11:42 am
- Full Name: Eric Machabert
- Location: France
- Contact:
Re: S3-integrated Storage - Block Size
This is in fact not so trivial when you have a huge amount of data; the metadata operations surrounding the data blocks (like retention information) are something to take into account.
We are dealing with performance (job duration) issues when doing immutable backup copies to S3 object storage with multi-TB VMs (which are included in jobs with regular VMs). You can spend hours (like 5 to 10 hours) where Veeam will not transfer any data but just storm the storage with millions of PutObjectRetention API calls, since there are so many objects to put the retention information on. We are currently doing an extensive analysis of the pattern and fine-tuning of Veeam internals and object storage internals, with the help of both vendors' support teams, to mitigate this side effect.
At least it is interesting: deep-diving into it makes you understand how everything works under the hood.
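To illustrate the call pattern, here is a minimal boto3 sketch (the endpoint, bucket and prefix are made up, and Veeam's own implementation differs, but the cost model is the same: one PutObjectRetention round-trip per object):

```python
# Minimal sketch of per-object retention updates on an S3-compatible endpoint.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")  # placeholder endpoint
bucket = "veeam-copy-bucket"  # hypothetical bucket name
retain_until = datetime.now(timezone.utc) + timedelta(days=30)

calls = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="Veeam/Backup/"):
    for obj in page.get("Contents", []):
        # One API call per object: with 1 MB blocks a multi-TB VM means
        # millions of these, which is the "storm" described above.
        s3.put_object_retention(
            Bucket=bucket,
            Key=obj["Key"],
            Retention={"Mode": "COMPLIANCE", "RetainUntilDate": retain_until},
        )
        calls += 1

print(f"issued {calls} PutObjectRetention calls")
```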
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
We have had the same challenges, @emachabert. The small objects Veeam creates are definitely a challenge for any HDD-based MinIO deployment. Sure, we could solve those by switching to NVMe drives, but the cost of that is prohibitive.
Our best solution has been to increase the block size to 4 MB (or even 8 MB, which is unlocked with a regedit). I would prefer it if Veeam allowed us to set this value on Backup Copy jobs and SOBR offload settings, but so far that feature request has not been implemented. I understand Veeam's desire to keep object sizes small, but it makes things very challenging for us.
We've been very honest with customers when telling them to go to 4 MB blocks and what the consequences are (increased size, fewer API calls, etc.). We haven't had any pushback on it yet.
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Software
- Posts: 287
- Liked: 138 times
- Joined: Jul 24, 2018 8:38 pm
- Full Name: Stephen Firmes
- Contact:
Re: S3-integrated Storage - Block Size
One thing to note is that MinIO used NVMe drives in their Veeam Ready testing: https://www.veeam.com/sys1047
Have you contacted MinIO to see if they have any recommendations on what performance metrics you can expect with an HDD setup?
Thanks
Steve
Steve Firmes | Senior Solutions Architect, Product Management - Alliances @ Veeam Software
-
- Service Provider
- Posts: 271
- Liked: 77 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
Re: S3-integrated Storage - Block Size
@emachabert How many disks did you deploy in your MinIO? Is it a distributed deployment? We deployed 1 PB of MinIO last week and we're getting 8 Gbps of write performance.
The biggest problem we've encountered is that MinIO lacks support, even through their Slack channel, and the documentation is incomplete when you want to enable NGINX cache features.
-
- Veeam Vanguard
- Posts: 394
- Liked: 169 times
- Joined: Nov 17, 2010 11:42 am
- Full Name: Eric Machabert
- Location: France
- Contact:
Re: S3-integrated Storage - Block Size
We are not using MinIO.
This is a multi-PB, geo-distributed object storage.
We have no issues with throughput either; we reach insane numbers, as each node has 50 Gb/s front and 50 Gb/s back.
The issue is the number of objects within a single bucket. When you reach a billion objects on which you have to put retention metadata, you need to be careful. Issues arise when using Object Lock.
Using a SOBR with multiple buckets to spread the load is one trick that makes things smoother.
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
sfirmes wrote: ↑Sep 02, 2024 1:26 pm One thing to note is that MinIO used NVMe drives in their Veeam Ready testing: https://www.veeam.com/sys1047
Have you contacted MinIO to see if they have any recommendations on what performance metrics you can expect with an HDD setup?
We understand the performance metrics we can expect from our MinIO deployment. Hence the desire for larger blocks, specifically on Backup Copy jobs or SOBR offload jobs. We have over 4 PB of MinIO deployed (in a single cluster across multiple pools) and can ingest at multiple Gbps.
I feel we are caught between a rock and a hard place. NVMe is cost prohibitive (although how amazing would that be!). Veeam uses small objects by default. A happy medium would be for Veeam to allow us to change the block size on a Backup Copy job (or a SOBR offload) and let us set an object size that our object storage (in this case MinIO) is happier with, fully understanding that there are tradeoffs in backup size as a result. We don't always want to change block sizes on the initial backup job, because it may have been in place for years, or the customer's storage can't handle the new full backups required for the size change to take effect. Whatever the case, it's not always easy to put in place. Nor do we want to use a backup job going direct to S3, because it adds additional load to the production infrastructure. Also, despite my own personal desires, we don't only ingest backups from one backup product (it gets incredibly interesting when you dig into how different vendors deal with backups - from one vendor making one object per block, to another vendor making one object per drive that gets backed up).
It would make a huge difference for those of us stuck between a rock and a hard place.
Also, our issue is slightly different from @emachabert's - we don't have an issue with retention metadata, since that's not handled in any database for MinIO.
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
edh wrote: ↑Sep 02, 2024 1:40 pm The biggest problem we've encountered is that MinIO lacks support, even through their Slack channel, and the documentation is incomplete when you want to enable NGINX cache features.
We didn't go the NGINX route; we use HAProxy instead and are very happy with it. That said, I don't believe MinIO recommends caching in either NGINX or HAProxy.
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Service Provider
- Posts: 271
- Liked: 77 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
-
- Veeam Vanguard
- Posts: 394
- Liked: 169 times
- Joined: Nov 17, 2010 11:42 am
- Full Name: Eric Machabert
- Location: France
- Contact:
Re: S3-integrated Storage - Block Size
We observe two things that are demanding for the object storage when you are in a clustered, multi-node, multi-site setup that needs to synchronize a large number of operations:
- delete operations, which are POST operations grouped by Veeam into batches of 1k by default. If you receive millions of deletes, you produce millions of small operations that need to be synchronized across nodes (no matter what the metadata storage is). This is demanding (and we have more than 25 TB of NVMe dedicated to clustered metadata space)
- PutObjectRetention calls, which are also sent in batches after the data transfer, producing the same small-operation storm
By the way, we are using both NGINX and HAProxy.
The NGINX servers are the front line and act as reverse proxies, with various features for filtering, measuring, and spreading the TLS compute over multiple servers.
The HAProxy servers are the second-line load balancers, handling fast load balancing with health checks and transparent retries without notifying the clients, and also handling end-to-end TLS across multiple HAProxy servers.
IMO, caching in front of the storage has no benefit in a backup use case... if you serve frequently accessed data with your object storage, then why not configure per-bucket caching at the NGINX level, but I wouldn't do this for a backup use case.
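For reference, the batched delete pattern looks roughly like this (a boto3 sketch with a placeholder endpoint and made-up keys; Veeam's implementation differs, but the 1,000-keys-per-request limit of DeleteObjects is the same):

```python
# Sketch of batched deletes against an S3-compatible endpoint.
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")  # placeholder endpoint
bucket = "veeam-copy-bucket"  # hypothetical bucket name

# Pretend a retention pass has flagged 2.5 million block objects for removal.
keys_to_delete = (f"Veeam/Backup/blocks/{i:09d}.blk" for i in range(2_500_000))

BATCH = 1000  # hard limit per DeleteObjects request
posts = 0
batch = []
for key in keys_to_delete:
    batch.append({"Key": key})
    if len(batch) == BATCH:
        s3.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
        posts += 1
        batch = []
if batch:
    s3.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
    posts += 1

print(f"{posts} DeleteObjects POSTs")  # 2,500 requests for 2.5M objects

# Each of those 2.5M deletions is still its own per-object metadata mutation
# that the cluster has to replicate; larger blocks shrink both numbers.
```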
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
-
- Service Provider
- Posts: 467
- Liked: 119 times
- Joined: Apr 03, 2019 6:53 am
- Full Name: Karsten Meja
- Contact:
Re: S3-integrated Storage - Block Size
Eric, what is the product we are talking about?
-
- Veeam Legend
- Posts: 392
- Liked: 222 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: S3-integrated Storage - Block Size
Thanks for the link, Manuel, interesting read. That said, I'll echo Eric's statement: it's not going to help for use as a backup repository.
Now I'm super curious about Eric's NGINX & HAProxy deployment. My guess is he's using a Ceph cluster for S3.
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @tylerjurgens.bsky.social
-
- Veeam Vanguard
- Posts: 394
- Liked: 169 times
- Joined: Nov 17, 2010 11:42 am
- Full Name: Eric Machabert
- Location: France
- Contact:
Re: S3-integrated Storage - Block Size
No, it is not Ceph; it is a stretched RING over multiple datacenters, used with multiple protocols (file and object).
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023