- edh
- Veeam Legend
- Posts: 420
- Liked: 131 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
s3.PutObjectRetention millions request per hour.
Hi,
As we moved to S3 storage, we ran into another issue: "s3.PutObjectRetention" requests in the millions per hour.
It looks like a metadata lock refresh for immutability, but these small requests (around 254 B each) saturate the whole configuration.
Is there any way to delay or throttle them? They can mean backups can't be uploaded during these "nightmare hours"...
Our S3 is a distributed MinIO that we tested at 10-13 Gbps with PUTs of normal objects...
Regards
Manuel
			
			
									
						
Service Provider | VMCE
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Hi, I recommend you check with MinIO on this.
Only the actual storage can know whether it's overloaded, and it's trivial for the storage to delay/throttle processing of the "offending" requests to address a temporary overload. Alternatively, the object storage should respond with the 503 Slow Down HTTP error code, which makes Veeam engage its exponential backoff algorithm for retries. All of this is industry-standard S3 behavior, and most object storage systems use one of these approaches or a combination of both.
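Purely for illustration, the client side of "503 Slow Down plus exponential backoff" amounts to something like this minimal Python sketch (function and parameter names are made up; it shows the general pattern, not Veeam's actual implementation):
Code: Select all
import random
import time

def put_with_backoff(send_request, max_attempts=8, base_delay=1.0, max_delay=60.0):
    """Retry a request with exponential backoff while the storage answers 503 Slow Down."""
    status = None
    for attempt in range(max_attempts):
        status = send_request()
        if status != 503:
            return status
        # Exponential backoff with jitter: ~1s, 2s, 4s, ... capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        time.sleep(delay + random.uniform(0, delay / 2))
    return status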
Just to be clear, what do you mean by "normal objects"?
			
			
									
						
										
- edh
- Veeam Legend
- Posts: 420
- Liked: 131 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Thanks Gostev.
For us, "normal size" means 1 MB to 4 MB objects, but these PUT requests are 254 bytes each. Hmm, if Veeam handles the retries... maybe I can implement something around Nginx to force a throttle...
			
			
									
						
							For us "normal size" 1MB to 4MB objects
 , but this Put request are for 254bytes. Mmm if Veeam handle the retrys... maybe i can implement something arround Nginx for force a throthle...
, but this Put request are for 254bytes. Mmm if Veeam handle the retrys... maybe i can implement something arround Nginx for force a throthle...Service Provider | VMCE
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Got it. This PUT request does not actually put any data; it instructs the object storage to extend immutability on existing objects, so it only needs to pass metadata.
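In S3 API terms, the call carries just the object key plus the new retention settings. With boto3, for example, it would look roughly like this (the endpoint, bucket, key and retention values are made up for illustration):
Code: Select all
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical endpoint; adjust to your MinIO deployment and credentials setup.
s3 = boto3.client("s3", endpoint_url="https://minio.example.local")

# No object data is re-uploaded: only the retention mode and the new
# retain-until date travel in the request body (a few hundred bytes).
s3.put_object_retention(
    Bucket="veeam-repo",                    # example bucket name
    Key="backups/example.blk",              # example object key
    VersionId="example-version-id",         # on versioned buckets, which version to lock
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=14),
    },
)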
I've just asked the devs for the full list of HTTP error codes that trigger the exponential backoff algorithm; maybe it will help with your workaround idea.
			
			
									
						
										
- Mildur
- Product Manager
- Posts: 11023
- Liked: 3026 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Hello
I moved this topic to our object storage subforum.
Best,
Fabian
			
			
									
						
Product Management Analyst @ Veeam Software
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
The following HTTP error codes trigger the exponential backoff algorithm in Veeam:
			
			
									
						
										
						Code: Select all
Error 400 with one of the following statuses:
Request Timeout, ExpiredToken, TokenRefreshRequired
Error 402 /* Service Quotas Exceeded */
Error 403 with one of the following statuses:
ConnectionLimitExceeded, Throttle, RequestTimeTooSkewed
Error 408 /* Request Timeout */
Error 409 with one of the following statuses:
OperationAborted
Error 429 with one of the following statuses:
SlowDown, CloudKmsQuotaExceeded
Error 500 /* Internal Server Error */
Error 502 /* Bad Gateway */
Error 503 /* Service Unavailable, Slow Down */
Error 504 /* Gateway Time-out */
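Restated as a quick Python sketch (the names are made up and this just transcribes the table above into a lookup; it is not actual product code):
Code: Select all
from typing import Optional

# HTTP codes that trigger backoff regardless of the S3 status text.
BACKOFF_ANY_STATUS = {402, 408, 500, 502, 503, 504}

# HTTP codes that trigger backoff only for specific S3 statuses.
BACKOFF_STATUSES = {
    400: {"Request Timeout", "ExpiredToken", "TokenRefreshRequired"},
    403: {"ConnectionLimitExceeded", "Throttle", "RequestTimeTooSkewed"},
    409: {"OperationAborted"},
    429: {"SlowDown", "CloudKmsQuotaExceeded"},
}

def triggers_backoff(http_code: int, s3_status: Optional[str] = None) -> bool:
    """True if a response with this code/status should be retried with exponential backoff."""
    if http_code in BACKOFF_ANY_STATUS:
        return True
    return s3_status in BACKOFF_STATUSES.get(http_code, set())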
- edh
- Veeam Legend
- Posts: 420
- Liked: 131 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Hi,
Just as an update: after several hours, we're rolling back to immutable block storage. There's no elegant way to accomplish this without moving to QLC SSD drives and spending around €200K on new storage.
Below are the stats of the distributed cluster (+1 PB), fully saturated by small requests. The same cluster without immutable buckets can handle 10 Gbps, tested with Veeam VBO 365:
			
			
									
						
Code: Select all
Duration: 11m20s ▱▱▱
RX Rate:↑ 5.2 GiB/m
TX Rate:↓ 107 MiB/m
RPM    :  5026.2
-------------
Call                      Count          RPM     Avg Time  Min Time  Max Time  Avg TTFB  Max TTFB  Avg Size     Rate /min    Errors
s3.PutObject              25037 (44.0%)  2209.5  33.152s   2.866s    4m36s     33.152s   4m36s     ↑2.4M ↓1B    ↑5.2G ↓3.2K  0
s3.PutObjectRetention     23733 (41.7%)  2094.4  15.131s   3.026s    38.324s   15.131s   38.324s   ↑254B        ↑520K        0
s3.HeadObject             3012 (5.3%)    265.8   111.1ms   952µs     24.522s   111ms     24.522s   ↑121B        ↑31K         0
s3.GetObject              2011 (3.5%)    177.5   725.5ms   543µs     19.232s   691.2ms   19.232s   ↑121B ↓606K  ↑21K ↓105M   0
s3.DeleteObject           1397 (2.5%)    123.3   1.195s    962µs     30.508s   1.195s    30.508s   ↑121B        ↑15K         0
s3.ListObjectVersions     1022 (1.8%)    90.2    4.18s     1.9ms     35.586s   4.18s     35.586s   ↑121B ↓8.4K  ↑11K ↓753K   0
s3.ListObjectsV2          411 (0.7%)     36.3    9.76s     1.9ms     5m5s      9.76s     5m5s      ↑121B ↓38K   ↑4.3K ↓1.3M  0
s3.GetBucketLocation      197 (0.3%)     17.4    799µs     589µs     3.8ms     780µs     3.8ms     ↑121B ↓128B  ↑2.1K ↓2.2K  0
s3.DeleteMultipleObjects  87 (0.2%)      7.7     3.342s    4.1ms     30.318s   3.342s    30.318s   ↑1.6K ↓116B  ↑13K ↓890B   0
s3.HeadBucket             22 (0.0%)      1.9     696µs     607µs     1.2ms     656µs     1.1ms     ↑121B        ↑234B        0
s3.ListObjectsV1          18 (0.0%)      1.6     3.051s    1.202s    10.067s   3.051s    10.067s   ↑121B ↓315B  ↑192B ↓500B  0
s3.ListBuckets            8 (0.0%)       0.7     1.7ms     1.3ms     3ms       1.6ms     3ms       ↑159B ↓3.7K  ↑112B ↓2.6K  0
We tried a workaround with Nginx map and limit_req zone configs returning a 429 error to force throttling on the Veeam agents / VBR servers, but it didn't work.
From our perspective, the main problem is that when a task finishes, it then starts applying the object retention logic to update the metadata of every object in the backup.
These small requests (under 1 KB) saturate I/O at the physical layer and don't let other customers process their s3.PutObject calls to the repository, creating a global bottleneck on the repository.
Just 1 TB at 4 MB blocks is about 250,000 requests, but since we can't set or force the customer block size, it is normally 1 MB, which means about 1 million requests per TB stored in immutable S3. At the proposed scale, 1 PB would be around 1,000,000,000 requests, overloading any available I/O; by that simple math, 1 PB would take about 55 hours to have its metadata fully updated.
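As a quick back-of-the-envelope check of those object counts (assuming one s3.PutObjectRetention call per stored block and binary TB/PB):
Code: Select all
MB = 1024 ** 2  # bytes
TB = 1024 ** 4
PB = 1024 ** 5

print(TB // (4 * MB))  # 1 TB at 4 MB blocks -> 262,144 (~250k retention requests)
print(TB // (1 * MB))  # 1 TB at 1 MB blocks -> 1,048,576 (~1M retention requests)
print(PB // (1 * MB))  # 1 PB at 1 MB blocks -> 1,073,741,824 (~1 billion retention requests)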
Of course this doesn't happen with NVMe options, but NVMe is still not affordable in €/GB, at least for our market.
I checked how VBO handles this, and it looks like it is implemented at the repository level, applying retention at a scheduled time.
We tried this setup to give our customers a more scalable solution and allow them "instant" restore options, but it doesn't work, at least at our small scale.
How could it be improved? I don't know, but since Veeam already creates system and capacity XML files via SOSAPI, maybe instead of using s3.PutObjectRetention it could create a hidden XML file that updates the full object locking in a single PUT.
Or maybe allow SOBR admins to schedule when the PutObjectRetention calls should run; or, as Veeam has a registry key to limit the number of objects deleted per request, add a key to limit the number of s3.PutObjectRetention requests per second on the proxy.
I'm pretty sure I'm not the only sysadmin running into these headaches with S3 storage... now I know why Dell dropped ECS....
Regards,
Manuel
Service Provider | VMCE
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
This is not about S3 storage in general, just the one you're testing (MinIO). Every object storage has a different architecture, as vendors prioritize different use cases.
Some vendors put a big focus on performance and scale with large numbers of objects, and we even use a couple of such vendors in our performance testing labs.
Ironically, Dell ECS was actually one of the best in the early days. I didn't know it was discontinued.
			
			
									
						
										
- edh
- Veeam Legend
- Posts: 420
- Liked: 131 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
Re: s3.PutObjectRetention millions request per hour.
Yeah, we tested Dell ECS, but they use PostgreSQL to store metadata; not bad, but scaling up to billions of entries doesn't work. The main problem with MinIO is that they don't use OS buffers, just direct I/O, and they don't support LVM cache or other cache layers.... And their answer is always: fork the project and code whatever you need.
			
			
									
						
Service Provider | VMCE
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: s3.PutObjectRetention millions request per hour.
LOL!! This made my day.
			
			
									
						
										
- karsten123
- Service Provider
- Posts: 654
- Liked: 165 times
- Joined: Apr 03, 2019 6:53 am
- Full Name: Karsten Meja
- Contact:
Re: s3.PutObjectRetention millions request per hour.
What about using RAID 0 per disk to make use of the controller cache?
			
			
									
						
										
- edh
- Veeam Legend
- Posts: 420
- Liked: 131 times
- Joined: Nov 02, 2020 2:48 pm
- Full Name: Manuel Rios
- Location: Madrid, Spain
- Contact:
Re: s3.PutObjectRetention millions request per hour.
They have a warning in their documentation about not using intermediate caches or RAID controllers, typical warnings like Ceph has... that's why we didn't try RAID 0. Our servers have 8 GB of controller cache, which maybe isn't much, but it would absorb those 254-byte requests pretty easily... not today, but maybe some other day we'll test it on some Dell demo servers. For now we've just finished our new deployment to solve this issue, and we're getting 8 GBps of incoming traffic with 0.3 I/O wait... Veeam transport + Linux + XFS + RAID 60 works like a charm.
			
			
									
						
							Service Provider | VMCE
			
- Gostev
- Chief Product Officer
- Posts: 32784
- Liked: 7990 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact: