While performing delete operations on some old Disk (Imported) backups, we regularly run into the error below (support case #04509715).
Code:
[23.11.2020 12:07:24.566] < 15100> aws| WARN|HTTP request failed, retry in [1] seconds, attempt number [1], total retry timeout left: [1800] seconds
[23.11.2020 12:07:24.566] < 15100> aws| >> |Amazon REST error: 'S3 error: Please reduce your request rate.
[23.11.2020 12:07:24.566] < 15100> aws| >> |Code: SlowDown', error code: 503
[23.11.2020 12:07:24.566] < 15100> aws| >> |--tr:Request ID: 5A909BF002AA5C88
[23.11.2020 12:07:24.566] < 15100> aws| >> |Other: HostId: 'uYjWmLE8lZnZmOLtmxcItYP7bkSChsuEDAUrPWkEFZTFPj/A2zHPSYJSU3fqZjrUfP7RZBQ/Z60='
[23.11.2020 12:07:26.066] < 39288> aws| WARN|HTTP request failed, retry in [34] seconds, attempt number [6], total retry timeout left: [1750] seconds
[23.11.2020 12:07:26.066] < 39288> aws| >> |Amazon REST error: 'S3 error: Please reduce your request rate.
[23.11.2020 12:07:26.066] < 39288> aws| >> |Code: SlowDown', error code: 503
[23.11.2020 12:07:26.066] < 39288> aws| >> |--tr:Request ID: 7F98B851D7101500
[23.11.2020 12:07:26.066] < 39288> aws| >> |Other: HostId: 'QryEJu8LL2hUhC4ReZGdUyCqjDtJ/WCRtx8avkTW8lafD+BI1YnWJpqAyaJrHiKzQ9dGrakfc9A
The response from AWS support:

From 10:30am to 11:30am I can see more than 3500 DELETE requests per second against the bucket.
S3 allows deleting multiple objects with a single HTTP request using the "DeleteObjects" API. The single HTTP request is what CloudWatch tracks; internally, however, S3 processes a DELETE operation for each object in the original request. This explains why you see only about 350 requests in the metrics, while I see more than 3500 "internal" operations.
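To illustrate the point, this is roughly what such a multi-object delete looks like through the API. It is only a boto3 sketch with made-up bucket and key names; Veeam's actual implementation is not shown here:

Code:
import boto3

s3 = boto3.client("s3")

# Hypothetical keys; a single DeleteObjects request can carry up to 1000 of them.
keys_to_delete = [f"backups/block-{i:06d}.blk" for i in range(1000)]

# CloudWatch records this as one request, but S3 still performs one internal
# DELETE per key, so 1000 keys here mean 1000 internal delete operations.
s3.delete_objects(
    Bucket="example-veeam-offload-bucket",
    Delete={"Objects": [{"Key": k} for k in keys_to_delete], "Quiet": True},
)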
Internal S3 resources for this request rate aren't assigned up front. Instead, as the request rate against a prefix increases gradually, Amazon S3 automatically scales to handle the increased request rate. A sudden burst of deletes outpaces that scaling, which is why you are seeing these 503 errors.
I suggest you contact Veeam support and, if possible, work with them so the backup application gradually increases its request rate and retries failed requests using an exponential backoff algorithm, as explained in this documentation [1].
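For illustration, a minimal sketch of the retry pattern being described, using boto3 with exponential backoff and full jitter. The function and parameter names are hypothetical and this is not how Veeam actually implements its deletes:

Code:
import random
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def delete_batch_with_backoff(bucket, keys, max_attempts=8):
    # Retry a DeleteObjects call with exponential backoff and full jitter
    # whenever S3 answers 503 SlowDown.
    for attempt in range(max_attempts):
        try:
            return s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in keys], "Quiet": True},
            )
        except ClientError as err:
            if err.response.get("Error", {}).get("Code") != "SlowDown":
                raise
            # Double the wait each attempt, cap at 60 seconds, randomize fully.
            time.sleep(min(60, 2 ** attempt) * random.random())
    raise RuntimeError(f"still throttled after {max_attempts} attempts")

A real implementation would also need to check the per-object "Errors" list in the DeleteObjects response, since individual keys can fail even when the request itself succeeds.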
Additionally, you can ask them to distribute objects and requests across multiple prefixes, which is a best practice in rare cases where the supported request rates are exceeded.
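A hash-based key layout is one common way to do that distribution; the sketch below is purely illustrative (the fanout value and key format are made up):

Code:
import hashlib

def prefixed_key(original_key: str, fanout: int = 16) -> str:
    # Map each key deterministically to one of `fanout` two-character
    # prefixes so S3 can partition the load across them.
    shard = int(hashlib.md5(original_key.encode()).hexdigest(), 16) % fanout
    return f"{shard:02x}/{original_key}"

Each partitioned prefix gets its own request-rate allowance, so spreading writes and deletes across prefixes raises the total rate the bucket can sustain.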
[1] Error retries and exponential backoff in AWS - https://docs.aws.amazon.com/general/lat ... tries.html
I'm working on this with support and we have already changed the S3MultiObjectDeleteLimit registry key to 500.
What I wanted to bring up is the comment from AWS support: "gradually increase the request rate, and retry failed requests using an exponential backoff algorithm". Is this something Veeam is aware of, and will it be implemented at some point?