-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Request failed with status code TooManyRequests
Is anyone else having this issue with their backup jobs from Nutanix?
Our platform
Nutanix AOS 6.8.0.5
Veeam 12.2
Veeam AHV app 6.1
Nutanix added via Prism to Veeam infrastructure
11 Workers on 22 Nutanix nodes (4 streams max on each worker)
2 repositories (Linux, non-hardened), both set to 8 concurrent tasks each. Each is based on RAID 6 disks, with 16 and 20 drives in each RAID respectively. I realise this means up to 44 worker tasks competing for 16 repository task slots, but Veeam queues jobs to wait for an available slot, right?
All of our backup jobs report load as source 99%, proxy around 60%, network 0% and target about 4%, so potentially I could turn up the concurrent tasks on each repo.
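For anyone checking the arithmetic on those limits, the worker-side and repository-side budgets work out like this (just a sketch of the numbers above; Veeam's actual scheduler is more involved):

```python
# Quick arithmetic on the concurrency budget described above.
# Worker side: how many disk streams can be in flight at once.
workers = 11
streams_per_worker = 4
worker_streams = workers * streams_per_worker  # 44

# Repository side: how many tasks the repos will accept at once.
repos = 2
tasks_per_repo = 8
repo_slots = repos * tasks_per_repo  # 16

# Effective concurrency is capped by the smaller side; the
# remaining streams queue and wait for a free repository slot.
effective = min(worker_streams, repo_slots)
queued = worker_streams - effective

print(f"worker streams: {worker_streams}, repo slots: {repo_slots}")
print(f"effective concurrency: {effective}, queued: {queued}")
```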
We are currently backing up about 300 VMs a day; all of the jobs kick off at roughly the same time and we leave the Veeam scheduler to churn through them. At some point after the backups have started (maybe about an hour in, I'm not 100% sure yet) we see some VMs fail in a backup job where others have completed fine. A VM usually starts to back up, and we see GBs of transferred data, but then it errors out with:
"Failed to perform backup: Request failed with status code TooManyRequests "
Other times it fails during the step:
"Building a list of objects to process "
Request failed with status code TooManyRequests
In all instances the job goes on to retry the failed VMs and eventually gets a good backup of them, but these failures are slowing down the backup process and possibly causing Veeam more work. We hope to scale this platform up to over 1,400 VMs in the coming months, so we're trying to fix these errors early.
Veeam support have just closed a previous ticket of mine, saying that AOS 6.8 isn't supported, so I'm turning to this forum for help. I want to track down what is actually returning "TooManyRequests": is it the Nutanix API or something in Veeam?
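To narrow down which side is throttling, one rough test is to take Veeam out of the picture and hammer the Prism REST API directly with a similar level of parallelism, and see whether HTTP 429 comes back. A sketch below; the cluster address, credentials, and even the exact endpoint (Prism Element's v2 `vms` list) are placeholders and assumptions to adapt, not our real values:

```python
import base64
import collections
import ssl
import urllib.request
from urllib.error import HTTPError
from concurrent.futures import ThreadPoolExecutor

# Placeholder values -- substitute your own cluster address and credentials.
PRISM = "https://prism.example.local:9440"
USER, PASSWORD = "api-user", "secret"
# Assumed endpoint: Prism Element v2 VM list; adjust for your AOS version.
URL = f"{PRISM}/api/nutanix/v2.0/vms"

def fetch_status(_):
    """Fire one API call and return only the HTTP status code."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(URL, headers={"Authorization": f"Basic {token}"})
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # Prism often uses a self-signed cert
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # a 429 arrives as an HTTPError

def summarize(codes):
    """Count throttled (HTTP 429) vs successful calls."""
    counts = collections.Counter(codes)
    other = sum(v for k, v in counts.items() if k not in (200, 429))
    return {"ok": counts[200], "throttled": counts[429], "other": other}

if __name__ == "__main__":
    # ~40 parallel calls, mimicking the worker stream count above.
    with ThreadPoolExecutor(max_workers=40) as pool:
        codes = list(pool.map(fetch_status, range(40)))
    print(summarize(codes))
```

If 429s show up here with no Veeam in the loop, the limit is on the Nutanix side.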
Regards
Rob
-
- Veeam Software
- Posts: 583
- Liked: 216 times
- Joined: Mar 07, 2016 3:55 pm
- Full Name: Ronn Martin
- Contact:
Re: Request failed with status code TooManyRequests
@Amarokada yes, this is an error we're receiving from the Nutanix API(s) we call. If possible, could you open a case with Nutanix to try to determine a root cause? It may be something we need to tune, but since it's a pretty generic error, some specificity from the Nutanix side would be very helpful.
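For what it's worth, the standard client-side response to a 429 is retry with exponential backoff plus jitter. This is a generic sketch, not Veeam's actual internals; `call_api` is a stand-in for whatever request is being throttled:

```python
import random
import time

def call_with_backoff(call_api, max_retries=5, base_delay=1.0):
    """Retry a throttled API call with exponential backoff and jitter.

    `call_api` is any function returning (status_code, body); 429 means
    the server is asking us to slow down, so we wait and try again.
    """
    for attempt in range(max_retries + 1):
        status, body = call_api()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # base, 2*base, 4*base, ... plus jitter so parallel workers
        # don't all retry at the same instant and re-trigger the limit.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
    return status, body

# Example with a fake API that throttles the first two calls.
responses = iter([(429, ""), (429, ""), (200, "vm-list")])
status, body = call_with_backoff(lambda: next(responses), base_delay=0.01)
print(status, body)  # 200 vm-list
```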
-
- Service Provider
- Posts: 466
- Liked: 89 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Request failed with status code TooManyRequests
Can I suggest Veeam provide support for this customer during troubleshooting with Nutanix, by exception if needed, given Veeam doesn't support 6.8 right now (with some good reasons, perhaps)?
Nutanix customers can't be on anything else right now, and we need both parties involved to get these types of environments behaving properly.
Refusing support by closing cases because of 6.8 isn't helping here. There must be a better way that's more efficient than asking the customer to ask Nutanix what needs to be tuned by Veeam. At the very least, this process should be trackable in a Veeam support case.
I'm about to scale to the same numbers on the same versions, so I'm worried.
Unless 6.10 makes everything smooth again…
And again, I understand why 6.8 is difficult to formally get supported currently, but looking at the bigger picture, I expect a bit more from Veeam (and Nutanix).
Veeam Certified Engineer
-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Re: Request failed with status code TooManyRequests
Our connection to Nutanix is via a third party, but we can try to get them involved. Apart from this "TooManyRequests" issue, we are seeing another problem as well.
All of our backup jobs from the Nutanix cluster are created from a template, so we know they have the same settings (apart from VM selection and a few different schedules). In total we have 18 backup jobs, but we're noticing something strange with 3 of them.
Normally (i.e. in the other 15 jobs), when the VMs get backed up we see the backup point shown in the "Disk" section of Veeam and a snapshot shown in the "Snapshot" section. But in these 3 jobs we never see anything in the Snapshot section. On Nutanix we see the snapshot being created, but it gets deleted after the backup completes. This also seems to tie into an issue we see with the daily repository scan that the AHV appliance does; it complains with entries like this:
10/1/2024 4:38:43 PM Error Failed to import backup 80aa5a68-b030-41b2-99f9-c7e9b2f0395c. Error: Object reference not set to an instance of an object. —
We tracked down the VMs these few UUIDs relate to and, lo and behold, they are in the 3 jobs we never see snapshots for in the Veeam interface. And it's not just the VMs flagged by the repo scan that have no snapshots; it's all the VMs in the same jobs. I doubt this is just coincidence, but of course it could be. As a test I deleted one of the jobs entirely, along with all references to its backups, and re-created it from scratch. It created new UUIDs for the same VMs, the repo scan showed the above error for them, and there was still no snapshot in the Veeam interface (or in Nutanix once the backup completed). This leads me to think something about the VMs themselves is triggering this bug, but I wouldn't have a clue where to start tracking it down.
I would be happy to run through our situation with someone from Veeam on a call if that helps.
Rob
-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Re: Request failed with status code TooManyRequests
Just an update on this one. I still haven't heard back from Veeam on my ticket yet, but on a hunch I disabled one of my workers, going from 11 to 10. This brought the total tasks down from 44 to 40, and last night it seemed I didn't get a single TooManyRequests error, which suggests Prism can only handle 40 concurrent requests. I'm not 100% sure of this yet, because we had quite a backlog of backup copy jobs at the same time, which might have slowed down the backups (as they waited for slots to the repo), so I'll know more later this week.
-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Re: Request failed with status code TooManyRequests
OK, that was false hope; we're still getting the TooManyRequests issue with just the 10 workers.
-
- Service Provider
- Posts: 466
- Liked: 89 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: Request failed with status code TooManyRequests
Are you still on 6.8.0.5 currently with these issues?
Any plans for 6.10 in near future?
Just curious how you are considering these releases.
Veeam Certified Engineer
-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Re: Request failed with status code TooManyRequests
Good morning.
So yesterday I reduced our worker stream count drastically: we went from 10 workers with 4 streams each to 10 workers with 2 streams each, a drop from 40 to 20 total. This is across 22 AHV hosts in the cluster.
Last night all of the backups finished without a single error, for the first time since building this platform.
We have a graph of the data transfer to the repo during the backup window, and over those 8 hours it peaks at about 512MB/sec (4Gbps), though we do see a drop-off for 3 whole hours where the speed falls to around 200MB/sec.
Obviously we're happy the errors have stopped, but I'm a little surprised nobody suggested we were running too many worker threads against a single cluster. It would seem that somewhere between 20 and 40 streams it's possible to hit a threshold at which the cluster API stops responding to new requests and even drops older ones. Has Veeam, in their testing of AHV, ever come up with a suggested maximum number of threads?
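If the threshold really sits somewhere between 20 and 40 total streams, one way to narrow it down without guessing night by night is a bisection: run a night at the midpoint, record pass/fail, and halve the interval. A sketch, where `run_night_at` is a hypothetical stand-in for "set the stream total to n and observe whether the backup window completes clean":

```python
def find_threshold(run_night_at, low=0, high=40):
    """Bisect for the highest stream total that still runs clean.

    `run_night_at(n)` returns True if a full backup run at n total
    streams finishes without TooManyRequests errors. Assumes runs
    are clean at or below some threshold and fail above it.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop terminates
        if run_night_at(mid):
            low = mid       # clean run: the threshold is at least mid
        else:
            high = mid - 1  # failed run: the threshold is below mid
    return low

# Example with a pretend cluster that copes with up to 28 streams.
print(find_threshold(lambda n: n <= 28))  # 28
```

At one run per night this converges in about six nights for a 0-40 range, versus walking down a couple of streams at a time.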
-
- Service Provider
- Posts: 135
- Liked: 12 times
- Joined: Jan 30, 2015 4:24 pm
- Full Name: Rob Perry
- Contact:
Re: Request failed with status code TooManyRequests
Never mind, I had a bunch of "TooManyRequests" failures again, even with only 20 streams.