dbr
Expert
Posts: 118
Liked: 14 times
Joined: Apr 06, 2017 9:48 am
Full Name: Daniel Brase
Contact:

Copy Jobs with StartAgent and TestCompatible errors

Post by dbr »

Dear all,

I am having problems with three copy jobs and have already opened a support case (03440861). Whenever their runtimes overlap, we receive the following errors:

Error: Failed to call RPC function 'StartAgent': Timed  out requesting agent port for client sessions.
Error: Failed to call RPC function 'TestCompatible': Error code: 0x80070583. Cannot initialize COM runtime.

In addition, we sometimes get finalizing errors in primary backup jobs that use the repository as their target:

Finalizing
Error: Error code: 0x80070008

I already increased the port range for the Veeam server and the repository servers as described in https://www.veeam.com/kb1922 (currently configured: 2500-20000), but that did not help. In the logs I can see that the VeeamAgent process cannot be started on the source repository:

[08.03.2019 05:02:59] <703> Info         [AgentMngr] Starting agent with normal priority, Host '<SourceRepositoryServer>'. <cut>
[08.03.2019 05:07:59] <703> Error    Failed to call RPC function 'StartAgent': Timed  out requesting agent port for client sessions. <cut>
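
The two log lines above are exactly five minutes apart, which suggests the StartAgent call waited for a fixed timeout before failing. A minimal sketch for measuring that gap is shown here. It assumes the timestamps use a day.month.year format; the helper name `parse_log_time` is hypothetical and not part of any Veeam tooling.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines: day.month.year hour:minute:second
LOG_TIME_FORMAT = "%d.%m.%Y %H:%M:%S"


def parse_log_time(line: str) -> datetime:
    """Extract the timestamp between the first '[' and the first ']' of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# The two lines quoted in the post (with the <cut> markers left out)
start_line = (
    "[08.03.2019 05:02:59] <703> Info         [AgentMngr] Starting agent "
    "with normal priority, Host '<SourceRepositoryServer>'."
)
error_line = (
    "[08.03.2019 05:07:59] <703> Error    Failed to call RPC function "
    "'StartAgent': Timed  out requesting agent port for client sessions."
)

elapsed = parse_log_time(error_line) - parse_log_time(start_line)
print(elapsed.total_seconds())  # 300.0
```
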

I tested with a task limit on the repository (50 per extent, two extents per SOBR). It seems to run better that way, but I want to run the copy jobs with the maximum available resources. I noticed that when a copy job runs without a repository task limit, many VeeamAgent.exe processes are started on the source repository. The number increases to about 755 processes. Once that number is reached, the count stops growing and the copy jobs throw the first error messages. Is there a limit on how many processes can be started on a repository server? Any other ideas on this?
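
To track how the VeeamAgent.exe count develops while the copy jobs run, the Windows `tasklist` command can be polled and its CSV output counted. The following is only a rough sketch: the function names are hypothetical, the sample rows are invented for illustration, and `query_tasklist` only works on a Windows host.

```python
import csv
import io
import subprocess

IMAGE_NAME = "VeeamAgent.exe"


def count_processes(tasklist_csv: str, image_name: str = IMAGE_NAME) -> int:
    """Count the rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


def query_tasklist(image_name: str = IMAGE_NAME) -> str:
    """Run tasklist on the local Windows host and return its CSV output."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Invented sample output, used here so the counting logic runs anywhere
sample = (
    '"VeeamAgent.exe","4312","Services","0","25,104 K"\n'
    '"VeeamAgent.exe","5120","Services","0","24,980 K"\n'
)
print(count_processes(sample))  # 2
```

On the repository server itself, `count_processes(query_tasklist())` would return the live count, which could be logged once a minute to see where it levels off.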

Setup:

Veeam B&R 9.5U4
Around 500 VMs to back up, distributed across 7 backup jobs
Our ScaleOut repositories consist of 2 extents each, but the two extents are on the same server because the machines have two RAID sets and therefore two partitions.
ScaleOut with performance settings and per-VM files enabled.
BackupJob1 -> Target: ScaleOut1 (Server1: Repository1 and Repository2)
BackupJobn -> Target: ScaleOut1 (Server1: Repository1 and Repository2)
CopyJob1 (including all backup jobs) -> Source: ScaleOut1 (Server1: Repository1 and Repository2)-> Target: ScaleOut2 (Server2: Repository3 and Repository4)
CopyJob2 (including all backup jobs) -> Source: ScaleOut1 (Server1: Repository1 and Repository2)-> Target: ScaleOut3 (Server3: Repository5 and Repository6)
CopyJob3 (including all backup jobs) -> Source: ScaleOut1 (Server1: Repository1 and Repository2)-> Target: ScaleOut3 (Server3: Repository5 and Repository6)

Thanks,
Daniel.
ejenner
Veteran
Posts: 636
Liked: 100 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London
Contact:

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by ejenner »

Are you talking about limiting concurrent tasks for the repository?

I don't have SOBR as I had trouble with it. But I know that after configuring a repository the default concurrent task number is 4. I bumped mine up to 6 and have never seen any issue with that. But setting it at 50... if we're talking about the same setting, that would be like me setting my concurrent task limit to 25... way above default.

Maybe we're thinking of different settings?

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by dbr »

Yes, I mean limiting concurrent tasks on the repository. I disabled the limit after creating it. It seems that even with 50 tasks I have no issues, but I want to process my backups without a limit. I believe I had no issues without limits until one of the recent updates, but I'm not sure. We have been using SOBR for many months already.
csydas
Expert
Posts: 193
Liked: 47 times
Joined: Jan 16, 2018 5:14 pm
Full Name: Harvey Carel
Contact:

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by csydas »

Hi Daniel,

Tasks are tied to CPU (cores), so it's possible you're just swamping the poor thing. Around 755 concurrent tasks on a repo is pretty high unless you have an extremely beefy CPU set to handle it.

What are the specs on the server? I'm willing to bet you are just killing the resource manager's ability to assign a resource.

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by dbr »

Hi Harvey,

The server has 2 sockets with 12 cores each and 128 GB of memory. Meanwhile I received an additional reply from support: "Well, we do not have actual limitation for Windows repository (as 60 for DataDomain for example), but we do have a recommendation to have 1 core and 256 MB RAM for one task (well, it is not as strongly recommended for repositories as for backup proxies but still)." They told me to limit the concurrent tasks on the repository, but the maximum value is 99. That's not enough for me because I want to process all VMs at the highest speed that is achievable without errors. However, I will set the maximum value on the affected repos, and hopefully Veeam will increase the maximum number of concurrent tasks that can be set or will implement a smarter way to handle large copy jobs.
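
Applied to this server, the recommendation quoted by support (1 core and 256 MB RAM per task) works out as in the sketch below. The function name `recommended_task_limit` is hypothetical, and the calculation ignores hyper-threading, operating system overhead, and the fact that both extents share the same machine.

```python
def recommended_task_limit(cores: int, ram_mb: int, ram_per_task_mb: int = 256) -> int:
    """Return the task count allowed by the scarcer of the two resources.

    Guideline quoted from support: 1 core and 256 MB RAM per task.
    """
    return min(cores, ram_mb // ram_per_task_mb)


cores = 2 * 12          # 2 sockets with 12 cores each
ram_mb = 128 * 1024     # 128 GB of memory

limit = recommended_task_limit(cores, ram_mb)
print(limit)                  # 24 (CPU-bound; RAM alone would allow 512)
print(round(755 / limit, 1))  # 31.5 observed agent processes per recommended task
```

By this guideline the CPU cores, not the memory, are the limiting resource, and the roughly 755 observed VeeamAgent.exe processes are about 31 times the recommended task count for the whole server.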

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by ejenner »

I don't think Veeam is causing the problem. They offer the opposite of a limitation in the sense that you can put in whatever limit you want. The limitation seems to be your hardware. If you want to back up faster you will have to upgrade your kit.

If the complaint is that it worked before Update 4 and throws errors after Update 4, it may well be that you were at the very limit of what your system was capable of. Since Update 4, some new functionality puts a tiny bit more load on the jobs, and this is enough to cause errors for you.

I would say you were just lucky it was working before. People don't usually try to run 750 concurrent tasks.

Re: Copy Jobs with StartAgent and TestCompatible errors

Post by dbr »

I agree, maybe we actually ran close to capacity with earlier versions. Thanks for your statement.