Sporadic backup failures (works after 1-2 retries, mostly)

Post by **FBraendle** » Jun 18, 2013 12:24 pm

Hi Guys

We implemented a new backup environment/solution (Veeam & Data Domain) and migrated all of our backups to the new system.

Everything runs fine, but we get sporadic backup failures on random machines. After one or two retries it works most of the time.

Error Message:
Task failed Error: Failed to wait mutex famVimService: timeout 300sec exceeded

It "feels" like too many jobs are running, which shouldn't be the case: with Veeam v5 we had 4-6 jobs running simultaneously (residing on the same storage cluster) and didn't experience errors like this.

The repository is set to accept 4 connections at most, and the backup server and proxy are each configured to run 2 jobs at most.
Also, "align data blocks" is disabled and "decompress before storing" is enabled.

Backup server:
Windows Server 2008 R2, 6 GB RAM, 4 CPUs
with CPU and memory reservation for the SQL instance

Proxy server:
Windows Server 2008 R2, 4 GB RAM, 4 CPUs

Jobs are set up with:
compression: none
inline deduplication: enabled
transport mode: network

Repository:
CIFS share on a Data Domain DD890

PS: I should mention that all jobs start at the same time, with only the proxy/backup server and repository "max connection" limits throttling concurrency (which creates quite some overhead).

So my actual question: what does that error message mean? I couldn't find anything related to it.

Thanks in advance
Felix

Post by **foggy** » Jun 18, 2013 1:16 pm

Felix, have you already contacted support, as is advised for all technical issues?

Post by **Vitaliy S.** » Jun 18, 2013 1:27 pm

FBraendle wrote: The repository is set to accept 4 connections at most, and the backup server and proxy are each configured to run 2 jobs at most.

PS: I should mention that all jobs start at the same time, with only the proxy/backup server and repository "max connection" limits throttling concurrency (which creates quite some overhead).

Task failed Error: Failed to wait mutex famVimService: timeout 300sec exceeded

Since your backup proxy can only accept two tasks at a time, the other 2-4 jobs have to sit idle waiting for free resources. I would suggest staggering the start times of your backup jobs (a 10-15 minute difference should be enough) to see whether this fixes the issue.
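As an illustration only, the behaviour described above can be modelled as tasks that all start together and contend for a fixed number of slots, each giving up if it cannot get one within a wait timeout (a mutex is the one-slot case). This is a hypothetical Python sketch, not Veeam's implementation: the exact role of the `famVimService` mutex isn't documented in this thread, and the slot count, hold time and timeout (scaled down from the 300 s in the error message) are assumptions.

```python
import threading
import time


def simulate(jobs: int, slots: int, hold_s: float, timeout_s: float) -> dict:
    """Start `jobs` tasks at once against `slots` slots; count how many got a slot in time."""
    sem = threading.BoundedSemaphore(slots)
    results = {"ran": 0, "timed_out": 0}
    results_lock = threading.Lock()  # protects the shared results dict

    def task() -> None:
        # Wait for a free slot, but only up to timeout_s (cf. "timeout 300sec exceeded").
        if sem.acquire(timeout=timeout_s):
            try:
                time.sleep(hold_s)  # pretend to process a VM while holding the slot
                outcome = "ran"
            finally:
                sem.release()
        else:
            outcome = "timed_out"
        with results_lock:
            results[outcome] += 1

    threads = [threading.Thread(target=task) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# 4 jobs, 2 slots, slots held longer than waiters are willing to wait.
print(simulate(jobs=4, slots=2, hold_s=0.5, timeout_s=0.1))
# Slots free up before the wait timeout expires, so every job eventually runs.
print(simulate(jobs=4, slots=2, hold_s=0.1, timeout_s=1.0))
```

The point of the model: retries succeed "most of the time" because, on a later attempt, a slot happens to be free before the timeout runs out. Staggering start times shortens the queue instead of relying on that luck.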

Post by **FBraendle** » Jun 18, 2013 1:59 pm

First of all, thanks for the fast reply.

@foggy:
It's kind of annoying to contact support just to find out what an error message means, wouldn't you say?
I'd say if you published a document with all error messages and what they actually mean (which subsystems might be affected, in what circumstances they appear, etc.), you would save quite some time/money/nerves in your support department. (Symantec does a good job there.)

Just saying...

@Vitaliy:
Thanks for the suggestion. That was actually my thinking too.
Strange thing: your support department told me to just run everything at the same time and let Veeam "do the magic".

Well, I'll stagger them again.
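Staggering boils down to giving each job its own start time a fixed gap apart (10-15 minutes was suggested above). A minimal sketch, assuming hypothetical job names and a made-up 22:00 first start; the actual schedule would still be entered per job in the Veeam console.

```python
from datetime import datetime, timedelta


def staggered_starts(first: datetime, job_names: list[str], gap_minutes: int = 15) -> dict[str, datetime]:
    """Assign each job a start time `gap_minutes` after the previous one, in list order."""
    if gap_minutes <= 0:
        raise ValueError("gap_minutes must be positive")
    return {name: first + timedelta(minutes=i * gap_minutes) for i, name in enumerate(job_names)}


# Hypothetical jobs and first start time, for illustration only.
schedule = staggered_starts(datetime(2013, 6, 18, 22, 0), ["Job A", "Job B", "Job C", "Job D"], gap_minutes=15)
for name, start in schedule.items():
    print(f"{name}: {start:%H:%M}")
```

With a 2-task proxy, a gap roughly as long as a typical job keeps the number of jobs waiting at any moment small.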

In the future it would be nice to be able to set priorities on backup jobs/VMs in addition to the "concurrency" options.
You should also improve efficiency while jobs are waiting.
I don't see why every worker process needs to allocate >100 MB of memory "just for waiting".
It might make sense to put the "VM selection logic" into a dedicated internal service instead of letting every worker process check it (just trying to look into "the black box").
Anyway, just a suggestion.

Kind regards
Felix
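Felix's two suggestions (job priorities, plus one central component deciding what runs next instead of each waiting worker polling) could look roughly like the following. This is a hypothetical Python sketch of the idea, not how Veeam schedules jobs; `PriorityDispatcher` and the job names are made up.

```python
import heapq
from itertools import count


class PriorityDispatcher:
    """Hypothetical central dispatcher: one queue of pending jobs, highest priority first."""

    def __init__(self, slots: int) -> None:
        self.free_slots = slots
        self._queue: list[tuple[int, int, str]] = []
        self._seq = count()  # tie-breaker keeps submission order among equal priorities

    def submit(self, job: str, priority: int) -> None:
        # heapq is a min-heap, so store the negated priority to pop the largest first.
        heapq.heappush(self._queue, (-priority, next(self._seq), job))

    def dispatch(self) -> list[str]:
        """Fill as many free slots as possible; return the jobs started, in order."""
        started = []
        while self.free_slots > 0 and self._queue:
            _, _, job = heapq.heappop(self._queue)
            self.free_slots -= 1
            started.append(job)
        return started

    def finish(self) -> None:
        """Call when a running job completes and frees its slot."""
        self.free_slots += 1


d = PriorityDispatcher(slots=2)
d.submit("File server", priority=1)
d.submit("Exchange", priority=5)
d.submit("SQL", priority=5)
d.submit("Test VM", priority=0)
print(d.dispatch())  # ['Exchange', 'SQL']
d.finish()
print(d.dispatch())  # ['File server']
```

Waiting jobs here are just queue entries rather than running processes, which is the memory saving the suggestion is after.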

Post by **foggy** » Jun 18, 2013 2:14 pm

FBraendle wrote: It's kind of annoying to contact support just to find out what an error message means, wouldn't you say?

Not just to find out what an error message means, but mostly to reveal the reason for this behavior.

FBraendle wrote: I'd say if you published a document with all error messages and what they actually mean (which subsystems might be affected, in what circumstances they appear, etc.), you would save quite some time/money/nerves in your support department. (Symantec does a good job there.)

Just saying...

Probably because they appear too often in their products? Just saying...

Anyway, thanks for the feedback.

Post by **FBraendle** » Jun 18, 2013 2:25 pm

I think we have a clash of ideology here.
And smartass answers usually suggest that there is a valid point which cannot be swept away with reason...
So...

Anyway, at least Vitaliy gave a useful answer.

Have a good day, and thanks for the fast reaction anyway.

Post by **AndyGDIT** » Jun 18, 2013 2:46 pm

I had something like this come up as well and ended up going to tier 2 support. It seems that vCenter can only take so many connections at once, and we found that there were still some connections from failed backups recorded in our Veeam database, in the dbo.VmWareSnapshots table.

The path to look at (in SQL Server Management Studio) is Databases -> VeeamBackup -> Tables -> System Tables -> dbo.VMWareSnapshots

Select the first 1000 rows. If there is anything in this table while no backups are running, those are hung connections. Veeam Support or your DBA can help with deleting them (or you, if you know SQL).

Do not delete anything from this table while backups are running.

Post by **foggy** » Jun 18, 2013 2:48 pm

FBraendle wrote: I think we have a clash of ideology here.
And smartass answers usually suggest that there is a valid point which cannot be swept away with reason...

Well, it was nothing but healthy irony on my side.

FBraendle wrote: Anyway, at least Vitaliy gave a useful answer.

I still think that contacting support should be the first step in such cases, as the team behind this forum is not the one responsible for log investigation, etc., and cannot assist you effectively with technical issues (in fact, this is even stated in the forum rules shown when you click New Topic).
