V6 - ReplicaJob hung, had to reboot. Integrity of Replica ?

stevenrodenburg1 · Feb 01, 2012 9:27 pm

Hi,

I have a question about backup-file / replica integrity after a sudden abort of a job due to a Veeam-host reboot.

Environment:
1x Veeam Master. Is not a proxy. Just controlling.
2x Veeam Proxy with HotAdd as Transport Mode

15 Backup Jobs and 7 Replica Jobs (running once an hour).

Last night, something caused a replica job to slow to a crawl: read speed dropped to 1MB/s. I never found the cause.
As this job uses both proxies, one as source proxy and the other as target proxy, no proxies were available for other replica or backup jobs, so all other jobs were queued.

Example:
Using source proxy veeam-proxy02.domain.local, hotadd
Using target proxy veeam-proxy01.domain.local, nbd
Hard Disk 1 (40.0 GB) 241.0 MB Read at 1MB/s, CBT

So this job froze (or was so slow that it might as well have been frozen) and locked all proxies.
After many, many hours, Mr. Turtle over here was still playing dead, so in the meantime all other jobs waiting for a proxy to become available started failing with the famous message:

"Could not allocate resources within allowed timeframe (43200 sec)
Failed to start VM backup in the allowed time due to insufficient avaliable resources. Timeout: [43200 sec]"

That is the default 12-hour "wait time" (43200 sec) that passed. I have not modified the registry to increase it because that won't solve the root problem of jobs occasionally slowing to a near halt.
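To make the arithmetic concrete, here is a toy model of what the queued jobs ran into. It is only a sketch and not Veeam's scheduler: the function names and the example data volumes are made up for illustration, and the only values taken from above are the 43200 sec limit and the 1MB/s read speed.

```python
# Toy model only: not Veeam's scheduler. All names here are hypothetical.
WAIT_LIMIT_SEC = 43200  # timeout quoted in the error message (12 hours)


def hold_time_sec(remaining_mb, read_mb_per_sec):
    """Seconds a crawling job keeps its proxies: remaining data / read speed."""
    return remaining_mb / read_mb_per_sec


def queued_job_outcome(proxy_busy_sec, wait_limit_sec=WAIT_LIMIT_SEC):
    """What happens to a job that waits for a proxy held by another job."""
    if proxy_busy_sec < wait_limit_sec:
        return "started after {:.1f} h".format(proxy_busy_sec / 3600)
    return "failed after {:.1f} h".format(wait_limit_sec / 3600)


# Hypothetical volumes still to be read by the slow job, at 1 MB/s.
for remaining_gb in (10, 40, 50):
    busy = hold_time_sec(remaining_gb * 1024, 1)
    print(remaining_gb, "GB left ->", queued_job_outcome(busy))
```

The point of the sketch is that at 1MB/s even a few tens of GB keep the proxies busy for about as long as the wait limit, so raising the limit only postpones the failures.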

What I noticed during those "chilly frozen replica-job moments" is that the process belonging to SQL 2005 Express, running on the Veeam Master, was causing a very high load on one of the two vCPUs the Master has.
Memory, 2GB, was nearly depleted (92% physical memory usage), so Windows was swapping like hell.

I believe that the 2GB on the Master might be too little for running so many simultaneous jobs, so I increased it. When Windows starts swapping like crazy, disk performance goes down dramatically, which slows down database operations and drags Veeam into a vicious circle.

So let's see what the future brings with the added memory on the Master. But that's not what I wanted to ask. My question is of a different nature.

An hour ago, I had to reboot the Master because the Veeam console would not react to stopping jobs. I waited over 30 minutes for it to show a response (it was still working, but the jobs I stopped just kept on running, not even showing "stopping" as they normally do).

Rebooting the Master of course broke off the two running replication jobs (both crawling at 1MB/s, not wanting to stop).
The two proxy VMs were not rebooted, but as they lost their Master, I assume they aborted what they were doing as well.

My question is this:
After such a brutal abort of a backup or replication job, when transfers are broken off so suddenly, what is the integrity status of the backups and replicas?

I can imagine that the files they were writing are now corrupt. Does Veeam detect such "unfinished-ness" and the resulting corruption, and how does it handle/recover from it?

Vitaliy S. · Feb 01, 2012 10:22 pm

Hello Steven, all existing restore points will remain intact, and the new (unfinished) restore point will be automatically removed by the next job run. Thank you!
