Large restore job failing after running for 2 days

Post by **withanh** » Mar 23, 2012 8:09 pm

I'm wondering if anyone else has problems with large, long-running restore jobs failing. I have a 1.3TiB restore that has failed twice at about 65%. I say "about" because I don't know exactly where it stopped, but I saw it at 63% early this morning and just a few hours later it had failed.

It's interesting because this is the second attempt at this restore (I'm doing a VM files restore so I can mount the .vmdk). The first attempt failed at 46:53:25 and the second attempt failed at 46:53:38. I find it pretty interesting that it failed at almost the exact same moment both times.

Attempt 1:

Attempt 2:

Support (case # 5180882) is telling me I need to unblock port 443 on my firewall and try again. I don't use the Windows firewall inside my domain, plus the job runs for almost 47 hours before it fails. If the firewall were blocking the port, the restore would fail right away rather than start and run for 2 days before failing. At least in theory, right?

The restore speeds are pretty slow as well, but I think that's because it's going to relatively slow archive storage. It's a 2-disk software RAID 0 and I'm not too worried about the speed, other than that I don't really want to restart the job knowing it will take 3 days to extract the VM, plus I'm expecting it to fail at the same point anyway.

h

Post by **tsightler** » Mar 23, 2012 8:50 pm

I would agree that this is a pretty incredibly slow restore. I'd think that with even moderate hardware 1.3TB wouldn't take more than 10-12 hours.
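To put rough numbers on "slow", here is a minimal sketch of the throughput arithmetic. It assumes the figures quoted in this thread (about 65% of 1.3TiB moved in about 47 hours, versus the whole 1.3TiB in 12 hours) and treats both sizes as TiB; the function name is made up for illustration and the percentages are approximate, so treat the results as ballpark only.

```python
# Back-of-the-envelope throughput comparison using figures from this thread.
# All inputs are approximations taken from the posts, not measured values.
TIB = 1024 ** 4
MIB = 1024 ** 2


def throughput_mib_s(bytes_moved, hours):
    """Average throughput in MiB/s for a transfer that took `hours` hours."""
    return bytes_moved / (hours * 3600) / MIB


# Roughly 65% of 1.3 TiB restored before the job died at about 47 hours.
observed = throughput_mib_s(0.65 * 1.3 * TIB, 47)

# The whole 1.3 TiB restored in 12 hours (upper end of the 10-12 hour guess).
expected = throughput_mib_s(1.3 * TIB, 12)

print(f"observed: {observed:.1f} MiB/s")
print(f"expected: {expected:.1f} MiB/s")
print(f"ratio:    {expected / observed:.1f}x")
```

Under those assumptions the observed rate works out to only a few MiB/s, several times slower than the 10-12 hour estimate implies.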

That being said, I'm wondering if you're being bitten by the same issue that caused backup jobs that were running for 48 hours to be forcibly terminated. This was fixed for backups/replications in patch 3. Have you installed patch 3? If so, perhaps this wasn't fixed for restores as I agree with you it seems quite coincidental that it failed at such a similar time.

Post by **withanh** » Mar 23, 2012 8:58 pm

I'll bet you're right about the 48hr bug. I didn't know about that one, I unfortunately don't get to follow the forums as much as I'd like.

But if you look at the total job times, attempt 1 from last week ran for 48:01:37 and attempt 2 from this week ran for 48:01:40.
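A quick sketch of that comparison, assuming the cutoff is exactly 48 hours as described for the backup/replication bug (whether restores hit the same limit is only a guess at this point). The helper name `parse_hms` is made up for illustration.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an 'HH:MM:SS' duration, where hours may exceed 24."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Total job times quoted above.
runs = {"attempt 1": "48:01:37", "attempt 2": "48:01:40"}

# Assumed cutoff, based on the 48-hour termination bug mentioned earlier.
limit = timedelta(hours=48)

# How far past the assumed cutoff each job ran before it was marked failed.
overshoot = {name: parse_hms(t) - limit for name, t in runs.items()}

# Difference between the two total run times.
spread = abs(parse_hms(runs["attempt 1"]) - parse_hms(runs["attempt 2"]))

for name, delta in overshoot.items():
    print(f"{name}: {int(delta.total_seconds())} s past 48 h")
print(f"spread between attempts: {int(spread.total_seconds())} s")
```

Both jobs ending within a couple of minutes of the 48-hour mark, and within seconds of each other, fits a fixed timeout much better than a firewall or storage problem.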

Not sure if I have patch 3 installed or not. My build says 6.0.0.164.

h

Post by **withanh** » Mar 23, 2012 9:06 pm

Looks like .164 is patch 2. I'll download and install patch 3 and hopefully that fixes it. I'm not sure the dept that requested the restore will be very friendly to 2 more days without a result!

Post by **dellock6** » Mar 23, 2012 11:06 pm

I think they will be happier anyway than a non-restore

Post by **withanh** » Mar 23, 2012 11:09 pm

dellock6 wrote:I think they will be happier anyway than a non-restore

That's a great point Luca!
