Replication jobs Suddenly came to a crawl

lobo519 · Post by **lobo519** » May 11, 2012 2:01 pm this post

This morning suddenly my replica jobs came to a crawl and are reporting the target as the bottleneck at 99% Network 0% (These jobs are over the WAN). There barley any utilization on the connection.

Nothing is jumping out at me - any suggestions to test what might be wrong on the target end? The datastore is dedicated to these replica jobs so is no other VM fighting for IOPS...

Veeam Server/Proxy > WAN (VPN) > Veeam Proxy

Busy: Source 1% > Proxy 14% > Network 0% > Target 99%

Post by **foggy** » May 11, 2012 2:31 pm this post

The first guess: could the target proxy suddenly fail over to network mode from using hotadd for some reason?

lobo519 · Post by **lobo519** » May 11, 2012 2:35 pm this post

It shows hotadd;nbd in the job log and there are no errors regarding failing over to network mode...

lobo519 · Post by **lobo519** » May 11, 2012 2:51 pm this post

The read rate is very low (500kb) and as I said there is not much utilization on our WAN connection, would this be normal behavior if the target was the bottleneck?

My target proxy also has not much utilization.

Could this be a connection issue?

Post by **Vitaliy S.** » May 11, 2012 8:22 pm this post

Could be, I would suggest to reach our support team and review job logs from both runs (current and previous one). If there are no errors and everything looks the same, then most likely the WAN connection is the issue .

lobo519 · Post by **lobo519** » May 11, 2012 8:27 pm this post

I just created a new replica job - different VM but same config/hardware/datastore.

Works like a champ.

I just got off the phone with my provider they said the circuit looks good but we left the ticket open..

I did open a ticket 5190717

lobo519 · Post by **lobo519** » May 12, 2012 10:55 pm this post

May have found the issue here - Will report on Monday if everything is still working..

lobo519 · May 14, 2012 1:40 pm

Everything is running fine again - It was our Target SAN (Veeam was correctly reporting the target as the bottleneck the whole time)

We are replicating to an MD3000i.

I eventually found in the log of the MD3000i (information, not error) that the array decided to disable its cache on the controllers while it did a battery learn cycle greatly reducing the performance of the array. It took almost a day for it to finish and when it did, all performance returned and the jobs are running normally.

Post by **Gostev** » May 14, 2012 9:04 pm this post

Wow. It is VERY useful for me to know that SAN storage may be doing something like this occasionally. Thank you very much for your post.

R&D Forums

Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Re: Replication jobs Suddenly came to a crawl

Who is online