Discussions specific to the VMware vSphere hypervisor
Posts: 21
Liked: never
Joined: Feb 01, 2010 7:41 pm
Full Name: Shawn Barnhart

Replication job, some VMs fast, others super slow

Post by cyberswb

[Veeam, ESXi 4.1, vCenter 4.1 Build 433742, Equallogic PS6000, PS4000]

I have been trying to get a replication job with six VMs in it to run and have run into some strange behavior. Production storage is on the PS6000; the replication target is on the PS4000. There are no WANs or slow links. Veeam runs in a 4-CPU, 8 GB VM with direct SAN access to both SANs.

Four of the VMs replicate what I would call normally: the first pass runs at 40-100 MB/s, and subsequent passes run at the speeds you might expect, 200-300 MB/s. However, two VMs won't replicate their first passes any faster than 2 MB/s.
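To put those rates in perspective, here is a quick back-of-the-envelope sketch. The 100 GB disk size is a hypothetical figure (the actual VM sizes aren't given above), and it assumes a constant transfer rate with 1 GB = 1024 MB:

```python
def pass_duration_hours(disk_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move disk_gb gigabytes at a constant rate_mb_per_s (1 GB = 1024 MB)."""
    return disk_gb * 1024 / rate_mb_per_s / 3600


# Hypothetical 100 GB VM at the rates reported in this thread
for label, rate in [
    ("fast VM, 40 MB/s", 40),
    ("slow VM, 2 MB/s", 2),
    ("slow VM, 917 KB/s", 917 / 1024),
]:
    print(f"{label}: {pass_duration_hours(100, rate):.1f} h")
```

Under those assumptions the first pass goes from well under an hour to the better part of a day or more, which is consistent with a multi-day job.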

Overall SAN I/O is low, no backup jobs are running concurrently, and at least one, if not both, of these VMs isn't in regular production yet, so whatever impact a busy VM would have isn't a factor.

I finally decided to just let the job run (total runtime at this point is 5 days). One of the two slow VMs actually finished, but the second is still chugging along, now at 917 KB/s.

Oddly, backups run on another box with the same version of Veeam, and that system has no I/O issues with these VMs.

What should I be looking for to troubleshoot this? I've tried a number of things, including deleting the replicas and starting over with a new job, and I've migrated the VMs to other hosts, but there are no bottlenecks that we can identify at the host level.

SVP, Product Management
Posts: 28681
Liked: 5188 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Replication job, some VMs fast, others super slow

Post by Gostev

A high "processing rate" in v5 means there are very few changed blocks in the VM; conversely, a low rate means there are lots of changes to process. This is a really bad counter to look at, completely and utterly useless for troubleshooting performance issues, and in fact it was dropped entirely in v6 for exactly that reason.
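A toy model may help show why the counter behaves this way. It assumes (this is not Veeam's actual formula) that the reported rate is the whole disk size divided by elapsed time, while elapsed time is driven only by the changed data actually moved plus a fixed overhead. The function name, the 60-second overhead, and the figures are all hypothetical:

```python
def apparent_processing_rate(disk_gb: float, changed_gb: float,
                             transfer_mb_s: float, overhead_s: float = 60.0) -> float:
    """Toy model of a 'processing rate' counter, in MB/s.

    The numerator is the full disk size, but the elapsed time only covers
    moving the changed data (at a fixed transfer speed) plus a fixed overhead.
    """
    elapsed_s = changed_gb * 1024 / transfer_mb_s + overhead_s
    return disk_gb * 1024 / elapsed_s


# Same hypothetical 100 GB disk and the same 50 MB/s transfer speed,
# but very different amounts of changed data:
few_changes = apparent_processing_rate(100, 1, 50)
many_changes = apparent_processing_rate(100, 50, 50)
print(f"1 GB changed:  {few_changes:.0f} MB/s")
print(f"50 GB changed: {many_changes:.0f} MB/s")
```

The underlying transfer speed is identical in both runs; only the amount of change differs, yet the "rate" swings by more than an order of magnitude. That is why it tells you little about actual throughput.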

Of course, it was not always useless like that: it was originally designed in the days when VMware did not have CBT (Changed Block Tracking), and incremental runs were done with the snap-and-scan method.
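Roughly, the difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not Veeam's implementation. Snap-and-scan is modeled as reading and hashing every block and comparing against the previous run's digests. CBT is modeled as the hypervisor handing over a list of changed block indices so only those are read. The tiny 4-byte block size and the helper names are hypothetical:

```python
import hashlib

BLOCK = 4  # hypothetical toy block size in bytes; real tools use much larger blocks


def blocks(data: bytes) -> list[bytes]:
    """Split a disk image into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def snap_and_scan(previous_digests: list[str], current: bytes):
    """Read and hash EVERY block; report which ones differ from the last run."""
    changed, digests = [], []
    for i, blk in enumerate(blocks(current)):
        d = hashlib.sha256(blk).hexdigest()
        digests.append(d)
        if i >= len(previous_digests) or previous_digests[i] != d:
            changed.append(i)
    return changed, digests, len(digests)  # blocks read == all blocks


def cbt_read(changed_indices: list[int], current: bytes):
    """With CBT the changed indices come from the hypervisor; read only those."""
    all_blocks = blocks(current)
    return [all_blocks[i] for i in changed_indices], len(changed_indices)


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXXXXDDDD"
prev = [hashlib.sha256(b).hexdigest() for b in blocks(old)]

changed, _, read_scan = snap_and_scan(prev, new)
_, read_cbt = cbt_read(changed, new)
print(changed, read_scan, read_cbt)
```

With snap-and-scan the whole disk is read on every run, so a rate based on disk size reflected real read throughput. With CBT only the changed blocks are touched, so the same calculation mostly reflects how little changed.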

