Comprehensive data protection for all workloads
psointu
Lurker
Posts: 2
Liked: never
Joined: Apr 06, 2011 10:59 pm
Full Name: phil sointu
Contact:

strange replication behavior

Post by psointu »

a vm being replicated to a remote dr site is exhibiting odd behavior ... usually takes around 5-6 hours to replicate changed blocks and based on the size of the .vrb file it seems to move around 400 MB ... last night, and on a few other occasions, it has moved more ... on the order of twice as much based on the final size of the .vrb file ... i noticed the last run which started at roughly 11 pm edt last night completed today around 5 pm ... its .vrb file was large - around 750 MB ... likely owing to the job failing the last few evenings ... i'd expect it to take longer in that case ... what's odd is the sequence of events and the final result of success:

- job execution initiated at 11 pm edt last night
- esxi host shows the vm snapshot created four hours later at 3 am today
- job still running at 3:30 pm today when the windows 2k8 host that provides the target datastore, via nfs, to the destination esxi host was bounced (planned, but not well!)
- job completes at 5 pm today, generating an email and clearly showing success at that time in the "realtime statistics" of the job on the veeam host ... the source esxi host also shows the snapshot being deleted minutes before then
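for context, a quick back-of-the-envelope comparison of effective throughput (a sketch in python; the sizes and durations come from the figures above, with the problem run's 11 pm to 5 pm window rounded to 18 hours):

```python
# Rough effective-throughput comparison for the two replication runs
# described above. Figures are the approximate ones from the post.

def effective_rate_mb_per_hour(size_mb: float, hours: float) -> float:
    """Average transfer rate over the whole job window."""
    return size_mb / hours

# typical run: ~400 MB of changed blocks in ~5.5 hours
typical = effective_rate_mb_per_hour(400, 5.5)

# problem run: ~750 MB, started ~11 pm, finished ~5 pm next day (~18 h)
problem = effective_rate_mb_per_hour(750, 18)

print(f"typical: {typical:.1f} MB/h, problem run: {problem:.1f} MB/h")
# → typical: 72.7 MB/h, problem run: 41.7 MB/h
```

so even allowing for roughly twice the data, the effective rate halved - which fits a job that stalled somewhere, not one that simply had more blocks to move.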

i'm wondering why it took the process four hours to get to the point of the snapshot getting created ... that seems excessive

i'm also wondering how the bounce of the windows host that provides the target datastore via nfs did not cause the job to fail ... i have to believe that at 3:30 pm and beyond the veeam host and job were no longer even trying to transmit data or communicate with the target esxi host / datastore ... that seems the only reasonable explanation ... similar to the above, it begs the question: what was going on for that 90 minute period, and for how long prior to 3:30 pm had the job been spinning while neither actively transmitting changed blocks nor asking the source esxi host to delete the snapshot?

any thoughts on this would be much appreciated

thanks,
-p

Vitaliy S.
Product Manager
Posts: 24246
Liked: 1859 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: strange replication behavior

Post by Vitaliy S. »

Hello Phil,
psointu wrote:i'm wondering why it took the process four hours to get to the point of the snapshot getting created ... that seems excessive
i'm also wondering how the bounce of the windows host that provides the target datastore via nfs did not cause the job to fail ...
In order to shed some light on these events, I suggest reviewing the corresponding job logs. You can locate those files by navigating to Help -> Support Information.

If you need any assistance with this, please contact our technical team directly. I would appreciate if you could update this topic with your findings from the log files.

Thanks.


Re: strange replication behavior

Post by psointu »

i finally got a chance to look at the job log ... it's quite large and i've done my level best to pore through it ... here's what, at least to me, is interesting:

the log shows the job kicking off @ 22:57:26 ... there's a slew of messages up until 23:03:34 but nothing that seems to indicate an issue, error or problem ... then there's nothing further in the log until 02:52:01, with the first occurrence of the string "snapshot" at 03:08:38:

[05.04.2011 22:57:26] <01> Info Starting job mode: 'Normal'

[05.04.2011 23:03:34] <14> Info [Soap] Logout from "https://10.60.65.110:443/sdk"
[06.04.2011 02:52:01] <08> Info [AP] (Client) output: >\n

[06.04.2011 03:08:38] <26> Info [AP] (Client) output: snapshot.action = "keep"\n

it would seem that the job simply went on vacation for close to four hours and did nothing ... very confusing and frustrating, since that's four hours of "prime time" for performing replication in order to minimize impact on the production environment during business hours
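a quick script like the following can flag silent gaps in a large job log automatically (a sketch; the regex matches the `[DD.MM.YYYY HH:MM:SS]` timestamp format in the lines quoted above, and the one-hour threshold is arbitrary):

```python
# Find silent gaps between consecutive timestamped lines in a Veeam job log.
import re
from datetime import datetime, timedelta

TS_RE = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]")

def find_gaps(lines, threshold=timedelta(hours=1)):
    """Yield (previous_ts, next_ts, gap) for silent periods > threshold."""
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
        if prev is not None and ts - prev > threshold:
            yield prev, ts, ts - prev
        prev = ts

# the four log lines quoted above
log = [
    "[05.04.2011 22:57:26] <01> Info Starting job mode: 'Normal'",
    '[05.04.2011 23:03:34] <14> Info [Soap] Logout from "https://10.60.65.110:443/sdk"',
    '[06.04.2011 02:52:01] <08> Info [AP] (Client) output: >\\n',
    '[06.04.2011 03:08:38] <26> Info [AP] (Client) output: snapshot.action = "keep"\\n',
]

for prev, nxt, gap in find_gaps(log):
    print(f"gap of {gap} between {prev} and {nxt}")
# → gap of 3:48:27 between 2011-04-05 23:03:34 and 2011-04-06 02:52:01
```

run against the full log file (`find_gaps(open("Job.log"))`) this would surface every such dead period rather than having to eyeball thousands of lines.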

at the end of the job, the log shows the job steaming along right through the bounce of the host that provides, via nfs, its target datastore ... the usual task and job progress % entries ... i don't see anything that indicates the job was aware of the bounce, and nothing that indicates it caused any problems.

this is, in a word, strange ... any thoughts?

-p


Re: strange replication behavior

Post by Vitaliy S. »

I agree, it's hard to find the reason for this behaviour based on this info alone. I would recommend sending all of those logs to our technical team for further investigation. By the way, can you reproduce this behaviour?
