by vertices » Wed Feb 22, 2012 4:15 pm
We have been trying for weeks to get this to work. Everything was fine with Veeam5. We upgraded to Veeam6 and have had nothing but nightmares. So far I have gotten nowhere with my support case (ID#5172531), so I am posting here in the hope that someone has suggestions. All ESX hosts are ESXi 4.1. Veeam 6 is patched with the latest patch as of a couple of days ago.
We have 2 sites with a 20 Mbps pipe between them; I'll simply call them source and destination. The source site has 3 ESX hosts connected to a SAN. It also has a physical Veeam server that only performs backups. All servers are joined to the same Windows domain and all are using the same domain admin credentials. This physical Veeam server has no problem backing up the VMs in the source site. It backs them up between 5:30pm and 8:00pm. Everything is perfect with this.
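For a rough sense of what the 20 Mbps pipe can carry, here is a back-of-the-envelope sketch. The function name and the `efficiency` factor are made up for illustration; real throughput will be lower once protocol overhead and other traffic are counted, and Veeam's own compression and changed-block handling are not modeled at all.

```python
def max_transfer_gb(link_mbps: float, hours: float, efficiency: float = 1.0) -> float:
    """Upper bound on data moved over a link, in decimal gigabytes.

    link_mbps  -- nominal link speed in megabits per second
    hours      -- length of the replication window
    efficiency -- assumed fraction of the link actually usable (hypothetical knob)
    """
    bytes_per_second = link_mbps * 1_000_000 / 8 * efficiency
    return bytes_per_second * hours * 3600 / 1_000_000_000


if __name__ == "__main__":
    # 20 Mbps is 2.5 MB/s, so one hour at full line rate is about 9 GB.
    print(max_transfer_gb(20, 1))
    # An 8-hour overnight window, assuming only 70% of the link is usable.
    print(max_transfer_gb(20, 8, efficiency=0.7))
```

If the nightly changed data is well under that ceiling, the link itself is probably not the bottleneck.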
In the destination site we have another physical Veeam server with 6 cores and 16GB of RAM, a nice system, and there is a single ESXi host connected to a small iSCSI SAN. The Veeam server has a fresh install of Veeam 6 on it (not upgraded from 5). We also have a fresh dedicated 2008R2 VM at the source site that I'll call prox1. The replication jobs on the Veeam server in the destination site are configured to use prox1 as the source proxy, and the local server as the destination proxy. I have it set to use existing replicas as a seed and they automap just fine. Everything is left on automatic, other than which proxies to use.
So 2 Veeam servers, identical configs. One is for backing up in the source site and works fine, the other is for pulling replicas to the destination site which doesn't work at all.
We are plagued with nondescript failures such as "An existing connection was forcibly closed by the host" and "An established connection was aborted by the software in your host machine".
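Those two messages are the standard Windows socket error texts for a connection reset (WSAECONNRESET, 10054) and a connection abort (WSAECONNABORTED, 10053), so one thing worth ruling out is the network path between the proxies. Below is a minimal sketch of a TCP probe, not anything from Veeam: it opens a connection, holds it idle, then sends one byte and reports which way it failed. The host, port and hold time are placeholders; the post does not say which ports the proxies use, so pick any service that is known to be listening on the far side.

```python
import socket
import time


def classify(exc: OSError) -> str:
    """Map a socket exception onto the failure modes seen in the job logs."""
    if isinstance(exc, ConnectionResetError):
        return "reset"    # Windows reports this as WSAECONNRESET (10054)
    if isinstance(exc, ConnectionAbortedError):
        return "aborted"  # Windows reports this as WSAECONNABORTED (10053)
    if isinstance(exc, socket.timeout):
        return "timeout"
    return "other"


def probe(host: str, port: int, hold_seconds: float = 0.0, timeout: float = 5.0) -> str:
    """Connect, sit idle for hold_seconds, then send one byte and wait briefly.

    Returns "ok" if the connection survived, "closed" if the peer closed it
    cleanly, or one of the labels from classify() on an error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            time.sleep(hold_seconds)      # give an idle timeout a chance to bite
            conn.sendall(b"\x00")
            conn.settimeout(1.0)
            try:
                if conn.recv(1) == b"":
                    return "closed"
            except socket.timeout:
                pass                      # a silent peer is fine, the link is up
        return "ok"
    except OSError as exc:
        return classify(exc)


if __name__ == "__main__":
    # "prox1" and 445 are placeholders: any reachable TCP service will do.
    print(probe("prox1", 445, hold_seconds=300))
```

Running it from the destination server toward prox1 with a hold time of a few minutes would show whether something in between (firewall, WAN device) drops idle connections, which could line up with failures around the snapshot removal step.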
I watched the jobs run one night and noticed a few things. Almost always when I see a failure with "An established connection was aborted by the software in your host machine", it fails at the snapshot removal point, either during or just after. I also sometimes see an error in VMware about "Unable to access file <unspecified filename> since it is locked" regarding a snapshot removal. However, using the same credentials that Veeam is configured with, I can manually remove the snapshot just fine.
I have done everything support has wanted me to do: recreating jobs, deleting all vbk files in the replica repository. Nothing seems to improve the situation at all. I am strongly considering deeming Veeam6 not ready for production and going back to 5, which worked just fine, albeit not very efficiently.
Does anyone have replication working or any suggestions for us? So far we have had our case open for 9 days, and no DR site for 14 days. I can't let this go on much longer without dropping down to Veeam5 to fix things.
By the way, my case# is 5172531 in case anyone from Veeam chimes in and wants to take a look.