stelben
Novice
Posts: 3
Liked: never
Joined: Feb 03, 2012 8:16 am
Full Name: Jonas Carlsson
Contact:

Best practice for CAS NLB Exchange 2010

Post by stelben »

Hi!

We have an environment with two CAS 2010 servers, load balanced with Windows NLB.
Everything is running on ESX 4.1.
When the Veeam backup runs, it seems to freeze the current CAS for a while (is a standard VMware snapshot being taken?), causing a failover to the other CAS. It takes some time before NLB sorts this out and the CAS service is up again. In practice, this means we have a 10-minute mail outage every night...

What is the best practice for backing up such a CAS setup?

Thanks in advance

chrisdearden
Expert
Posts: 1530
Liked: 225 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden
Contact:

Re: Best practice for CAS NLB Exchange 2010

Post by chrisdearden »

Can you reproduce the behaviour by taking a snapshot manually?

stelben
Novice
Posts: 3
Liked: never
Joined: Feb 03, 2012 8:16 am
Full Name: Jonas Carlsson
Contact:

Re: Best practice for CAS NLB Exchange 2010

Post by stelben »

Hi!

Yes, the same behaviour occurs when taking a snapshot manually.

Vitaliy S.
Product Manager
Posts: 24256
Liked: 1863 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Best practice for CAS NLB Exchange 2010

Post by Vitaliy S. »

Jonas, unfortunately I'm not that familiar with Exchange CAS servers, but if there is an option to extend this timeout (for keeping the CAS connection alive between the nodes), try it and see if that helps.

Andreas Neufert
VP, Product Management
Posts: 4515
Liked: 855 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Best practice for CAS NLB Exchange 2010

Post by Andreas Neufert »

Hi stelben,

Thanks for your enquiry.
I think there are 2 problems:
1. When a snapshot is deleted, the VM freezes.
2. Because of the VM freeze, you have problems with the Windows NLB cluster heartbeat.

So this is not an Exchange problem and, as you said, it also happens when you take the snapshot manually, so it is not a Veeam problem either. It is an infrastructure problem.

Solution for Problem 1:
@all with snapshot freeze problems.
@all with DAG cluster pans

NFS datastores => Install the VMware fixes (symptom: the VM freezes at snapshot delete)
SAN datastores => Install the latest VMware versions and check that your HBA/datastore access profile suits your SAN storage (Dedicated/Round Robin/...)
iSCSI datastores => Install the latest VMware versions and check that your HBA/datastore access profile suits your SAN storage (Dedicated/Round Robin/...) + use a dedicated enterprise switch for iSCSI VMware traffic
Update your SAN/iSCSI/NAS firmware (during a VMware snapshot commit/delete, VMware issues a large number of random writes). I have seen a lot of old firmware versions that have problems with that.


Do you use storage-system-based synchronous mirroring?
To check whether this is the problem: disable the storage system's synchronous mirroring (I have seen some systems that do not perform well because of firmware bugs).

To check whether your disk/network environment is the problem, you can test with local disks (Storage vMotion of all volumes).
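
To compare the freeze before and after such a test, it can help to put a number on it. The following is only a minimal sketch in Python: it assumes you have collected timestamps (in seconds) at which the VM answered some probe, for example the reply times of a continuous ping, and it reports the longest silence between two replies. The function name `longest_gap` and the sample values are hypothetical and not part of any VMware or Veeam tool.

```python
def longest_gap(timestamps):
    """Return the largest interval, in seconds, between consecutive samples.

    `timestamps` is any iterable of numbers (seconds). With fewer than two
    samples there is no interval to measure, so 0.0 is returned.
    """
    ordered = sorted(timestamps)
    # Pair each sample with its successor and keep the widest distance.
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0.0)


if __name__ == "__main__":
    # Hypothetical reply times: one reply per second, with a stall in between.
    replies = [0.0, 1.0, 2.0, 9.5, 10.5]
    print("longest silence (s):", longest_gap(replies))
```

Run the same measurement once on the shared storage and once on local disks; if the silence shrinks clearly on local disks, that points at the storage/network path.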

And use NTP Servers for time sync on each VMware host and VM:
http://kb.vmware.com/selfservice/micros ... nalId=1318

For problem 2, if problem 1 cannot be solved:
Extend the heartbeat timeout.

http://technet.microsoft.com/en-us/libr ... S.10).aspx
NLB assumes that a host is functioning normally within a cluster as long as it participates in the normal exchange of heartbeat messages between it and the other hosts. If the other hosts do not receive a message from a host for several periods of heartbeat exchange, they initiate convergence. The number of missed messages required to initiate convergence is set to five by default (but can be changed).
You can find the entry here:
http://technet.microsoft.com/de-de/libr ... S.10).aspx
Keyword: "AliveMsgTolerance"
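
To get a feeling for which value to set, here is a rough sketch in Python of the arithmetic. It assumes the heartbeat interval (the NLB parameter "AliveMsgPeriod") is at its default of 1000 ms, which is my assumption and not stated in the text quoted above, and it treats convergence as starting after period x tolerance of silence. The function names are hypothetical; check the real values on your own cluster before changing anything.

```python
# Assumed NLB defaults: one heartbeat per second (assumption),
# five missed heartbeats tolerated (from the quoted Microsoft text).
DEFAULT_ALIVE_MSG_PERIOD_MS = 1000
DEFAULT_ALIVE_MSG_TOLERANCE = 5


def convergence_delay_ms(period_ms=DEFAULT_ALIVE_MSG_PERIOD_MS,
                         tolerance=DEFAULT_ALIVE_MSG_TOLERANCE):
    """Approximate silence (ms) tolerated before the other hosts converge."""
    return period_ms * tolerance


def tolerance_for_freeze(freeze_ms, period_ms=DEFAULT_ALIVE_MSG_PERIOD_MS):
    """Smallest tolerance whose window is strictly longer than the freeze."""
    return freeze_ms // period_ms + 1


if __name__ == "__main__":
    print("default window (ms):", convergence_delay_ms())
    # Hypothetical example: the VM stalls for 8 seconds at snapshot delete.
    print("tolerance needed for an 8 s freeze:", tolerance_for_freeze(8000))
```

Keep in mind that a larger tolerance also means a really dead node is detected later, so only raise it as far as the measured freeze requires.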

In my life before Veeam I saw a lot of problems with NLB unicast mode. If you use it, I recommend changing to IGMP multicast together with your network specialist, because this requires some changes in your network.

Windows NLB is maybe not the best way to cluster CAS servers because NLB is not service (Exchange) aware. It only looks at the network, not at whether Exchange CAS is actually running behind it.
A hardware load balancer also checks service availability.

Let me say again that this is an infrastructure problem, not a Veeam Backup & Replication software problem. Veeam uses standard VMware snapshots for the backup. If these snapshots don't work, I recommend analysing this together with VMware, your storage vendor and your infrastructure service contractor.

Hope this information helps you fix your problem.

CU Andy
