Hi!
We have an environment with two CAS 2010 servers, load balanced with Windows NLB.
Everything is running on ESX 4.1.
When the Veeam backup runs, it seems to freeze the current CAS for a while (is a standard VMware snapshot being taken?), causing it to fail over to the other CAS. It takes some time before NLB sorts this out and the CAS service is up again. In practice it means we have a mail outage for 10 minutes every night...
What is the best practice for backing up such a CAS setup?
Thanks in advance
-
- Novice
- Posts: 3
- Liked: never
- Joined: Feb 03, 2012 8:16 am
- Full Name: Jonas Carlsson
- Contact:
-
- Veteran
- Posts: 1531
- Liked: 226 times
- Joined: Jul 21, 2010 9:47 am
- Full Name: Chris Dearden
- Contact:
Re: Best practice for CAS NLB Exchange 2010
Can you force the behaviour by taking a snapshot manually?
-
- Novice
- Posts: 3
- Liked: never
- Joined: Feb 03, 2012 8:16 am
- Full Name: Jonas Carlsson
- Contact:
Re: Best practice for CAS NLB Exchange 2010
Hi!
Yes, the same behaviour occurs when doing a snapshot manually.
-
- VP, Product Management
- Posts: 27377
- Liked: 2800 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Best practice for CAS NLB Exchange 2010
Jonas, unfortunately, I'm not that familiar with Exchange CAS servers, but if there is any option to extend this timeout (for keeping the CAS connection alive between the nodes), try using it and see if that helps.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Best practice for CAS NLB Exchange 2010
Hi stelben,
Thanks for your enquiry.
I think there are two problems:
1. If you delete a snapshot, the VM freezes.
2. Because of the VM freeze, you have problems with the Windows NLB cluster heartbeat.
So this is not an Exchange problem and, as you said, it also happens when you take the snapshot manually, so it is not a Veeam problem either. It is an infrastructure problem.
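To put a number on the freeze, it can help to probe each CAS node from another machine while you take and delete a snapshot by hand. The following is only a minimal sketch: the host name in the usage note, the port (443) and the one-second interval are assumptions for illustration, not values from this thread.

```python
import socket
import time


def probe(host, port=443, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def longest_outage(samples):
    """Longest gap in seconds between a first failed probe and the next success.

    samples is a list of (timestamp, ok) pairs in time order. An outage that
    is still running at the last sample is counted up to that sample.
    """
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    if down_since is not None:
        longest = max(longest, samples[-1][0] - down_since)
    return longest


def watch(host, duration=600.0, interval=1.0):
    """Probe host for duration seconds and return the longest outage seen."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host)))
        time.sleep(interval)
    return longest_outage(samples)
```

For example, `watch("cas01.example.local")` (a hypothetical name) could be run against each node's dedicated IP rather than the NLB cluster address, so that the result shows the freeze of the VM itself and not the NLB convergence time.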
Solution for Problem 1:
@all with snapshot freeze problems.
@all with DAG cluster pans
NFS datastores => Install the VMware fixes (symptom: the VM freezes when a snapshot is deleted)
SAN datastores => Install the latest VMware versions and check whether your HBA/datastore access profile suits your SAN storage (Dedicated/Round Robin/...)
iSCSI datastores => Install the latest VMware versions and check whether your HBA/datastore access profile suits your SAN storage (Dedicated/Round Robin/...) + use a dedicated enterprise switch for iSCSI VMware traffic
Update your SAN/iSCSI/NAS firmware. When a snapshot is committed/deleted, VMware issues a large amount of random writes, and I have seen a lot of old firmware versions that have problems with that.
Do you use disk-system-based synchronous mirroring?
To check whether this is the problem, disable the storage system's synchronous mirroring (I have seen some systems that perform poorly because of firmware bugs).
To find out whether your disk/network environment is the problem, you can test with local disks (Storage vMotion all volumes to them).
And use NTP servers for time sync on each VMware host and VM:
http://kb.vmware.com/selfservice/micros ... nalId=1318
For Problem 2, if Problem 1 cannot be solved:
Extend the heartbeat timeout
http://technet.microsoft.com/en-us/libr ... S.10).aspx
You can find the entry here:
"NLB assumes that a host is functioning normally within a cluster as long as it participates in the normal exchange of heartbeat messages between it and the other hosts. If the other hosts do not receive a message from a host for several periods of heartbeat exchange, they initiate convergence. The number of missed messages required to initiate convergence is set to five by default (but can be changed)."
http://technet.microsoft.com/de-de/libr ... S.10).aspx
Keyword: "AliveMsgTolerance"
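As a rough way to size the setting, here is a small sketch that estimates how many missed heartbeats must be tolerated to ride out a freeze of a given length. It assumes the heartbeat period (the NLB parameter "AliveMsgPeriod") is at its default of 1000 ms; that value and the 1.5x safety margin are assumptions and not taken from this thread, so check them against your own cluster.

```python
import math

DEFAULT_PERIOD_MS = 1000  # assumed default heartbeat period (AliveMsgPeriod)
DEFAULT_TOLERANCE = 5     # default missed-message count per the TechNet text


def convergence_window_s(tolerance=DEFAULT_TOLERANCE, period_ms=DEFAULT_PERIOD_MS):
    """Approximate silence in seconds after which the other hosts converge."""
    return tolerance * period_ms / 1000.0


def min_tolerance(freeze_s, period_ms=DEFAULT_PERIOD_MS, margin=1.5):
    """Smallest tolerance that covers a freeze of freeze_s seconds with margin.

    Never returns less than the default of five missed messages.
    """
    needed = math.ceil(freeze_s * margin * 1000.0 / period_ms)
    return max(DEFAULT_TOLERANCE, needed)
```

With the defaults, `convergence_window_s()` gives 5.0 seconds, and `min_tolerance(8)` suggests tolerating 12 missed messages for an 8-second freeze. Keep in mind the trade-off: the higher the tolerance, the longer NLB takes to notice a node that is really down.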
In my life before Veeam I saw a lot of problems with NLB unicast mode. If you use it, I recommend changing it to IGMP multicast together with your network specialist, because this requires some changes in your network.
Windows NLB is maybe not the best way to cluster CAS servers, because NLB is not service (Exchange) aware. It only looks at the network, not at whether Exchange CAS is actually running behind it.
A hardware load balancer also checks service availability.
Let me say again that this is an infrastructure problem, not a Veeam Backup & Replication software problem. Veeam uses standard VMware snapshots for the backup. If these snapshots don't work, I recommend analysing this together with VMware, your storage vendor and your infrastructure service contractor.
I hope this information helps you fix your problem.
CU Andy