Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

tsightler wrote:Microsoft lays out very specific requirements for VSS Exchange backups (2007 requirements are documented at http://technet.microsoft.com/en-us/libr ... 80%29.aspx). I think Veeam would not meet the requirements mainly due to the lack of a full integrity check prior to purging logs (although you could script an integrity check with SureBackup).
Hi Tom, just a small correction. While an integrity check is in fact mandatory, performing it before purging the logs is not a requirement. The integrity check can be done later (with SureBackup, just as you stated). You just cannot rely on (meaning, attempt to restore from) a backup before such an integrity check has been performed on its content.
The backup application must validate the integrity of the shadow copy backup set. Microsoft recommends, but does not require, that this be done before the backup application notifies Exchange that backup has completed...

If a backup application postpones integrity verification until after these tasks have completed ... the backup should not be relied on until the backup application has actually completed integrity verification.
And, of course, the recommendation to perform the integrity check before purging the logs does make perfect sense: SureBackup did not exist until October 2010, so previously there was no reasonable way of performing an integrity check of the application data inside the backup file for each backup you create.

vmmatty
Enthusiast
Posts: 28
Liked: never
Joined: Jun 05, 2009 2:20 pm
Full Name: Matt
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by vmmatty »

Just to clarify what Tom was saying - I'm actually not referring to backups that are integrated with VSS at this point. When I spoke with Microsoft they stated that virtual machine snapshots are unsupported regardless of whether or not Exchange aware VSS is involved. So while I understand Microsoft's strict requirements about VSS and backup of Exchange, Microsoft is saying they won't support the virtual machine snapshot in either case.

I don't think many of us agree with their support stance here, but it is what it is. And for what it's worth, I've been using products like Veeam and vRanger to back up Exchange 2007 for years without issue.

depps
Influencer
Posts: 20
Liked: never
Joined: Jan 24, 2011 10:16 pm
Full Name: Daniel Epps
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by depps »

So it seems I'm in the same boat.

We have an E2010 SP1 DAG that we snapshot throughout the day, and quite often the 1 second "freeze" during the snapshot makes the DAG fail over.

Has anyone made any progress with sorting this out?

I know you can adjust the failover timeouts so the DAG doesn't fail over so quickly, but I'm hesitant to do this as messaging HA is critical.

Has anyone approached Veeam with this and heard anything back?

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

Right now adjusting timeouts as described on the previous page seems to be the only working resolution...
depps wrote:Has anyone approached Veeam with this and heard anything back?
Hmm, "Veeam" has been replying since the very 1st page of this topic... look for green user names :wink:

Really there is nothing we can do but hope that Microsoft will enhance failure detection in the next service pack for Exchange 2010, or hope that VMware will further improve snapshot handling. Based on my earlier points, I feel this is more of a DAG design issue than anything else, and a VMware snapshot is just one of many ways to make the issue surface.

joergr
Expert
Posts: 386
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by joergr »

Hi,
as a workaround (but NOT a good one, I admit) you can set certain cluster apps to 'persistent mode', so they won't fail over during virtual snapshot creation at all. BUT yeah, you guessed it, auto failover will probably not occur ;-) - but manual would be fine. And this is for all MSCS apps, not just Exchange.
best regards,
Joerg

vmmatty
Enthusiast
Posts: 28
Liked: never
Joined: Jun 05, 2009 2:20 pm
Full Name: Matt
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by vmmatty »

Joergr

Are you talking about using independent disks in vSphere? What would be the point in doing that, since Veeam wouldn't be able to actually back up the data on the independent disks?

Though that is similar to setups that use in-guest iSCSI initiators to connect to data volumes for Exchange. In those configurations I've used Veeam to back up the C: drive (VMDK), but the actual Exchange data is backed up through other tools like Backup Exec.

You made another important point though - this isn't an Exchange/DAG issue, it's really the sensitivity of Windows clusters in general. This could happen on Exchange, SQL, file, print, etc. They are that sensitive by design and I don't think that Microsoft should change them. I guess we are surprised that Veeam didn't come across this in their testing or say anything about it in the release notes. Now that version 5 is out with the ability to do Exchange level backup/restore, I would think that DAGs would have been tested and this issue would have been seen.

All of this being said - I have many clients using Exchange 2010 with DAGs and using Veeam to back them up, and I've only seen this issue consistently at one of them. I can make it happen in my lab if I try hard but it isn't common. And adjusting the cluster sensitivity greatly reduces the incidence of this happening at all.

joergr
Expert
Posts: 386
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by joergr »

Mattyg30,

no, I am not talking about using independent disks in vSphere. How come you see that point ;-) ?

Now to your second point: This has nothing to do with Veeam - it's simply the way virtual snapshots work (VMware, Xen, Hyper-V, Parallels, blablabla - all the same - committing a snapshot causes a very short freeze - this is the way it is and we all have to deal with it).

best regards,
Joerg

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

mattyg30 wrote:I guess we are surprised that Veeam didn't come across this in their testing or say anything about it in the release notes.
Don't be so surprised, as you said yourself, even in your case the issue is rare and only affects a single customer.

By the way, generally speaking Veeam could never back up regular Windows clusters, because VMware does not support snapshotting virtual machines that have a SCSI controller engaged in bus-sharing.

vmmatty
Enthusiast
Posts: 28
Liked: never
Joined: Jun 05, 2009 2:20 pm
Full Name: Matt
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by vmmatty »

I guess the better question is the following: Are you recommending that customers who have virtualized Exchange servers in a DAG not use Veeam to back them up due to this issue? I realize this isn't specifically a Veeam issue but it does affect the product so I think we're all looking for a little guidance here.

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

We do recommend virtualizing 100% of your servers, and backing up 100% of your VMs with Veeam. The issue above is rare, has a workaround available, and does not really cause many problems even when it happens (failover is transparent to end users). Based on these facts, and the scope of the issue, why would we recommend ALL of our customers not to do something most of them have been doing with great success?

BTW, I just realized that this issue is likely connected to similar issues where large snapshot removal causes various applications to time out (there are a few other topics about this around here). And although the most common reasons for a longer VM stun during snapshot removal are infrastructure problems (storage issues, lack of CPU power on the host), I know of at least 2 different bugs in vSphere ESX 4 that will also cause this 100% of the time under certain circumstances. So, another way to resolve this is to look up the VM log files and verify that VM stun time does not exceed 2 sec (I believe this is the longest I could achieve with my personal stress testing of ESX4 back when it was released). If you are seeing much longer stuns in the VM logs, then this is the real issue that must be fixed - and once this is done, the DAG failover issue should go away as well.
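
For anyone who wants to automate the stun check described above, here is a minimal Python sketch. It assumes the VM log (vmware.log) records each stun as a line containing "Checkpoint_Unstun: vm stopped for N us"; that line shape is an assumption and may differ between ESX builds, so adjust the pattern to whatever your own logs show. The function names and the sample lines are made up for illustration; only the 2 second limit comes from the post above.

```python
import re

# Assumed log line shape (verify against your own vmware.log):
#   "... Checkpoint_Unstun: vm stopped for 1234567 us"
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_durations_seconds(log_lines):
    """Return every stun duration found in the log lines, in seconds."""
    durations = []
    for line in log_lines:
        match = STUN_RE.search(line)
        if match:
            # The log value is in microseconds; convert to seconds.
            durations.append(int(match.group(1)) / 1_000_000)
    return durations


def long_stuns(log_lines, limit_seconds=2.0):
    """Return only the stuns that lasted longer than limit_seconds."""
    return [d for d in stun_durations_seconds(log_lines) if d > limit_seconds]


# Made-up sample lines, for illustration only.
sample_log = [
    "2011-02-10T01:00:01.000Z| vcpu-0| Checkpoint_Unstun: vm stopped for 350000 us",
    "2011-02-10T01:45:12.000Z| vcpu-0| Checkpoint_Unstun: vm stopped for 6200000 us",
    "2011-02-10T01:45:13.000Z| vmx| some unrelated line",
]

if __name__ == "__main__":
    for seconds in long_stuns(sample_log):
        print(f"Stun of {seconds:.2f} s exceeds the 2 s limit")
```

To run it against a real log, read the file with open(path).readlines() and pass the result to long_stuns.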

vmmatty
Enthusiast
Posts: 28
Liked: never
Joined: Jun 05, 2009 2:20 pm
Full Name: Matt
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by vmmatty »

I agree with you that failover is transparent to end users in most cases thanks to the architecture changes to Exchange 2010. That said I don't think it's fair to say that the failover itself is a non event or doesn't potentially cause issues. It's entirely possible that you could run into issues when a database fails over without it being done properly and manually, such as a database in a dirty shutdown state. And depending on your design it's possible the failover could cause the active database to move to an Exchange server in another site which may cause other issues.

Interesting about the stun times, that might be something to look into for the one client I have where the DAG fails over frequently during backups.

As I said earlier I have many clients who are not experiencing this issue so I am not overly concerned here. I'm more just looking for Veeam's experience/guidance on this issue since it is real and I'm not the only one experiencing it. Who knows, maybe VMware will make this a non-issue with future improvements to VAAI to reduce the amount of time it takes for stunning VMs during snapshots.

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

Right. I am under NDA with VMware right now (signed with my blood, so I am not breaking it), but as soon as they remove the NDA for the next release, I will be able to add some comments to this. :)

joergr
Expert
Posts: 386
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by joergr »

I am also under certain NDAs and will be grilled if I tell any details, but what I can tell you is that at least one storage vendor I am aware of is working on functionality to offload parts of that "problem" to the hardware, so stay tuned ;-)

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

Ah, this sounds interesting. :D

BTW, I just came back from the VMware Partner Exchange conference, where I attended a good session about Exchange 2010 backup and disaster recovery. Imagine this: the VMware speaker specifically mentioned Veeam at least twice when talking about protecting Exchange with image-level backups. In fact, we were the only 3rd party image-level backup vendor specifically mentioned. Good stuff, as the presenter was a VMware architect specializing in deploying virtualized Exchange servers with VMware customers!

I did talk to the presenter after the session, and apparently he had not seen the DAG issue discussed here before with his customers. He agreed that this likely happens due to extended VM stun on snapshot removal.

Bunce
Expert
Posts: 259
Liked: 8 times
Joined: Sep 18, 2009 9:56 am
Full Name: Andrew
Location: Adelaide, Australia
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Bunce »

Maybe a possible workaround / feature would be for Veeam to implement a pre/post script option for snapshot-removal (or even creation), in addition to pre/post job.

Using this it may be possible to send a command to the cluster to cease (or extend the timeout) on a failover before snapshot removal starts, and then in the post command, re-instate the original cluster settings?

Not sure how flexible the FCS executable is to accept this, or if it's been encapsulated in a PowerShell API, but at worst we could run a remote PSEXEC, particularly if it's on a different box..

This pre/post snapshot flexibility could conceivably also be used in other ways..
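
A rough Python sketch of what such a pre/post pair could look like. It only builds the command strings and does not run anything. The "cluster /prop NAME=VALUE:DWORD" syntax is the one quoted elsewhere in this thread; the helper name, the dictionaries, and especially the "original" values are hypothetical placeholders, so record your own cluster's values before changing anything.

```python
def cluster_prop_commands(settings):
    """Build one 'cluster /prop' command string per setting."""
    return [
        f"cluster /prop {name}={value}:DWORD"
        for name, value in settings.items()
    ]


# Relaxed values, taken from the settings quoted in this thread.
RELAXED = {"SameSubnetDelay": 2000, "SameSubnetThreshold": 10}

# Hypothetical placeholder: replace with the values recorded from
# your own cluster before the pre-script ever runs.
ORIGINAL = {"SameSubnetDelay": 1000, "SameSubnetThreshold": 5}

pre_snapshot_commands = cluster_prop_commands(RELAXED)
post_snapshot_commands = cluster_prop_commands(ORIGINAL)

if __name__ == "__main__":
    print("Before snapshot removal:")
    for command in pre_snapshot_commands:
        print("  " + command)
    print("After snapshot removal:")
    for command in post_snapshot_commands:
        print("  " + command)
```

Keeping the commands as plain strings makes it easy to hand them to whatever remote execution method is available (PSEXEC was the one suggested above).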

jeffro01
Lurker
Posts: 1
Liked: never
Joined: May 27, 2011 2:35 pm
Full Name: Jeff
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by jeffro01 »

Do all of you back up all your DAG nodes? I have a two node DAG and am backing up both of them in the same job. Traditionally MS only wants you to back up the passive node, but I wanted to get feedback here to see what other people are doing.

I have the same failover problems with the VMware snapshot and have modified my timeout settings per the articles referenced here, so we shall see.

Thanks.
Jeff

Bunce
Expert
Posts: 259
Liked: 8 times
Joined: Sep 18, 2009 9:56 am
Full Name: Andrew
Location: Adelaide, Australia
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Bunce »

We only back up one regularly since it's the same data. We might back up the other after any service pack installs etc., but I don't see the point of doing both in the same job unless you're using 'lagged' copies.

vmmatty
Enthusiast
Posts: 28
Liked: never
Joined: Jun 05, 2009 2:20 pm
Full Name: Matt
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by vmmatty »

I think this is more of an Exchange architecture question than a Veeam question. In most of my Exchange 2010 designs there is no true "passive" copy of the data. That is, each node in the DAG is running an active copy of the database and contains a passive copy from the other node(s). So in that scenario they would likely back up all DAG nodes to get all of the active data. Again, many of my clients are hesitant about using Veeam to back up Exchange because of the Microsoft support statement (discussed at the beginning of this thread).

Regarding the cluster timeout settings and failover, adjusting those settings should put the issue to bed. In fact now that Microsoft officially supports using vMotion with DAG nodes they are suggesting increasing the cluster timeout to a max of 10 seconds anyway. That should be more than enough for the brief delay when taking a snapshot, so you can kill two birds with one stone.

adrianmarsh
Novice
Posts: 6
Liked: never
Joined: Jul 15, 2011 12:11 pm
Full Name: adrian marsh

Veeam and DAG clustering

Post by adrianmarsh »

[merged]

Hi All,

Recently we had a new VMware/Exchange and Veeam system set up.

The Exchange DAG members (both are VMs) are set up in some form of cluster (but I'm not an Exchange expert here - I know it uses some clustering technology).
What I notice is that when I come into work in the morning, my Outlook client is prompting for username/password. This means that connectivity to the mail store dropped at some point overnight and Outlook has switched to the https method of connecting.

When I check the event logs of each DAG member, I see that there are messages about cluster failovers. Looking back in the event history, it always seems that failovers are logged at the same time as when a Veeam backup of the DAG members has run.

I know that Veeam backs up VMs by telling ESX to create a snapshot and then releasing the snapshot, and I think it's the snapshot "pause" that causes the cluster heartbeat to fail for a few milliseconds, and that causes the DAG switch.

Has anyone else seen this ? And if so - any recommendations ?

Thanks,

Adrian

MattG
Enthusiast
Posts: 39
Liked: never
Joined: Dec 22, 2010 3:50 pm
Full Name: MattG
Location: Philadelphia, PA
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by MattG »

Is Veeam working on a workaround for this issue? Since Veeam currently only uses VMware snapshots, and VMware snapshots are causing the Exchange 2010 DAG to fail over, we currently need to invest in an agent backup program to back up Exchange (everything else works fine with Veeam).

Last night not only did Exchange 2010 fail over when being backed up by Veeam, but it caused the Exchange services to get corrupted.

The question is, will vSphere 5 resolve this issue, or does Veeam 6 offer an agent that could be installed to work around the Exchange 2010 DAG issue?

Thanks,
-MattG
Twitter: http://twitter.com/#!/matthewgraci

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Gostev »

Hi Matt,

We have no plans to implement agent-based backups as a part of Veeam Backup & Replication. We are, however, looking at implementing technologies in the short term (but after v6) to reduce the negative effect of VMware snapshots, or even (possibly) to completely eliminate VM snapshot usage for VMware backup. For example, our upcoming Hyper-V backup will not use VM snapshots, but rather VSS snapshots, which enable instant snapshot "commit".

Thanks.

MattG
Enthusiast
Posts: 39
Liked: never
Joined: Dec 22, 2010 3:50 pm
Full Name: MattG
Location: Philadelphia, PA
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by MattG »

Are there any workarounds for this? Is it possible to script the Exchange 2010 DAG to break the cluster, then backup, then connect the cluster again?

Any improvements on vSphere 5?

Thanks,
-MattG
Twitter: http://twitter.com/#!/matthewgraci

Bunce
Expert
Posts: 259
Liked: 8 times
Joined: Sep 18, 2009 9:56 am
Full Name: Andrew
Location: Adelaide, Australia
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by Bunce »

The workarounds are listed in this post above

link12
Influencer
Posts: 16
Liked: never
Joined: Jun 20, 2012 12:53 pm
Full Name: Dan Phillips

Issues with Veeam and Exchange 2010 DAG cluster failover.

Post by link12 »

[merged]

After applying the following settings we are still having issues when Veeam is committing the snapshots.

cluster /prop SameSubnetDelay=2000:DWORD
cluster /prop CrossSubnetDelay=4000:DWORD
cluster /prop CrossSubnetThreshold=10:DWORD
cluster /prop SameSubnetThreshold=10:DWORD

We're losing about 4 pings which then causes our cluster to failover. The DAG operates fine when taking and committing a regular VMware snapshot, it's only Veeam that we're noticing issues. We currently only have 10 live mailboxes on our 4 node single DAG cluster. We're waiting to solve this issue before migrating the rest of the users. What would cause such a long ping loss? Any help would be greatly appreciated.
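
As a rough sanity check on what those values should tolerate: the cluster sends a heartbeat every Delay milliseconds and treats a node as down after Threshold consecutive misses, so the tolerated outage is approximately Delay x Threshold. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the numbers are the ones from the settings above.

```python
def heartbeat_tolerance_seconds(delay_ms, threshold):
    """Approximate outage a node can survive before being declared down.

    delay_ms:  interval between heartbeats, in milliseconds
    threshold: number of consecutive missed heartbeats tolerated
    """
    return delay_ms * threshold / 1000


# Values from the settings quoted in this post.
same_subnet = heartbeat_tolerance_seconds(2000, 10)
cross_subnet = heartbeat_tolerance_seconds(4000, 10)

if __name__ == "__main__":
    print(f"Same subnet tolerance:  {same_subnet:.0f} s")
    print(f"Cross subnet tolerance: {cross_subnet:.0f} s")
```

By that estimate a loss of about four one-second pings should sit well inside the window, so it may be worth confirming that the properties were actually applied on every node.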

We are Exchange 2010 SP2, Veeam 6.1, vSphere 5.x

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issues with Veeam and Exchange 2010 DAG cluster failover

Post by Gostev »

Veeam takes regular VM snapshots as well. There is no way to do them differently - there is a single set of vSphere API calls both vSphere Client and Veeam are using to create and remove snapshots.

Veeam itself does not "commit snapshots" either, we only pass the request to do so to VMware (using the same API call as vSphere Client uses when you use it to remove snapshot).

If you just create and remove a snapshot with the vSphere Client, it will not get a chance to grow big, so the commit will be very fast. Keep the snapshot for a few hours before removing it - and you should see the same behavior during commit as after a Veeam backup.
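
To put rough numbers on why the age of the snapshot matters, here is a small Python sketch. It treats the snapshot delta as write rate multiplied by the time the snapshot stays open, which is only an upper bound (blocks rewritten more than once do not grow the delta further). The function name and the 0.5 MB/s write rate are hypothetical, chosen purely for illustration.

```python
def estimated_delta_mb(write_rate_mb_per_s, snapshot_open_seconds):
    """Upper-bound estimate of snapshot delta size, in MB."""
    return write_rate_mb_per_s * snapshot_open_seconds


# Hypothetical steady write rate of 0.5 MB/s inside the guest.
quick_test = estimated_delta_mb(0.5, 60)             # snapshot open 1 minute
backup_window = estimated_delta_mb(0.5, 3 * 3600)    # snapshot open 3 hours

if __name__ == "__main__":
    print(f"Create/remove test: about {quick_test:.0f} MB to commit")
    print(f"3 hour backup:      about {backup_window:.0f} MB to commit")
```

The same VM can therefore look fine in a quick create/remove test and still stun noticeably after a long backup.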

link12
Influencer
Posts: 16
Liked: never
Joined: Jun 20, 2012 12:53 pm
Full Name: Dan Phillips

Re: Issues with Veeam and Exchange 2010 DAG cluster failover

Post by link12 »

I'll give that a shot and let the VM snap run for a couple of hours. What would cause such a long commit and network loss, however? These servers are not yet in production, so the amount of delta is very small, if any.

Gostev
SVP, Product Management
Posts: 26700
Liked: 4276 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Issues with Veeam and Exchange 2010 DAG cluster failover

Post by Gostev »

In that case, the most likely reason is a storage issue. VMware stuns the VM for the final delta commit; if that operation takes longer than normal (seconds), you will see the loss of ping (not a network loss, which is a different thing - ping loss is fine, a network connection drop is not).

Review VM logs in VMware for your VM stun durations, they are clearly labeled there.

link12
Influencer
Posts: 16
Liked: never
Joined: Jun 20, 2012 12:53 pm
Full Name: Dan Phillips

Re: Issue with VMware & Exchange 2010 DAG

Post by link12 »

Just an update on this. I was able to recreate the cluster issue inside VMware alone. Probably the wrong forum, but... for example, we have 1 mailbox server with 4 ESX LUNs assigned to it, 4 1TB disks. Would that be the reason both taking a snapshot and committing it take a long time? Can't find any tweaks anywhere.

eiskra
Enthusiast
Posts: 25
Liked: never
Joined: Mar 07, 2012 11:54 pm
Full Name: Edward Iskra
Contact:

Re: Issue with VMware & Exchange 2010 DAG

Post by eiskra »

You should also check when Exchange is doing regular maintenance, which can cause a lot of changes on disk even though little or no email is flowing.

From a random web hit:
Exchange Server 2010 automatically performs database maintenance procedures on a nightly basis during the scheduled maintenance window. Exchange Server 2010 performs two distinct activities: Online Maintenance (OLM) and Online Defragmentation (OLD). OLM starts by default at 1:00 a.m. every day, whereas OLD is continuous.

Note: This is different than in Exchange Server 2007, in which OLD ran during the OLM process. This led to a fragmented database during operations with the attendant drop in performance.

eiskra
Enthusiast
Posts: 25
Liked: never
Joined: Mar 07, 2012 11:54 pm
Full Name: Edward Iskra
Contact:

Optimizing Exchange 2010 backups

Post by eiskra »

[merged]

I'm trying to improve performance on our Exchange backups - I'd like them to be quicker and smaller, but my main concern is that I get occasional failures or warnings, generally because of snapshot/freeze problems, which also means the logs don't get truncated every time.

We have two servers running Exchange 2010; one server provides DAG for the other, for reliability. We've split the workload by having one server be the primary on which the mailboxes are mounted, while the other is the primary for other roles, including OWA. In the event of failure, either server can take over.

We've already done the basics, like adjusting the timers so that the stun during a snapshot does not cause DAG failover (see other threads).
We've also scheduled the backups so they do not coincide with peak use or with nightly Exchange maintenance, both of which cause a lot of changed sectors on the drive, which increases the snapshot delta and therefore the snapshot commit time at the end of the backup.
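
The scheduling check can be sketched in a few lines of Python. The 1:00 a.m. maintenance start is the default mentioned earlier in this thread; the 4 hour maintenance length, the date, the backup times, and the function name are assumptions for illustration, so substitute your own windows.

```python
from datetime import datetime, timedelta


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True if the two half-open time windows overlap."""
    return start_a < end_b and start_b < end_a


night = datetime(2012, 6, 20)  # arbitrary example date

# Maintenance starts at 1:00 a.m.; the 4 hour length is an assumption.
maintenance = (night + timedelta(hours=1), night + timedelta(hours=5))

# Two hypothetical backup windows to compare.
backup_colliding = (night + timedelta(minutes=30), night + timedelta(hours=2))
backup_clear = (night + timedelta(hours=5, minutes=30), night + timedelta(hours=7))

colliding = windows_overlap(*backup_colliding, *maintenance)
clear = windows_overlap(*backup_clear, *maintenance)

if __name__ == "__main__":
    print("00:30-02:00 backup overlaps maintenance:", colliding)
    print("05:30-07:00 backup overlaps maintenance:", clear)
```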

Are there any other recommendations out there for optimizing the speed and reliability of the backups in an environment like ours?
