Data loss bug - Instant VM Recovery with redirected writes

by pvz » Mon Nov 17, 2014 9:09 am

I'd just like to warn users out there about a bug in Veeam Backup & Replication which will eat your production data in a specific restore scenario if you are not careful.

I have a ticket open for this case (00674402), and I'm taking this to the forums because I feel information about the bug needs to come out, so that people can work around it and avoid data loss in a restore scenario.

To trigger this bug you have to:

1. Perform an Instant VM Recovery of your virtual machine. When configuring the Instant VM Recovery job, enable redirection of disk updates to the datastore you're planning on migrating the production VM to.

2. Power on the VM. At this point, the machine will be "live" and accepting user data. The machine will also not be covered by any backup, unless you take steps to ensure that it will.

3. At an appropriate time, perform a "Migrate to production" on the VM. Choose the same datastore as in step 1. (Actually, I'm not sure whether the two datastores need to be the same; I have not tested using two different production datastores.) Ensure that VMware Storage vMotion is used (not Quick Migration). There is a check box at the end called "Delete source VM files upon successful quick migration (does not apply to vMotion)". Set this checkbox however you like; it makes no difference.

4. Kiss your production data goodbye. Any data written between steps 2 and 3, a period that could be several hours or even days while waiting for an appropriate service window, is gone. You'll of course still have the original backup that you spun up the Instant VM Recovery from, but anything after that is irretrievable. What happens is that Veeam triggers a Storage vMotion of the machine. For some reason, perhaps because the redo logs are already on the destination datastore, it decides that the Storage vMotion is done after only a few seconds, even though the data is still on the vPower NFS datastore. At this point, Veeam decides to DELETE your instant recovery VM because the Instant Recovery job is "done". That is most definitely not the desired behaviour for anyone in any scenario.

Right now, I'm a bit wary of offering workarounds. I suggest that anybody planning to use this feature test it in their own environment and make sure that anybody in their organization who might do an Instant VM Recovery knows about this bug, until Veeam releases a patch. Some possible workarounds (again, I take no responsibility for these; you will have to test them yourself to see whether they work in your environment):

1. Don't use Instant VM Recovery, instead do a regular VM restore.
2. If you have to use Instant VM Recovery, do not redirect virtual disk updates.
3. If you have to redirect virtual disk updates, try using a different datastore for your migration destination than for your disk updates. (UNTESTED)
4. If you have to put your virtual disk updates on the same datastore you plan on migrating to, use Veeam Quick Migration rather than Storage vMotion.

If your VMware installation doesn't have a license for Storage vMotion, you will not be bitten by this bug, because it only happens when using Storage vMotion.

Now, for me, I didn't experience any real data loss, because I happened to find this bug while demoing the software to a colleague who was preparing documentation for VM recovery procedures in our organization, so all I lost was some test data. But it might just as well have been real data.

Still, I'm disappointed that Veeam Support has not been taking this bug seriously. The last response I got from Veeam is this:

As we discussed with engineers it is not actually a bug from Veeam side, it is more by design behavior. Because all steps from Veeam were done correctly according to the settings set for the jobs. We thing about warning message to notify user about consequences of these steps. In the next patches of Veeam we are gonna to add this notification.

In other words: Veeam will eat your data. By design. :roll:

Is it just me having too high expectations, or does anyone else find this kind of stance about this kind of bug... strange? Makes me wonder what other "design behaviours" are lurking below...
pvz
Novice
 
Posts: 4
Liked: never
Joined: Sat May 28, 2011 10:12 am
Full Name: Per von Zweigbergk

Re: Data loss bug - Instant VM Recovery with redirected writ

by Gostev » Mon Nov 17, 2014 1:50 pm

Hello. If this bug is real, then I'm also disappointed that this particular Veeam support engineer is not taking it seriously. However, I also find it strange that you are the first user to run into this bug after 5 years of the Instant VM Recovery feature's existence, with tens of thousands of users using it for years for production recoveries. So, let me get more details on this, and what exactly it takes to run into this bug. Thanks!
Gostev
Veeam Software
 
Posts: 21396
Liked: 2350 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Data loss bug - Instant VM Recovery with redirected writ

by m_zolkin » Mon Nov 17, 2014 2:59 pm

Hi Per,

We are reviewing the conversation between you and the tech representative from the Veeam side. It looks like the bug was reported to support management on 12th Nov, once the technician had reproduced the problem in our environment. I am still trying to understand why it got stuck there, but in the meantime our QA team is working on it.

Once QA has collected all the necessary data, and if they confirm this behavior, we'll get in touch with VMware support and report the problem to them.

Once again, please accept our apologies for any inconvenience caused by this incident.
VP, Customer Technical Support, EMEA & APAC
m_zolkin
Veeam Software
 
Posts: 20
Liked: 11 times
Joined: Wed Aug 26, 2009 1:13 pm
Full Name: Mikhail Zolkin

Re: Data loss bug - Instant VM Recovery with redirected writ

by Gostev » Tue Nov 18, 2014 11:59 am

So, this turned out to be a critical bug in the VMware Storage VMotion logic.

Steps to reproduce without Veeam in the picture:
1. Create a test VM with virtual disk on Datastore 1.
2. Use workingDir and snapshot.redoNotWithParent VMX parameters to move snapshot files location to Datastore 2.
3. Create a VM snapshot.
4. Perform Storage VMotion from Datastore 1 to Datastore 2.
5. Storage VMotion operation reports success, however no VM files are actually moved anywhere.
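
A minimal sketch of step 2, for anyone who wants to script the reproduction. It only renders the two VMX lines named above; the helper name `render_vmx_overrides` and the directory path are hypothetical, and the value format is an assumption that should be checked against VMware's documentation for your version.

```python
from typing import List


def render_vmx_overrides(snapshot_dir: str) -> List[str]:
    """Return the VMX lines that relocate snapshot (redo log) files.

    snapshot_dir is a hypothetical directory on Datastore 2; nothing here
    touches a real VMX file, it only builds the text to add to one.
    """
    return [
        # Point the VM's working directory at the other datastore.
        f'workingDir = "{snapshot_dir}"',
        # Keep redo logs out of the parent disk's directory (assumed value format).
        'snapshot.redoNotWithParent = "true"',
    ]
```

For example, `render_vmx_overrides("/vmfs/volumes/Datastore2/testvm")` yields the two lines to append to the test VM's configuration before creating the snapshot in step 3.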

Redirecting snapshots to the same datastore that will be the Storage VMotion target is required to trigger the issue. If you use any other datastore, you will not run into this bug.

We will notify VMware about the issue.

To work around the issue, select the "Force Veeam quick migration" checkbox when migrating an instantly recovered VM to production storage.

Re: Data loss bug - Instant VM Recovery with redirected writ

by m_zolkin » Tue Nov 18, 2014 12:07 pm

Gostev wrote: However, I also find it strange that you are the first user to run into this bug after 5 years of Instant VM Recovery feature existence, and tens of thousands of users using it for years for production recoveries. So, let me get more details on this, and what exactly it takes to run into this bug. Thanks!

It turned out that only vSphere 5.5 is affected; the scenario worked fine on vSphere 5.1. That explains why we didn't see such issues before.
We have submitted a ticket with VMware SDK support, and Veeam is also working on a solution to bypass the issue.

Re: Data loss bug - Instant VM Recovery with redirected writ

by pvz » Tue Nov 18, 2014 2:06 pm

I would suggest adding a check to Veeam to verify that the files are *actually* on the target datastore, with no dependencies left on the vPower NFS datastore, before nuking the datastore, despite VMware reporting success. That should protect any of your customers who might be running this in the future on what is now the current version of vSphere.
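
Roughly the kind of check I have in mind, as a sketch only. It assumes the VM's file list is already available as vSphere-style datastore paths of the form "[datastore] folder/file"; how that list is gathered is left out, and the function names and datastore names are made up for illustration.

```python
import re
from typing import List

# Matches vSphere-style datastore paths such as "[Datastore2] vm/vm.vmdk".
_DS_PATH = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<path>.*)$")


def files_off_target(file_paths: List[str], target_datastore: str) -> List[str]:
    """Return every path that is NOT on the target datastore.

    Paths that cannot be parsed are treated as off-target, so that
    anything unexpected blocks the cleanup instead of slipping through.
    """
    leftovers = []
    for path in file_paths:
        match = _DS_PATH.match(path.strip())
        if match is None or match.group("datastore") != target_datastore:
            leftovers.append(path)
    return leftovers


def safe_to_delete_source(file_paths: List[str], target_datastore: str) -> bool:
    """True only if the VM has files and all of them live on the target."""
    return bool(file_paths) and not files_off_target(file_paths, target_datastore)
```

With a redo log on "Datastore2" but the base disk still on a vPower NFS datastore (here called "vPowerNFS", a hypothetical name), `safe_to_delete_source` returns False, so the source would be left alone no matter what the migration task reported.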

Paranoia is never a bad policy when it comes to a backup product. :-)

Re: Data loss bug - Instant VM Recovery with redirected writ

by Gostev » Tue Nov 18, 2014 4:23 pm

For a quick fix to include in Patch #1, we will probably just force the use of the native quick migration engine when such a setup is detected, instead of relying on VMware Storage VMotion.
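
The decision described above could look something like the following. This is a hypothetical sketch of the rule as stated in this thread, not the product's actual code; the function name, parameters, and return values are all invented for illustration.

```python
from typing import Optional


def choose_migration_engine(
    redirect_datastore: Optional[str],
    target_datastore: str,
    svmotion_available: bool,
    force_quick_migration: bool = False,
) -> str:
    """Pick a migration engine for an instantly recovered VM.

    redirect_datastore is where disk updates were redirected during
    Instant VM Recovery, or None if updates were not redirected.
    """
    # The user asked for quick migration, or Storage vMotion is not licensed.
    if force_quick_migration or not svmotion_available:
        return "quick_migration"
    # Risky setup from this thread: redo logs already sit on the target.
    if redirect_datastore is not None and redirect_datastore == target_datastore:
        return "quick_migration"
    return "storage_vmotion"
```

So `choose_migration_engine("Datastore2", "Datastore2", True)` falls back to quick migration, while redirecting to a different datastore, or not redirecting at all, still allows Storage vMotion.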

Re: Data loss bug - Instant VM Recovery with redirected writ

by dutch123 » Mon Nov 24, 2014 8:53 am

Will this also affect Veeam v7 in combination with vSphere 5.5?
dutch123
Lurker
 
Posts: 2
Liked: never
Joined: Thu Dec 27, 2012 12:13 pm

Re: Data loss bug - Instant VM Recovery with redirected writ

by Vitaliy S. » Mon Nov 24, 2014 8:57 am

It doesn't matter which version of Veeam B&R is used, since the issue sits in the Storage VMotion engine of vSphere 5.5.
Vitaliy S.
Veeam Software
 
Posts: 19568
Liked: 1104 times
Joined: Mon Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Data loss bug - Instant VM Recovery with redirected writ

by Meyercord » Mon Dec 01, 2014 5:16 pm

Has VMware acknowledged this bug in their product?
Meyercord
Enthusiast
 
Posts: 35
Liked: 6 times
Joined: Mon Jul 14, 2014 4:31 pm
Full Name: AJ Meyercord

Re: Data loss bug - Instant VM Recovery with redirected writ

by Gostev » Tue Dec 02, 2014 12:17 am

Our support orgs are working with each other. Last I heard, our engineer was able to reproduce the issue for them, and they have collected all required logs.

Re: Data loss bug - Instant VM Recovery with redirected writ

by dnrc » Thu Nov 05, 2015 12:58 pm

Hi, is this bug something I should still be concerned about?

Today I followed exactly the process described here to restore an Exchange server.

Now I need to migrate to production, but having read this I am concerned about doing so.

Running Veeam 8.0.0.204
ESXi 5.5.0 1331820
vCenter 5.5.0 2442329

What can I do to keep the data safe?

From the posts above, it seems that using a quick migration will be OK; I just want to confirm that this is the case.

thanks
dnrc
Novice
 
Posts: 9
Liked: never
Joined: Tue Apr 21, 2015 8:19 am
Full Name: Daniel Caine

Re: Data loss bug - Instant VM Recovery with redirected writ

by foggy » Thu Nov 05, 2015 2:29 pm

Daniel, right: since the issue is in the Storage vMotion engine, using Quick Migration is safe. In fact, the issue was addressed in the first Update for Veeam B&R v8 (Quick Migration is forced in such a scenario). However, to be completely on the safe side, you can select the "Force Veeam quick migration" check box and clear the "Delete source VM files upon successful quick migration" one.
foggy
Veeam Software
 
Posts: 14752
Liked: 1083 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Data loss bug - Instant VM Recovery with redirected writ

by dnrc » Thu Nov 05, 2015 6:11 pm

OK foggy, thanks for that.

That was what I gleaned from the other posts, but in this case (a production Exchange server) I wanted to be sure.

I'm still going to take a separate image before doing anything, though.

Re: Data loss bug - Instant VM Recovery with redirected writ

by v.Eremin » Mon Nov 09, 2015 8:23 am

VeeamZIP might come in quite handy in this case. Thanks.
v.Eremin
Veeam Software
 
Posts: 13291
Liked: 973 times
Joined: Fri Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

