Veeam vs. vSphere Replication

VMware specific discussions

Re: Veeam vs. vSphere Replication

by Gostev » Mon Oct 07, 2013 9:52 pm

Hi Ken, thanks for taking time to register on our forums to post this.

I see you work for VMware, so I recommend that you just talk to the vSphere Replication developers, because that is where I got all my information from. They actually had a great session at VMworld 2012 with lots of in-depth technical details about how it works, lots of pictures, and a long Q&A afterwards.

The "hidden" snapshot is called LWD (Light Weight Delta) in VMware terminology, and it does exist. Again, you don't have to take my word for it; just talk to your devs directly.

You have provided a very long and confusing explanation, so instead of trying to address individual misstatements, I think it is best to approach this from a different angle, one that is easy for everyone to understand without knowing the specifics of the particular implementation.

Here is how I like to explain this. It is impossible to perform asynchronous replication without some sort of snapshot existing for the duration of the data transfer, because you must have a means of protecting the replicated state of the VM image while that state is being transferred (which can easily take minutes). This is not possible without some sort of snapshot, even in theory. Simple as that! Synchronous replication is a whole other story: blocks are replicated immediately as they are modified, but that is NOT what vSphere Replication does.
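The argument can be illustrated with a toy model (hypothetical names; it does not depict how vSphere Replication or Veeam is actually implemented): a disk is a list of block values, replication copies one block per step, and the guest keeps writing during the copy. Without protecting the pre-transfer contents, the replica becomes a mix of states that never existed together on the source; saving old contents copy-on-write (the "some sort of snapshot") yields the exact point-in-time image.

```python
from typing import Dict, List


def replicate(disk: List[str],
              guest_writes: Dict[int, Dict[int, str]],
              protect: bool) -> List[str]:
    """Copy `disk` to a replica one block per step while the guest keeps writing.

    guest_writes maps a copy step to the writes ({block: value}) the guest
    issues just before that step's block is read. With protect=True, a block's
    pre-transfer contents are saved the first time it is overwritten
    (copy-on-write), so the replica matches the disk as it was at step 0.
    """
    saved: Dict[int, str] = {}  # "some sort of snapshot": old contents only
    replica: List[str] = []
    for step in range(len(disk)):
        for block, value in guest_writes.get(step, {}).items():
            if protect and block not in saved:
                saved[block] = disk[block]  # preserve the point-in-time value
            disk[block] = value             # the guest write always lands on disk
        replica.append(saved.get(step, disk[step]))
    return replica


start = ["A0", "B0", "C0"]
writes = {1: {0: "A1", 2: "C1"}}  # guest rewrites blocks 0 and 2 mid-transfer
torn = replicate(list(start), writes, protect=False)
consistent = replicate(list(start), writes, protect=True)
print(torn)        # ['A0', 'B0', 'C1']: a state that never existed on the source
print(consistent)  # ['A0', 'B0', 'C0']: the state at the start of the cycle
```

Copy-on-write is only one way to provide that protection; the rest of the thread is about whether VMware's mechanism counts as one.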

When it comes to marketing papers (LinkedIn says you have a marketing role at VMware), it is perfectly acceptable to state that vSphere Replication does not use snapshots (because most users think of VM snapshots when they hear "snapshot"). However, on these forums we discuss technologies not at the marketing level, but a few levels below that ;)

Now, don't get me wrong: no one here says the LWD approach is bad. As I've said above, LWD is better than using regular VM snapshots, and much better than the approach Microsoft implemented for Hyper-V Replica. But every technology has its pros and cons, and it is important for VMware to be very clear about both with a technical audience.
Gostev wrote:
PROS: No commit required, snapshot is simply discarded after replication cycle completes.
CONS: While replication runs, there is 3x I/O per each modified block that belongs to the replicated state.

Also, while you are here, would you care to comment on why VMware will not open the LWD API to 3rd party vendors? As you can see above, your users would like 3rd party vendors like Veeam to be able to leverage this technology.

Here at Veeam, we've put our bets on integrating with storage snapshots, as only this can completely eliminate I/O overhead on replicated VMs during the data transfer window... but the obvious con of our approach is that it is limited to certain supported storage devices. And even though we are constantly expanding this list, having access to LWD would let us deliver a universal engine we could fall back to for incompatible storage, enabling 100% of our joint customers to have a better VMware-based data protection strategy.

Thank you!

Anton Gostev
VMware vExpert 2013
Gostev
Veeam Software
 
Posts: 21390
Liked: 2349 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:20 pm

I think Ken is suggesting that it's really an extended fancy version of a synchronous replication.
averylarry
Expert
 
Posts: 258
Liked: 28 times
Joined: Tue Mar 22, 2011 7:43 pm
Full Name: Ted

Re: Veeam vs. vSphere Replication

by vmKen » Mon Oct 07, 2013 10:20 pm

Yes, I'm one of the guys who presented that session at VMworld 2012 (and 2013). I'm an architect in the product management group and do the technical marketing for VR (vSphere Replication) and SRM (Site Recovery Manager), and I speak with the developers of the product on a daily basis.
The LWD is not a hidden snapshot; it is a collection of blocks that is created dynamically at the time of replication. We create pointers to the blocks as they change, both in an in-memory bitmap and in a file called the PSF file (persistent state file). They are simply pointers that get updated as the blocks change. No snapshot, no intrusion.
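A minimal sketch of that pointer-tracking idea, assuming a hypothetical 4 KiB tracking granularity, a Python set standing in for the in-memory bitmap, and a plain text file of block numbers standing in for the PSF file (its real format is not described in this thread). Only block numbers are recorded, never block data.

```python
import os
import tempfile
from typing import List, Set

BLOCK_SIZE = 4096  # hypothetical tracking granularity in bytes


class ChangeTracker:
    """Records which blocks changed; stores pointers (block numbers), not data."""

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path   # stand-in for the persistent state file
        self.dirty: Set[int] = set()   # stand-in for the in-memory bitmap

    def on_write(self, offset: int, length: int) -> None:
        """Observe a guest write (length > 0); the data goes to disk untouched."""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        new_blocks: List[int] = [b for b in range(first, last + 1)
                                 if b not in self.dirty]
        self.dirty.update(new_blocks)
        # Persist only newly dirtied block numbers so tracking survives a restart.
        with open(self.state_path, "a") as f:
            for b in new_blocks:
                f.write(f"{b}\n")

    @staticmethod
    def load(state_path: str) -> Set[int]:
        """Rebuild the set of changed blocks from the persistent file."""
        with open(state_path) as f:
            return {int(line) for line in f if line.strip()}


path = os.path.join(tempfile.mkdtemp(), "demo.psf")
tracker = ChangeTracker(path)
tracker.on_write(offset=0, length=100)             # block 0
tracker.on_write(offset=4000, length=200)          # straddles blocks 0 and 1
tracker.on_write(offset=3 * BLOCK_SIZE, length=1)  # block 3
print(sorted(tracker.dirty))             # [0, 1, 3]
print(sorted(ChangeTracker.load(path)))  # [0, 1, 3]
```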
When the scheduler determines it is time to replicate, we refer to those pointers to know which blocks need to be sent, and read a copy of the blocks at their current state for replication. Those blocks are read into buffers and shipped to the recovery site, where a network file copy writes them to a redo log. There is no intrusion or interaction with the production VM at all.
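The cycle can be sketched as a hypothetical toy model: the tracked set decides what to read, each block is read once into a buffer, the bundle lands in a redo log at the recovery site, and only then is it applied to the replica disk. Applying the redo log only after the whole bundle has arrived is an assumption about why a redo log is used, not something stated here, and concurrent guest writes are ignored (that case is scenario 1 below).

```python
from typing import Dict, List, Set


def replication_cycle(source: List[bytes], dirty: Set[int],
                      replica: List[bytes]) -> List[int]:
    """Ship the currently tracked blocks as one bundle (toy model).

    Returns the block numbers that were shipped. The tracking set is cleared
    so changes made after this point are picked up by the next cycle.
    """
    bundle = sorted(dirty)
    dirty.clear()
    # Protected site: one read per changed block, into transfer buffers.
    buffers = [(block, source[block]) for block in bundle]
    # Recovery site: land every block of the bundle in a redo log first...
    redo_log: Dict[int, bytes] = {}
    for block, data in buffers:
        redo_log[block] = data
    # ...and apply the redo log to the replica only after the whole bundle
    # is in it, so the replica never holds a half-applied cycle.
    for block, data in redo_log.items():
        replica[block] = data
    return bundle


source = [b"a", b"B", b"c", b"D"]
replica = [b"a", b"b", b"c", b"d"]
tracked = {1, 3}
shipped = replication_cycle(source, tracked, replica)
print(shipped)                  # [1, 3]
print(replica == source, tracked)  # True set()
```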
There are two scenarios in which there might be interaction with the objects on the replicated side. 1) If a block is being written at the *exact* moment we are reading and sending it, we need to protect the block until we are assured it has been written at the recovery location. In that one instance alone, we redirect the current write to the persistent state file until the replicated block is written and acknowledged, and then we commit it to the original VMDK. No snapshot is taken: no intrusion into the VM, no stun, nothing. This may be considered a CoW for *individual* blocks, but only for blocks that change while they are being sent and before the send has completed. It does not interact with the VM or its writes directly, only in those rare cases where a particular block changes *during* its own replication. Even then, the VM is unaware of it and is not touched in any snapshot-like fashion.
The LWD is the bundle of blocks that we treat as a unit for replication, all the blocks that are read for replication at one time.
2) The only other scenario where there *is* an actual snapshot is when the VM is set up to replicate with VSS quiescing and the OS is Windows Server 2008/2012. For those systems the only way Microsoft implements VSS is through snapshots, and even then we use an interesting 'forked snapshot' that is very temporary and discarded after replication is complete.
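Scenario 1 can be modeled like this (a hypothetical sketch, not VMware code): a write that hits a block which has been sent but not yet acknowledged is parked in a dict standing in for the persistent state file, guest reads see the parked value, and the parked write is committed to the VMDK once the recovery site acknowledges the block. The I/O log makes the extra operations on that path visible.

```python
from typing import Dict, List, Set


class InFlightGuard:
    """Toy model of protecting a block while it is in flight."""

    def __init__(self, vmdk: List[str]) -> None:
        self.vmdk = vmdk
        self.in_flight: Set[int] = set()      # sent, not yet acknowledged
        self.parked: Dict[int, str] = {}      # redirected writes (PSF stand-in)
        self.io: List[str] = []               # disk I/O log for inspection

    def send(self, block: int) -> str:
        self.in_flight.add(block)
        self.io.append(f"read vmdk {block}")
        return self.vmdk[block]

    def guest_write(self, block: int, value: str) -> None:
        if block in self.in_flight:
            self.parked[block] = value        # redirect: keep the sent version
            self.io.append(f"write psf {block}")
        else:
            self.vmdk[block] = value          # normal path: straight to disk
            self.io.append(f"write vmdk {block}")

    def guest_read(self, block: int) -> str:
        # The VM must see its own latest write even while it is parked.
        return self.parked.get(block, self.vmdk[block])

    def ack(self, block: int) -> None:
        """Recovery site confirmed the block; commit any parked write."""
        self.in_flight.discard(block)
        if block in self.parked:
            self.io.append(f"read psf {block}")
            self.vmdk[block] = self.parked.pop(block)
            self.io.append(f"write vmdk {block}")


guard = InFlightGuard(["old0", "old1"])
sent = guard.send(0)            # block 0 is now in flight
guard.guest_write(0, "new0")    # redirected: block 0 is unacknowledged
guard.guest_write(1, "new1")    # normal path
print(sent, guard.vmdk[0], guard.guest_read(0))  # old0 old0 new0
guard.ack(0)
print(guard.vmdk)               # ['new0', 'new1']
print(guard.io)
```

On the redirected path, block 0 costs a write to the PSF stand-in, a read back from it, and a write to the VMDK, which matches the "write to CoW, read from CoW, commit into VMDK" sequence debated in this thread; block 1 costs a single write.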
For normal, run-of-the-mill operation there is no snapshot. We behave more like a 'delayed synchronous' system: we *track* blocks as they are modified, but ship them as a bundle (the LWD) asynchronously. If we were using an in-line filter on the writes that forked each write, you'd be correct that we'd need to stun the VM in some fashion; instead, we allow every write to take place directly, simply track which blocks changed, and then read those blocks non-intrusively.
vmKen
Novice
 
Posts: 5
Liked: 2 times
Joined: Mon Oct 07, 2013 7:26 pm
Full Name: Ken Werneburg

Re: Veeam vs. vSphere Replication

by Gostev » Mon Oct 07, 2013 10:24 pm

averylarry wrote:
I think Ken is suggesting that it's really an extended fancy version of a synchronous replication.

I wish it were, but the story falls apart as soon as you copy a 2GB ISO to a VM replicated by vSphere Replication to an offsite location over a 1Mbps link ;)
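Back-of-the-envelope numbers for that test, assuming 2 GB means 2 GiB, 1 Mbps means 10^6 bit/s, and no compression or protocol overhead:

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Raw transfer time, ignoring protocol overhead and compression."""
    return size_bytes * 8 / link_bits_per_second


iso_bytes = 2 * 1024**3   # 2 GB ISO, taken as 2 GiB
link_bps = 1_000_000      # 1 Mbps link
seconds = transfer_seconds(iso_bytes, link_bps)
print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")  # 17180 s = 4.77 h
```

A truly synchronous design would hold each guest write until the remote side acknowledged it, effectively throttling the VM's writes to link speed for hours; an asynchronous design instead falls behind its RPO and catches up later.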

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:30 pm

"read a copy of the blocks at their current state for replication." "There is no intrusion or interaction with the production VM at all."

I do not understand how these 2 statements are not direct contradictions?

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:32 pm

Gostev wrote:
averylarry wrote:
I think Ken is suggesting that it's really an extended fancy version of a synchronous replication.

I wish it were, but the story falls apart as soon as you copy a 2GB ISO to a VM replicated by vSphere Replication to an offsite location over a 1Mbps link ;)

Not if you have enough RAM as a local buffer, and enough overall bandwidth to eventually catch up. Right? :D

Re: Veeam vs. vSphere Replication

by Gostev » Mon Oct 07, 2013 10:32 pm

Hi Ken, now with the above explanation, it sounds like we are on the same page.

To me, that persistent state file you referenced is a type of "hidden snapshot" in my definition, an entity used to protect the replicated VM state, and what is causing the extra I/O. You are perfectly correct this time in stating that the PSF file is a CoW type of storage, and when there is CoW, there is extra I/O.

Write to CoW, then read from CoW and commit into VMDK: that gives the 3x I/O per modified block that belongs to the replicated state, just as I said above.

Can we agree that we agree with each other? ;)

Thanks!

Re: Veeam vs. vSphere Replication

by vmKen » Mon Oct 07, 2013 10:36 pm

Sorry, didn't see the rest of this down below:
Gostev wrote:
When it comes to marketing papers (LinkedIn says you have a marketing role at VMware), it is perfectly acceptable to state that vSphere Replication does not use snapshots (because most users think of VM snapshots when they hear "snapshot"). However, on these forums we discuss technologies not at the marketing level, but a few levels below that ;)


When I wrote the technical material, you called it confusing. :) I'm happy to get as detailed as you like; that's my job. I've posted a lot of material on this topic at http://blogs.vmware.com/vsphere/uptime.

Gostev wrote:
PROS: No commit required, snapshot is simply discarded after replication cycle completes.
CONS: While replication runs, there is 3x I/O per each modified block that belongs to the replicated state.


Not sure where this 3x I/O is coming from. There is no intercept of the writes; they are written out as normal. Each block is read once for replication (as all replication needs to do), and that's it. So it's a single write and a single read for the changed blocks on the protected site. If you're including the write I/O to the redo log and the commit at the recovery site, that's a bit unfair, as every replication technology needs to do writes at the target...

Gostev wrote:
Also, while you are here, would you care to comment on why VMware will not open the LWD API to 3rd party vendors? As you can see above, your users would like 3rd party vendors like Veeam to be able to leverage this technology.


There is no published API at all, to partners or customers; it uses a fundamental call within the kernel itself, so it's hard to expose gracefully to the outside world. Trust me, we get beat up about APIs for this all the time, but securing the kernel is important. So there are a few calls we can make to it internally (via CLI) to configure replication, and that's about it. Lots of people want API access and lots of people want an expanded CLI, and we're always looking at how to do that! Some other vendors are doing... inappropriate things to gain access to things like the vSCSI filters without an API, and the problem there is that if we change anything at all in those internal calls, the whole house of cards might come down. People don't like it when their DR replication stops working. :) So we're looking at potentially writing a published API for this, but since that wasn't in scope from the start, it's something we'll have to retrofit.

Gostev wrote:
Here at Veeam, we've put our bets on integrating with storage snapshots, as only this can completely eliminate I/O overhead on replicated VMs during the data transfer window... but the obvious con of our approach is that it is limited to certain supported storage devices. And even though we are constantly expanding this list, having access to LWD would let us deliver a universal engine we could fall back to for incompatible storage, enabling 100% of our joint customers to have a better VMware-based data protection strategy.


Sure, if we get an API developed for this, or rolled into the SDK or the like, it'll be much easier to layer on top of it, but my goal in coming here was strictly to clear up a few things, not to solve all the difficulties of VMware partnership. :)

Re: Veeam vs. vSphere Replication

by Gostev » Mon Oct 07, 2013 10:37 pm

averylarry wrote:
Not if you have enough RAM as a local buffer, and enough overall bandwidth to eventually catch up. Right? :D

Yes, albeit with a few cycles of missed RPOs, but eventually it should catch up. I simply referenced the most basic test anyone can perform to see that vSphere Replication is not synchronous replication (or an extended, fancy version of it).
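A toy model of "a few cycles of missed RPOs" followed by catch-up. All inputs are illustrative assumptions, not figures from this thread: the 2 GiB burst is already queued, the link moves 1 Mbps during a hypothetical 15-minute cycle, and the VM keeps changing a hypothetical 10 MB per cycle.

```python
from typing import List


def backlog_per_cycle(burst_bytes: int, steady_bytes_per_cycle: int,
                      capacity_bytes_per_cycle: int,
                      max_cycles: int = 10_000) -> List[int]:
    """Unsent bytes left at the end of each cycle, until the backlog clears."""
    if steady_bytes_per_cycle >= capacity_bytes_per_cycle:
        raise ValueError("the link can never catch up at this change rate")
    backlog = burst_bytes
    history: List[int] = []
    for _ in range(max_cycles):
        backlog = max(0, backlog + steady_bytes_per_cycle - capacity_bytes_per_cycle)
        history.append(backlog)
        if backlog == 0:
            break
    return history


capacity = 1_000_000 // 8 * 15 * 60  # bytes a 1 Mbps link moves in 15 minutes
history = backlog_per_cycle(2 * 1024**3, 10_000_000, capacity)
missed = sum(1 for b in history if b > 0)
print(len(history), missed)          # 21 20
```

Under these assumptions, 20 consecutive cycles end with data still unsent (about five hours of missed RPO) before cycle 21 catches up; if the steady change rate reached the link capacity, it would never catch up.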

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:39 pm

Gostev wrote:...
To me, that persistent state file you referenced is a type of "hidden snapshot" in my definition, an entity used to protect the replicated VM state, and what is causing the extra I/O.

...

Ken specifically stated there is no CoW (with one exception). The PSF file is more like a second CBT file, containing only pointers to changed blocks, not the changed data itself.

If I understand him . . .

Re: Veeam vs. vSphere Replication

by vmKen » Mon Oct 07, 2013 10:41 pm

Gostev wrote:
Hi Ken, now with the above explanation, it sounds like we are on the same page.
To me, that persistent state file you referenced is a type of "hidden snapshot" in my definition, an entity used to protect the replicated VM state, and what is causing the extra I/O. You are perfectly correct this time in stating that the PSF file is a CoW type of storage, and when there is CoW, there is extra I/O.

Write to CoW, then read from CoW and commit into VMDK: that gives the 3x I/O per modified block that belongs to the replicated state, just as I said above.


Well, I guess I'm getting picky: the PSF file isn't a snapshot, it's just a file that in no way looks like a snapshot or interacts with the snapshot tree. That's where I was getting caught up.
And that CoW in the PSF is pretty rare... It doesn't affect every write, only changed blocks that 1) have changed while the replication for that VMDK is taking place, 2) have not already been sent from the current LWD, 3) have been sent but not yet written and acknowledged by the recovery site.
In that case there is a write, a redirect, then a write: two extra writes just for those blocks. How is that different from populating them into a snapshot and then committing the snapshot, though?
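Ken's three conditions can be written as a predicate. This is an interpretation, not VMware's definition: condition 1 is treated as required and conditions 2 and 3 as alternatives (a block cannot be both unsent and sent), which together mean "part of the current LWD and not yet acknowledged". The state names are hypothetical.

```python
from enum import Enum, auto


class BlockState(Enum):
    NOT_IN_LWD = auto()    # not part of the bundle currently being replicated
    PENDING = auto()       # in the current LWD, not yet sent (condition 2)
    SENT_UNACKED = auto()  # sent, not yet written/acknowledged remotely (condition 3)
    ACKED = auto()         # written and acknowledged at the recovery site


def needs_cow(replication_running: bool, state: BlockState) -> bool:
    """Condition 1 AND (condition 2 OR condition 3), per this reading."""
    return replication_running and state in (BlockState.PENDING,
                                             BlockState.SENT_UNACKED)


def extra_writes(replication_running: bool, state: BlockState) -> int:
    """Ken's count for the CoW path: 'a write, a redirect, then a write'."""
    return 2 if needs_cow(replication_running, state) else 0


for s in BlockState:
    print(s.name, needs_cow(True, s), extra_writes(True, s))
```

Gostev's accounting earlier in the thread counts the same path as write to CoW, read from CoW, and commit into VMDK; the predicate stays the same, only the tally differs.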
I think we're close to agreement. :)

Re: Veeam vs. vSphere Replication

by vmKen » Mon Oct 07, 2013 10:43 pm

Great conversation; I'll come back to chat more, but right now I've got to run. If anyone's at VMworld in Barcelona, come say hi!
-Ken

Re: Veeam vs. vSphere Replication

by Gostev » Mon Oct 07, 2013 10:48 pm

averylarry wrote:
Ken specifically stated there is no CoW (with an exception)

OK, however I have always been talking about the exception (modified blocks belonging to the replicated state).

Anyway, I think the latest post from Ken sums it up quite nicely.
This is exactly the I/O overhead I was talking about, confirmed:
vmKen wrote:
CoW ... for changed blocks that
1) have changed while the replication for that VMDK is taking place,
2) have not already been sent from the current LWD,
3) have been sent but not written and acknowledged by the recovery site.

I think we are all in agreement now (except for the definition of "snapshot", haha).

Thank you both.

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:55 pm

I don't think so. Ken claims that only the very small subset of changed blocks that are changed again during the replication cycle are CoW'd, whereas Veeam, because it uses a VMware snapshot, will CoW ANY and ALL blocks that are changed during the replication cycle.

If I understand correctly, then Veeam using a VMware snapshot will have 3X I/O for ALL data changed during the replication cycle, whereas VMware will have 2X I/O for all changed data and 3X I/O only for changed data that is changed again.
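Taking averylarry's multipliers at face value (they are his reading of the thread, not measured figures), the comparison for a hypothetical cycle with 1,000 changed blocks, 50 of which are changed again while replication runs, works out as follows:

```python
def io_vm_snapshot(changed: int) -> int:
    """averylarry's model of a VM-snapshot approach: 3 I/Os per changed block."""
    return 3 * changed


def io_lwd(changed: int, changed_again: int) -> int:
    """averylarry's model of LWD: 2 I/Os per changed block, 3 for re-changed ones."""
    if not 0 <= changed_again <= changed:
        raise ValueError("re-changed blocks must be a subset of changed blocks")
    return 2 * (changed - changed_again) + 3 * changed_again


print(io_vm_snapshot(1000), io_lwd(1000, 50))  # 3000 2050
```

In this model the gap shrinks as more of the changed blocks are rewritten during the cycle, and the two totals converge only when every changed block is changed again.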

Re: Veeam vs. vSphere Replication

by averylarry » Mon Oct 07, 2013 10:57 pm

Ken -- I'd still like you to address this:
averylarry wrote:
"read a copy of the blocks at their current state for replication." "There is no intrusion or interaction with the production VM at all."

I do not understand how these 2 statements are not direct contradictions?
