But again, to reiterate, this is not fact until it is proven so. I have no background in SCSI programming, NFS software or driver development. I am bringing this up as a discussion, to try to bring some clarity to the reasoning behind why NFS should or should not be deployed. This is simply my analysis of a perceived problem in a scenario many of us will face.
So in the most recent Veeam Forum Digest, a point was brought up about Microsoft's explicit lack of support for NFS storage for Exchange databases. The support statement specifically says the following (from http://technet.microsoft.com/en-us/libr ... 19301.aspx):

"All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2013 doesn't support the use of network attached storage (NAS) volumes, other than in the SMB 3.0 scenario outlined later in this topic. Also, NAS storage that's presented to the guest as block-level storage via the hypervisor isn't supported."

This would lead you to believe that presenting an Exchange virtual machine with a VMDK located on an NFS storage device in VMWare (so, adding an NFS datastore in VMWare and then adding a virtual disk to your Exchange VM located on this NFS datastore) is NOT supported by Microsoft and has the potential to cause database corruption.
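For context, this is the scenario in question - a minimal sketch of mounting an NFS export as a datastore on an ESXi host (the hostname, share path and datastore name are just placeholders):

    esxcli storage nfs add --host=nfs01.example.local --share=/export/vmstore --volume-name=nfs-ds01

Any VMDK created on that datastore and attached to an Exchange VM is exactly the "NAS presented to the guest as block-level storage via the hypervisor" case that Microsoft describes as unsupported.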
The primary reasoning behind this lack of support, based on other articles by IT professionals, seems to relate to how NFS processes read, write/commit and abort commands.
First, a couple of quick, potentially tangential paragraphs on NFS synchronous writes.
NFS servers can be configured for either synchronous or asynchronous writes. With synchronous writes, data is flushed to stable storage (spinning disks, or your underlying reliable storage device) before the write is completed and confirmation is returned to the source of the write. Synchronous writes seem to be the default for most enterprise-grade NAS boxes, which is why they include very fast stable (NVRAM/flash) write cache to prevent a drop in performance. Asynchronous writes are the equivalent of having no battery-backed write cache in an iSCSI deployment: any data that has not yet been flushed to stable disk is lost if the controller/cache loses power or fails. The upside of asynchronous writes is obviously the performance gain of not having to wait for each write to be committed.
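As a concrete illustration (paths and networks here are placeholders), on a Linux-based NFS server this choice is made per export in /etc/exports:

    /export/vmstore  192.168.10.0/24(rw,sync,no_subtree_check)     # acknowledged only once data reaches stable storage
    /export/scratch  192.168.10.0/24(rw,async,no_subtree_check)    # acknowledged from memory - faster, but unflushed data is lost on failure

Enterprise NAS appliances expose the same choice through their own management interfaces, even if the knob is named differently.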
This control over synchronous writes is the reason we use NFS (actually NFS on top of ZFS, for data integrity). We can be sure, using NFS, that any write from a virtual machine - if it is received by VMWare - will be treated as a synchronous write request to the NFS datastore. We control the sync process, and with ZFS we control the integrity of the write cache as well.
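On ZFS, for instance, that behaviour is controlled per dataset with the sync property (the dataset and device names below are placeholders), and a fast log device keeps synchronous writes from destroying performance:

    zfs set sync=always tank/nfs-vmstore        # force every write to be handled synchronously
    zfs get sync tank/nfs-vmstore               # sync=standard honours client sync requests; sync=disabled ignores them (dangerous)
    zpool add tank log mirror nvme0n1 nvme1n1   # optional mirrored SLOG so the synchronous write path stays fast

This is the kind of granular control I am referring to; I am not suggesting it is unique to ZFS.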
The benefits for database applications here (including Exchange) are clear. I have seen, with a lot of iSCSI-based storage systems, that you are at the mercy of your appliance as to how and when data is written, and if and when synchronous writes are performed. With our NFS deployment we can, in almost all circumstances, be assured that data is written to stable storage as requested by the virtual machine. We had a case of data going missing due to a failure in the integrity of an iSCSI-based SAN's cache - with synchronous NFS those writes would not have been acknowledged until they reached stable storage, and we would not have lost data. Some point the finger at the SAN vendor, but regardless, this is an issue that would have been avoided with NFS.
This is not to say iSCSI setups are inherently unsafe or unreliable, just that with NFS you can have (there are obviously exceptions) a more granular level of control over how writes are handled, at the expense of the overhead of the NFS protocol.
To the specific point: Exchange databases and NFS. Most people seem to blame the disconnect between the SCSI command set and the NFS command set. Specifically, NFS has a much shorter command set than SCSI, so naturally some commands issued by a virtual machine cannot be natively translated to NFS equivalents. In VMWare's case, this means those commands are emulated.
The particular command that seems to be to blame is the SCSI abort command.
In the latest newsletter post, Gostev brings to attention the following article:
http://windowsitpro.com/blog/nfs-and-ex ... ombination
Where Tony Redmond says the following regarding the issuing of a SCSI abort:

"Block-level storage allows this to happen but file-level storage supports aborts on a best-effort basis"

My first point here (and again, this is based on my limited knowledge of NFS, so please correct me) is that my understanding is NFS does not support aborts on a best-effort basis - NFS does not support aborts at all.
Aren't SCSI aborts a command issued at the driver layer, rather than at the application layer? How exactly ESE issues SCSI abort commands (and why it would issue a driver-level command to abort a storage transaction) is a mystery to me. Aborts are commands issued at the SCSI level to regain I/O continuity when drive-level transactions go missing or have problems, and why an application would be issuing these commands I am unsure of (or do not fully understand).
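To make that distinction concrete: as far as I understand it, on Windows the point at which the storage stack gives up on an outstanding I/O and begins recovery (aborts/resets) is governed by the disk class driver's timeout, not by anything an application issues. The well-known registry value below is shown purely to illustrate where this lives in the stack:

    reg query "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 60 /f    # seconds before an outstanding I/O is treated as failed

If that picture is right, an application like ESE would only ever see the outcome (a failed or retried I/O), not the abort itself.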
Perhaps ESE does bypass the normal OS storage I/O path and interact with the SCSI devices directly, but I wouldn't have thought so.
Assuming I am incorrect, and ESE does rely on SCSI abort commands which do not exist in NFS, we now rely on VMWare to emulate these commands.
If you refer to the following:
http://www.google.com/patents/US7865663
Which, from what I can tell, is a patent filed by VMWare detailing how SCSI commands are emulated at the virtual level for writes to NFS datastores. Of particular interest are the following claims, which you can read about in detail in the Google patent doc:
"The method according to claim 1, wherein the SCSI commands are further emulated by modifying the stored data to reflect that a SCSI command is no longer pending, in response to a SCSI abort command."
"removing information from the stored data about a given pending SCSI command if a received I/O request is an abort command."

This details the method by which aborts are emulated through modification of data on an NFS datastore. Unlike SCSI, synchronous write commands in NFS don't generally just "go missing" between the client and server, so aborts are not required. At the SCSI level, the driver may "think" a command has gone missing and issue an abort, at which point VMWare will correct this difference in state if the data has in fact been written. This does not read, at all, like a "best effort" service - if it were, massive data corruption would occur in all services (SQL, file storage, etc.) whenever abort commands were not emulated correctly.
Perhaps (and this is just a guess) this point turns on synchronous vs asynchronous writes: with asynchronous writes, perhaps VMWare will make datastore changes in response to an abort command for data that did in fact get lost (i.e. an NFS async write fails, and VMWare removes tracking for data that was never actually written). That, however, is a fault of using asynchronous writes to a poorly performing datastore, not a fault of the SCSI command emulation.
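If you want to confirm which behaviour your own export is actually giving VMWare, on a Linux NFS server something as simple as the following shows the effective option list (including sync or async) for each export; appliance NAS boxes should have an equivalent in their management UI:

    exportfs -v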
So there is my theory: synchronous writes to NFS datastores, using VMWare's (not best-effort) SCSI abort emulation, pose no risk whatsoever to datastore integrity.
My guess here would be that Microsoft does not support virtualised block storage on NFS in VMWare because of the lack of consistency across NFS deployments (sync vs no-sync, command emulation), leading to scenarios where underlying configuration problems with your NFS setup CAN lead to database corruption - not because NFS cannot be deployed in a way that is perfectly reliable. They have taken the safe option of simply not supporting it.
Prove me wrong, but please provide technical detail as to why aborts are best-effort, and why ESE relies on them. I'm not pretending I understand this all - I know I do not - but I DO want real answers!
Elliott