But again, to reiterate, this is not fact until it is proven so. I have no background in SCSI programming, NFS software or driver development. I am bringing this up as a discussion, to try to bring some clarity to the reasoning behind why NFS should or should not be deployed. This is simply my analysis of a perceived problem in a scenario many of us will face.
So in the most recent Veeam Forum Digest, a point was brought up about Microsoft's explicit lack of support for NFS storage for Exchange databases. The support statement specifically says the following (from http://technet.microsoft.com/en-us/libr ... 19301.aspx):

"All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2013 doesn't support the use of network attached storage (NAS) volumes, other than in the SMB 3.0 scenario outlined later in this topic. Also, NAS storage that's presented to the guest as block-level storage via the hypervisor isn't supported."

This would lead you to believe that presenting an Exchange virtual machine with a VMDK located on an NFS storage device in VMWare (so, adding an NFS datastore in VMWare and then adding a virtual disk to your Exchange VM located on this NFS datastore) is NOT supported by Microsoft and has the potential to cause database corruption.
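For context, this is the scenario in question - a minimal sketch of mounting an NFS export as a datastore on an ESXi host (the hostname, share path and datastore name are just placeholders):

    esxcli storage nfs add --host=nfs01.example.local --share=/export/vmstore --volume-name=nfs-ds01

Any VMDK created on that datastore and attached to an Exchange VM is exactly the "NAS presented to the guest as block-level storage via the hypervisor" case that Microsoft describes as unsupported.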
The primary reasoning behind this lack of support, based on other articles by IT professionals, seems to relate to how NFS processes read, write/commit and abort commands.
First, a couple of quick, potentially tangential paragraphs on NFS synchronous writes.
NFS servers can be configured for either synchronous or asynchronous writes. With synchronous writes, data is flushed to stable storage (spinning disks, or your underlying reliable storage device) before the write is completed and confirmation is returned to the source of the write. Synchronous writes seem to be the default for most enterprise-grade NAS boxes, which is why they include very fast stable (NVRAM/flash) write cache to prevent a drop in performance. Asynchronous writes are the equivalent of having no battery-backed write cache in an iSCSI deployment: any data that has not yet been flushed to stable disk is lost if the controller/cache loses power or fails. The upside of asynchronous writes is obviously the performance gain of not having to wait for each write to be committed.
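As a concrete illustration (paths and networks here are placeholders), on a Linux-based NFS server this choice is made per export in /etc/exports:

    /export/vmstore  192.168.10.0/24(rw,sync,no_subtree_check)     # acknowledged only once data reaches stable storage
    /export/scratch  192.168.10.0/24(rw,async,no_subtree_check)    # acknowledged from memory - faster, but unflushed data is lost on failure

Enterprise NAS appliances expose the same choice through their own management interfaces, even if the knob is named differently.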
This control over synchronous writes is the reason we use NFS (actually NFS on top of ZFS, for data integrity). We can be sure, using NFS, that any write from a virtual machine - if it is received by VMWare - will be treated as a synchronous write request to the NFS datastore. We control the sync process, and with ZFS we control the integrity of the write cache as well.
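On ZFS, for instance, that behaviour is controlled per dataset with the sync property (the dataset and device names below are placeholders), and a fast log device keeps synchronous writes from destroying performance:

    zfs set sync=always tank/nfs-vmstore        # force every write to be handled synchronously
    zfs get sync tank/nfs-vmstore               # sync=standard honours client sync requests; sync=disabled ignores them (dangerous)
    zpool add tank log mirror nvme0n1 nvme1n1   # optional mirrored SLOG so the synchronous write path stays fast

This is the kind of granular control I am referring to; I am not suggesting it is unique to ZFS.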
The benefits for database applications here (including Exchange) are clear. I have seen, with a lot of iSCSI-based storage systems, that you are at the mercy of your appliance as to how and when data is written, and if and when synchronous writes are performed. With our NFS deployment we can, in almost all circumstances, be assured that data is written to stable storage as requested by the virtual machine. We had a case of data going missing due to a failure in the integrity of an iSCSI-based SAN's cache - with synchronous NFS those writes would not have been acknowledged until they reached stable storage, and we would not have lost data. Some point the finger at the SAN vendor, but regardless, this is an issue that would have been avoided with NFS.
This is not to say iSCSI setups are inherently unsafe or unreliable, just that with NFS you can have (there are obviously exceptions) a more granular level of control over how writes are handled, at the expense of the overhead of the NFS protocol.
To the specific point: Exchange databases and NFS. Most people seem to blame the disconnect between the SCSI command set and the NFS command set. Specifically, NFS has a much shorter command set than SCSI, so naturally some commands issued by a virtual machine cannot be natively translated to NFS equivalents. In VMWare's case, this means those commands are emulated.
The particular command that seems to be to blame is the SCSI abort command.
In the latest newsletter post, Gostev brings to attention the following article:
http://windowsitpro.com/blog/nfs-and-ex ... ombination
Where Tony Redmond says the following regarding the issuing of a SCSI abort:

"Block-level storage allows this to happen but file-level storage supports aborts on a best-effort basis"

My first point here (and again, this is based on my limited knowledge of NFS, so please correct me) is that my understanding is NFS does not support aborts on a best-effort basis - NFS does not support aborts at all.
Aren't SCSI aborts a command issued at the driver layer, rather than at the application layer? How exactly ESE issues SCSI abort commands (and why it would issue a driver-level command to abort a storage transaction) is a mystery to me. Aborts are commands issued at the SCSI level to regain I/O continuity when drive-level transactions go missing or have problems, and why an application would be issuing these commands I am unsure of (or do not fully understand).
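To make that distinction concrete: as far as I understand it, on Windows the point at which the storage stack gives up on an outstanding I/O and begins recovery (aborts/resets) is governed by the disk class driver's timeout, not by anything an application issues. The well-known registry value below is shown purely to illustrate where this lives in the stack:

    reg query "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 60 /f    # seconds before an outstanding I/O is treated as failed

If that picture is right, an application like ESE would only ever see the outcome (a failed or retried I/O), not the abort itself.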
Perhaps ESE does bypass the normal OS storage I/O path and interact with the SCSI devices directly, but I wouldn't have thought so.
Assuming I am incorrect, and ESE does rely on SCSI abort commands which do not exist in NFS, we now rely on VMWare to emulate these commands.
If you refer to the following:
http://www.google.com/patents/US7865663
Which, from what I can tell, is a patent filed by VMWare detailing how SCSI commands are emulated at the virtual level for writes to NFS datastores. Of particular interest are the following claims, which you can read about in detail in the Google patent doc:
"The method according to claim 1, wherein the SCSI commands are further emulated by modifying the stored data to reflect that a SCSI command is no longer pending, in response to a SCSI abort command."
"removing information from the stored data about a given pending SCSI command if a received I/O request is an abort command."

This details the method by which aborts are emulated through modification of data on an NFS datastore. Unlike SCSI, synchronous write commands in NFS don't generally just "go missing" between the client and server, so aborts are not required. At the SCSI level, the driver may "think" a command has gone missing and issue an abort, at which point VMWare will correct this difference in state if the data has in fact been written. This does not read, at all, like a "best effort" service - if it were, massive data corruption would occur in all services (SQL, file storage, etc.) whenever abort commands were not emulated correctly.
Perhaps (and this is just a guess) this point turns on synchronous vs asynchronous writes: with asynchronous writes, perhaps VMWare will make datastore changes in response to an abort command for data that did in fact get lost (i.e. an NFS async write fails, and VMWare removes tracking for data that was never actually written). That, however, is a fault of using asynchronous writes to a poorly performing datastore, not a fault of the SCSI command emulation.
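If you want to confirm which behaviour your own export is actually giving VMWare, on a Linux NFS server something as simple as the following shows the effective option list (including sync or async) for each export; appliance NAS boxes should have an equivalent in their management UI:

    exportfs -v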
So there is my theory: synchronous writes to NFS datastores, using VMWare's (not best-effort) SCSI abort emulation, pose no risk whatsoever to datastore integrity.
My guess here would be that Microsoft does not support virtualised block storage on NFS in VMWare because of the lack of consistency across NFS deployments (sync vs no-sync, command emulation), leading to scenarios where underlying configuration problems with your NFS setup CAN lead to database corruption - not because NFS cannot be deployed in a way that is perfectly reliable. They have taken the safe option of simply not supporting it.
Prove me wrong, but please provide technical detail as to why aborts are best-effort, and why ESE relies on them. I'm not pretending I understand this all - I know I do not - but I DO want real answers!
Elliott