Latency Control for distributed / hyper-converged datastores

Post by **jpiscaer** » Oct 27, 2015 11:03 pm

Hi guys,

We're running Nutanix as our production storage environment. We'd like to enable storage latency control for our Nutanix NFS datastores, but we're wary of doing so given the distributed nature of the datastores.

1. Can we safely enable storage latency control for Nutanix datastores?
2. If not, can we do a feature request to support distributed datastores by monitoring the latency for any given datastore from all hosts in a vSphere cluster?

Post by **foggy** » Oct 28, 2015 3:55 pm

Joep, since Veeam B&R operates on latency data provided by VMware, I'd be more interested in whether VMware itself correctly reports latency for such datastores.

Post by **dellock6** » Nov 02, 2015 5:41 pm

Uhm, interesting. The datastore is shared between all the nodes in the cluster, and even if each CVM exposes its own "local" view of the datastore, at the vCenter layer it all appears as a single large shared datastore. It all comes down to how Nutanix exposes read latency information to vCenter: is this value an average for the entire volume, or something else? We read the read latency from vCenter, so we take whatever Nutanix exposes there at face value.

Post by **jpiscaer** » Nov 02, 2015 6:22 pm

I figured you'd do your own stats vs. pulling them from vCenter. Makes it a bit more complicated, although I do still see a use case for per-host datastore metrics to optimize for distributed systems.

Nov 02, 2015 7:00 pm

Veeam uses per-host metrics from vCenter for this. Here's a simple example from the Veeam log when we set up a monitor for datastore performance:


[25.10.2015 01:06:40] <43> Info     [DatastoreIO] Checking availability host 'host-13413' metrics. Metrics ids: [144,145], interval: 20, startTime: '01.06.21.032', endTime: '01.06.41.032'

In this example host-13413 is "esx03" in my cluster and we are monitoring metrics 144 & 145, which correspond to "totalReadLatency" and "totalWriteLatency" for that datastore as that host sees it (i.e. as measured by VMware). Obviously, though, each host can have a totally different view of latency for a given datastore, as seen in the two screenshots below showing the same time frame (well almost, off by 1 minute) for two different hosts within the same cluster, based on their view of the same datastore (latency is very high because of some ongoing testing, no worries):

So we already have a "per-host" view of the datastore. However, I agree that from Veeam's perspective we do not have enough knowledge of the underlying per-host caching model in such a distributed environment, and thus, if latency for that datastore is high on one node, we would most likely not assign any additional tasks to other nodes in the cluster. You would almost want a case where we assign at least one task per host (assuming there are such tasks), and only throttle/limit beyond that.
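To make the "at least one task per host, throttle beyond that" idea concrete, here is a rough sketch in Python. This is not how Veeam B&R actually schedules tasks; the function name, the threshold, the task limit, and the host names other than esx03 are all made up for illustration.

```python
from typing import Dict


def allowed_tasks(host_latency_ms: Dict[str, float],
                  threshold_ms: float,
                  max_tasks_per_host: int) -> Dict[str, int]:
    """Return how many tasks each host may run against one datastore.

    Hypothetical policy: every host always keeps one guaranteed task;
    only hosts whose own view of latency is below the threshold may
    take more, up to max_tasks_per_host.
    """
    limits = {}
    for host, latency in host_latency_ms.items():
        if latency >= threshold_ms:
            # This host sees high latency: keep only the guaranteed task.
            limits[host] = 1
        else:
            # This host still has headroom: allow the full task count.
            limits[host] = max_tasks_per_host
    return limits


# Hypothetical per-host latency readings (ms) for the same datastore.
readings = {"esx01": 12.0, "esx02": 45.0, "esx03": 8.0}
limits = allowed_tasks(readings, threshold_ms=20.0, max_tasks_per_host=4)
print(limits)
```

The point is that the decision is made per host from that host's own latency reading, so one slow node does not block the others.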

Interestingly, I haven't really seen this become an issue in the field on Nutanix, and I've used I/O control there. It may simply be that it takes far more than a single task to overload the latency of a given host anyway (in my experience) so there's always plenty of headroom. It would definitely be something good to test in more detail.
