Discussions specific to the VMware vSphere hypervisor
jpiscaer
Enthusiast
Posts: 42
Liked: 9 times
Joined: Jun 16, 2009 11:36 am
Full Name: Joep Piscaer

Latency Control for distributed / hyper-converged datastores

Post by jpiscaer » Oct 27, 2015 11:03 pm

Hi guys,

We're running Nutanix as our production storage environment. We'd like to enable storage latency control for our Nutanix NFS datastores, but we're wary of doing so given the distributed nature of the datastores.

1. Can we safely enable storage latency control for a Nutanix datastore?
2. If not, can we file a feature request to support distributed datastores by monitoring the latency for any given datastore from all hosts in a vSphere cluster?
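
To make the request in point 2 concrete, here is a minimal Python sketch of what "monitoring the latency for a datastore from all hosts" could mean. It assumes the per-host latency samples have already been collected somehow; the function names, host names and values are hypothetical and do not reflect how Veeam B&R or vCenter work internally.

```python
from statistics import mean


def cluster_latency(samples_ms):
    """Summarise one datastore's latency across all hosts in a cluster.

    samples_ms maps a host name to the latency (in ms) that this host
    currently measures for the datastore. Names and values are illustrative.
    """
    if not samples_ms:
        raise ValueError("no per-host samples supplied")
    worst_host = max(samples_ms, key=samples_ms.get)
    return {
        "mean_ms": mean(samples_ms.values()),
        "max_ms": samples_ms[worst_host],
        "worst_host": worst_host,
    }


def hosts_over_threshold(samples_ms, threshold_ms):
    """Return the hosts whose own view of the datastore exceeds the threshold."""
    return sorted(h for h, v in samples_ms.items() if v > threshold_ms)


if __name__ == "__main__":
    # Hypothetical samples for a single shared datastore.
    samples = {"esx01": 5.0, "esx02": 35.0, "esx03": 8.0}
    print(cluster_latency(samples))
    print(hosts_over_threshold(samples, threshold_ms=20.0))
```

The point of the sketch is only that a cluster-wide average can hide a single overloaded host, which is why a per-host view matters for a distributed datastore.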

foggy
Veeam Software
Posts: 18287
Liked: 1568 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Latency Control for distributed / hyper-converged datast

Post by foggy » Oct 28, 2015 3:55 pm

Joep, since Veeam B&R operates on latency data provided by VMware, I'd be more interested in whether VMware itself correctly reports latency for such datastores.

dellock6
Veeam Software
Posts: 5734
Liked: 1626 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: Latency Control for distributed / hyper-converged datast

Post by dellock6 » Nov 02, 2015 5:41 pm

Uhm, interesting. The datastore is shared between all the nodes in the cluster, and even if each CVM exposes its own "local" view of the datastore, at the vCenter layer it all appears as a single large shared datastore. It all comes down to how Nutanix exposes read latency information to vCenter: is this value an average for the entire volume, or something else? We read the read latency from vCenter, so we take whatever Nutanix exposes at face value.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2019
Veeam VMCE #1

jpiscaer
Enthusiast
Posts: 42
Liked: 9 times
Joined: Jun 16, 2009 11:36 am
Full Name: Joep Piscaer

Re: Latency Control for distributed / hyper-converged datast

Post by jpiscaer » Nov 02, 2015 6:22 pm

I figured you'd do your own stats vs. pulling them from vCenter. That makes it a bit more complicated, although I do still see a use case for per-host datastore metrics to optimize for distributed systems.

tsightler
VP, Product Management
Posts: 5424
Liked: 2244 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Latency Control for distributed / hyper-converged datast

Post by tsightler » Nov 02, 2015 7:00 pm

Veeam uses per-host metrics from vCenter for this. Here's a simple example from the Veeam log when we set up a monitor for datastore performance:

[25.10.2015 01:06:40] <43> Info     [DatastoreIO] Checking availability host 'host-13413' metrics. Metrics ids: [144,145], interval: 20, startTime: '01.06.21.032', endTime: '01.06.41.032'
In this example host-13413 is "esx03" in my cluster, and we are monitoring metrics 144 and 145, which correspond to "totalReadLatency" and "totalWriteLatency" for that datastore as that host sees it (i.e. as measured by VMware). Obviously, though, each host can have a totally different view of latency for a given datastore, as seen in the two screenshots below. They show the same time frame (well, almost: off by 1 minute) for two different hosts within the same cluster, based on their view of the same datastore (latency is very high because of some ongoing testing, no worries):
[Two screenshots: latency of the same datastore as seen by two different hosts in the cluster]
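
For anyone who wants to pull the host and metric IDs out of such log lines, here is a minimal Python sketch. The regular expression is fitted only to the single log line quoted above; the exact Veeam log format is not documented here, so treat the pattern and the function name as assumptions.

```python
import re

# Pattern derived from the one sample line in this thread (an assumption,
# not an official description of the Veeam log format).
LOG_PATTERN = re.compile(
    r"host '(?P<host>[^']+)' metrics\. "
    r"Metrics ids: \[(?P<ids>[\d,\s]*)\], "
    r"interval: (?P<interval>\d+)"
)


def parse_datastore_io_line(line):
    """Extract host ID, metric IDs and interval from a [DatastoreIO] line.

    Returns None if the line does not match the assumed format.
    """
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    ids = [int(x) for x in match.group("ids").split(",") if x.strip()]
    return {
        "host": match.group("host"),
        "metric_ids": ids,
        "interval": int(match.group("interval")),
    }


if __name__ == "__main__":
    sample = (
        "[25.10.2015 01:06:40] <43> Info     [DatastoreIO] Checking "
        "availability host 'host-13413' metrics. Metrics ids: [144,145], "
        "interval: 20, startTime: '01.06.21.032', endTime: '01.06.41.032'"
    )
    print(parse_datastore_io_line(sample))
```

Mapping the host ID (host-13413) to a host name and the metric IDs to counter names still has to be done against your own vCenter.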
So we already have a "per-host" view of the datastore. However, I agree that from Veeam's perspective we would not have enough knowledge of the underlying per-host caching model in such a distributed environment. Thus, if latency for that datastore is high on one node, we would most likely not assign any additional tasks to other nodes in the cluster. You would almost want a case where we would assign at least one task per host (assuming there are such tasks), and only throttle/limit beyond that.
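
The "at least one task per host, throttle beyond that" idea can be sketched roughly as follows in Python. This is purely an illustration of the policy described above, not Veeam's actual scheduler; the function name, the threshold and the rule that any host over the threshold marks the whole datastore as busy are all assumptions.

```python
def may_assign_task(host, running_tasks, latency_ms, threshold_ms):
    """Decide whether one more task may be started on `host`.

    running_tasks maps host name -> number of tasks already running there.
    latency_ms maps host name -> latency that host sees for the datastore.
    All names and the policy itself are hypothetical.
    """
    # Floor: every host is always allowed its first task.
    if running_tasks.get(host, 0) == 0:
        return True
    # Beyond the floor, assume the datastore counts as busy as soon as
    # any host reports latency at or above the threshold.
    datastore_busy = any(v >= threshold_ms for v in latency_ms.values())
    return not datastore_busy


if __name__ == "__main__":
    running = {"esx01": 1, "esx02": 0}
    latency = {"esx01": 40.0, "esx02": 3.0}
    for name in ("esx01", "esx02"):
        print(name, may_assign_task(name, running, latency, threshold_ms=20.0))
```

With this rule a host that has nothing running still gets one task even while another node is reporting high latency, and only additional tasks are held back.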

Interestingly, I haven't really seen this become an issue in the field on Nutanix, and I've used I/O control there. It may simply be that it takes far more than a single task to overload the latency of a given host anyway (in my experience) so there's always plenty of headroom. It would definitely be something good to test in more detail.
