Latency Control for distributed / hyper-converged datastores

by jpiscaer » Tue Oct 27, 2015 11:03 pm

Hi guys,

We're running Nutanix as our production storage environment. We'd like to enable storage latency control for our Nutanix NFS datastores, but we're wary of doing so given the distributed nature of the datastores.

1. Can we safely enable storage latency control for Nutanix datastores?
2. If not, can we file a feature request to support distributed datastores by monitoring the latency for any given datastore from all hosts in a vSphere cluster?
jpiscaer
Veeam Vanguard
 
Posts: 42
Liked: 9 times
Joined: Tue Jun 16, 2009 11:36 am
Full Name: Joep Piscaer

Re: Latency Control for distributed / hyper-converged datast

by foggy » Wed Oct 28, 2015 3:55 pm

Joep, since Veeam B&R operates on latency data provided by VMware, I'd be more interested in whether VMware itself correctly reports latency for such datastores.
foggy
Veeam Software
 
Posts: 14742
Liked: 1079 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Latency Control for distributed / hyper-converged datast

by dellock6 » Mon Nov 02, 2015 5:41 pm

Uhm, interesting. The datastore is shared between all the nodes in the cluster, and even if each CVM exposes its own "local" view of the datastore, at the vCenter layer it all appears as a single large shared datastore. It all comes down to how Nutanix exposes read latency information to vCenter: is this value an average of the entire volume, or something else? We read the read latency from vCenter, so whatever Nutanix exposes is what we take as given.
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 5047
Liked: 1330 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: Latency Control for distributed / hyper-converged datast

by jpiscaer » Mon Nov 02, 2015 6:22 pm

I figured you'd collect your own stats rather than pulling them from vCenter. That makes it a bit more complicated, although I do still see a use case for per-host datastore metrics to optimize for distributed systems.
jpiscaer
Veeam Vanguard
 
Posts: 42
Liked: 9 times
Joined: Tue Jun 16, 2009 11:36 am
Full Name: Joep Piscaer

Re: Latency Control for distributed / hyper-converged datast

by tsightler » Mon Nov 02, 2015 7:00 pm

Veeam uses per-host metrics from vCenter for this. Here's a simple example from the Veeam log when we set up a monitor for datastore performance:
[25.10.2015 01:06:40] <43> Info     [DatastoreIO] Checking availability host 'host-13413' metrics. Metrics ids: [144,145], interval: 20, startTime: '01.06.21.032', endTime: '01.06.41.032'
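
For anyone who wants to check which hosts and metric IDs their own job logs mention, a log line like the one above can be picked apart with a small script. This is only a rough sketch: the field layout is inferred from that single sample line, not from any documented Veeam log format, and the name `parse_datastore_io` is made up for illustration.

```python
import re

# Pattern inferred from the single [DatastoreIO] sample line quoted above.
# It is an assumption about the layout, not a documented log format.
LINE_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\]\s+<(?P<thread>\d+)>\s+Info\s+\[DatastoreIO\]\s+"
    r"Checking availability host '(?P<host>[^']+)' metrics\.\s+"
    r"Metrics ids: \[(?P<ids>[\d,\s]+)\], interval: (?P<interval>\d+)"
)


def parse_datastore_io(line):
    """Return host, metric IDs and interval from a [DatastoreIO] line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "metric_ids": [int(x) for x in match.group("ids").split(",")],
        "interval": int(match.group("interval")),
    }


sample = (
    "[25.10.2015 01:06:40] <43> Info     [DatastoreIO] Checking availability "
    "host 'host-13413' metrics. Metrics ids: [144,145], interval: 20, "
    "startTime: '01.06.21.032', endTime: '01.06.41.032'"
)
print(parse_datastore_io(sample))
```

Lines that do not match the pattern simply return `None`, so the function can be run over a whole log file and the misses filtered out.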

In this example, host-13413 is "esx03" in my cluster, and we are monitoring metrics 144 and 145, which correspond to "totalReadLatency" and "totalWriteLatency" for that datastore as that host sees it (i.e. as measured by VMware). Obviously, though, each host can have a totally different view of latency for a given datastore, as seen in the two screenshots below. They show the same time frame (well, almost: off by 1 minute) for two different hosts within the same cluster, based on their view of the same datastore (latency is very high because of some ongoing testing, no worries):
[Two screenshots: latency of the same datastore as seen from two different hosts in the cluster]
So we already have a "per-host" view of the datastore. However, I agree that from Veeam's perspective we would not have enough knowledge of the underlying per-host caching model in such a distributed environment, and thus, if latency for that datastore is high on one node, we most likely would not assign any additional tasks to other nodes in the cluster. You would almost want a case where we assign at least one task per host (assuming there are such tasks), and only throttle/limit beyond that.
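
To make the "at least one task per host" idea concrete, here is a minimal sketch of what such a policy could look like. It is purely illustrative and not how Veeam B&R schedules tasks; the function name, the host names and the 20 ms default threshold are all assumptions chosen for the example.

```python
def hosts_accepting_tasks(latency_ms, running_tasks, threshold_ms=20.0):
    """Return the hosts that may take one more task.

    latency_ms:    per-host latency for the same datastore, in milliseconds
    running_tasks: number of tasks currently running per host (missing = 0)
    threshold_ms:  hypothetical latency limit for assigning additional tasks
    """
    accepting = []
    for host, latency in latency_ms.items():
        active = running_tasks.get(host, 0)
        # Every host is allowed its first task regardless of latency;
        # beyond that, only hosts whose own view is under the limit qualify.
        if active == 0 or latency < threshold_ms:
            accepting.append(host)
    return sorted(accepting)


# Hypothetical per-host readings for one shared datastore.
latency = {"esx01": 5.0, "esx02": 40.0, "esx03": 45.0}
running = {"esx02": 1}
print(hosts_accepting_tasks(latency, running))
```

In this made-up example esx03 reports high latency but still gets a first task because it has none running, while esx02 is held back because it already has a task and is over the limit. The decision for each host uses only that host's own reading, so high latency on one node does not block the others.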

Interestingly, I haven't really seen this become an issue in the field on Nutanix, and I've used I/O control there. It may simply be that, in my experience, it takes far more than a single task to push a given host's latency over the limit, so there's always plenty of headroom. It would definitely be something good to test in more detail.
tsightler
Veeam Software
 
Posts: 4768
Liked: 1737 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

