Veeam uses per-host metrics from vCenter for this. Here's a simple example from the Veeam log when we set up a monitor for datastore performance:
Code:
[25.10.2015 01:06:40] <43> Info [DatastoreIO] Checking availability host 'host-13413' metrics. Metrics ids: [144,145], interval: 20, startTime: '01.06.21.032', endTime: '01.06.41.032'
In this example, host-13413 is "esx03" in my cluster, and we are monitoring metrics 144 and 145, which correspond to "totalReadLatency" and "totalWriteLatency" for that datastore as that host sees it (i.e. as measured by VMware). Obviously, though, each host can have a totally different view of latency for a given datastore, as seen in the two screenshots below showing the same time frame (well, almost; off by 1 minute) for two different hosts within the same cluster, based on their view of the same datastore (latency is very high because of some ongoing testing, no worries):
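For reference, here is a minimal pyVmomi sketch of how these same per-host counters can be pulled from vCenter yourself (this is just an illustration, not what Veeam runs internally; the vCenter and host names are placeholders). Note that it resolves the counters by their dotted names rather than hardcoding 144/145, since the numeric IDs can differ between vCenter instances:

Code:
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='user', pwd='pass',
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
perf = content.perfManager

# Resolve counter IDs by full dotted name instead of hardcoding 144/145,
# since the numeric IDs can differ between vCenter instances.
wanted = {'datastore.totalReadLatency.average',
          'datastore.totalWriteLatency.average'}
ids = {}
for c in perf.perfCounter:
    name = '%s.%s.%s' % (c.groupInfo.key, c.nameInfo.key, c.rollupType)
    if name in wanted:
        ids[c.key] = name

# Find the host whose view of the datastore we want (name is a placeholder).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esx03.lab.local')
view.Destroy()

# intervalId=20 is the real-time sampling interval, matching "interval: 20"
# in the log above. instance='*' returns one series per datastore.
spec = vim.PerformanceManager.QuerySpec(
    entity=host,
    metricId=[vim.PerformanceManager.MetricId(counterId=cid, instance='*')
              for cid in ids],
    intervalId=20,
    maxSample=1)

for result in perf.QueryPerf(querySpec=[spec]):
    for series in result.value:
        print(ids[series.id.counterId], series.id.instance, series.value)

Disconnect(si)

Running this against each host in the cluster is exactly how you end up with the differing per-host latency views shown in the screenshots.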
So we already have a "per-host" view of the datastore. However, I agree that from Veeam's perspective we would not have enough knowledge of the underlying per-host caching model in such a distributed environment, and thus, if latency for that datastore is high on one node, we would most likely not assign any additional tasks to other nodes in the cluster either. Ideally, you would want us to assign at least one task per host (assuming there are such tasks), and only throttle/limit beyond that; a rough sketch of that idea follows below.
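To make the idea concrete, a policy like that could look something like this (purely hypothetical pseudologic; the threshold value is an assumption, and this is not how our scheduler is actually implemented):

Code:
from collections import defaultdict

LATENCY_LIMIT_MS = 30  # assumed per-datastore latency threshold

class PerHostThrottle:
    """Admit at least one task per host; throttle extra tasks on latency."""

    def __init__(self):
        self.active = defaultdict(int)  # host -> running task count

    def may_assign(self, host, host_latency_ms):
        # Always allow the first task on a host, so one slow node in the
        # cluster cannot starve all the others.
        if self.active[host] == 0:
            return True
        # Beyond the first task, throttle using the latency this particular
        # host reports for the datastore (the per-host view shown above).
        return host_latency_ms < LATENCY_LIMIT_MS

    def start(self, host):
        self.active[host] += 1

    def finish(self, host):
        self.active[host] -= 1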
Interestingly, I haven't really seen this become an issue in the field on Nutanix, and I've used I/O control there. It may simply be that it takes far more than a single task to drive up latency on a given host (in my experience), so there's always plenty of headroom. It would definitely be something worth testing in more detail.