All very good points.
Touch wood - we've made a few adjustments and so far things are working OK, but they always do until they break.
@lando_uk, yes, you were spot on: we had deployed our ReFS server from a template with the virtual memory (page file) settings left on automatic. I've manually upped the ReFS server's RAM to 16GB (the ReFS server itself is virtual, like everything else in our environment) and manually defined the page file with an initial size of 3GB and a max size of 48GB. I haven't seen the swap used beyond the initial allocation so far. I wonder if manually defining the size has any impact on the way memory is used at the OS level. Disk queues typically sit in the 10-20 range.
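For anyone wanting to watch the same numbers, here's a minimal sketch of how the page file and disk queue could be polled on the ReFS server, driving Windows' built-in typeperf tool from Python. The counter names assume an English-language Windows install, and the whole thing is an illustration rather than what we actually run:

import subprocess

# Counters to watch on the ReFS server: page file usage and physical disk queue.
# Counter names assume English-language Windows; localized installs use translated names.
COUNTERS = [
    r"\Paging File(_Total)\% Usage",
    r"\PhysicalDisk(_Total)\Current Disk Queue Length",
]

def sample():
    """Grab one sample of each counter via the built-in typeperf tool."""
    out = subprocess.run(
        ["typeperf", *COUNTERS, "-sc", "1"],
        capture_output=True, text=True, check=True,
    )
    # typeperf prints a quoted CSV header row, then one data row per sample
    rows = [line for line in out.stdout.splitlines() if line.startswith('"')]
    header, data = rows[0], rows[1]
    return dict(zip(header.split('","'), data.split('","')))

if __name__ == "__main__":
    for counter, value in sample().items():
        print(counter.strip('"'), "=>", value.strip('"'))

Running something like that in a loop while a backup job is active should show whether Windows ever dips past the initial 3GB page file allocation, and whether the disk queue really does stay in that 10-20 band.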
@tsighter, it would be interesting if you could run a test setup in a hyper-converged way, so that you can adjust the RAM allocations to the various layers and see how to tune them to prevent failures.
In our situation, we have a Supermicro with 24 data spindles. The 24 spindles are passed through directly to an OmniOS ZFS layer, which presents them over iSCSI to a Windows 2016 server.
Current RAM utilisation, which seems to be stable (at least until the next crash); there's a quick tally sketch after the list:
MGMTDB: 2vCPU, 6GB RAM: SQL Server 2016 on top of Windows 2016 (DBs in SQL 2008 compatibility mode).
OmniOS: 4vCPU, 32GB RAM: OmniOS ZFS iSCSI SAN presenting a 60TB block device.
Server-refs: 4vCPU, 16GB RAM: Windows 2016, 60TB disk (drive F) attached via iSCSI (multi-pathed) on a vSwitch with no uplinks, drive F formatted as ReFS.
vCentre: 4vCPU, 16GB RAM: Windows 2016.
Veeam-console: 4vCPU, 6GB RAM: Windows 2016.
VeeamProxy01-proxy04: 4vCPU, 8GB RAM each: Windows 2016.
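Since the whole game here is shuffling RAM between layers, here's a trivial Python tally of the allocations above; the host capacity figure is a hypothetical placeholder, not our actual box:

# Tally of the per-VM RAM allocations listed above.
allocations_gb = {
    "MGMTDB": 6,
    "OmniOS": 32,
    "Server-refs": 16,
    "vCentre": 16,
    "Veeam-console": 6,
    "VeeamProxy01-04": 4 * 8,  # four proxies at 8GB each
}

HOST_RAM_GB = 128  # hypothetical host capacity, not our actual hardware

total = sum(allocations_gb.values())
print(f"Allocated {total}GB of {HOST_RAM_GB}GB ({total / HOST_RAM_GB:.0%}), "
      f"leaving {HOST_RAM_GB - total}GB for the hypervisor and headroom")

That works out to 108GB committed across the layers, which is the number to keep an eye on when moving RAM between the ZFS layer and the ReFS server.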