Yes, you are correct. We had been running storage snapshots for 3 weeks without issue before this popped up.
I'm still waiting on additional details from HPE, but as I understand it, there are 3 bugs they are aware of.
Two are resolved by P7 and P11, and the 3rd, which is the one that hit us, is resolved by P12, which has not cleared QA as of this post.
Below is what HPE outlined to us in our ticket with them:
The System Manager (sysmgr) process is one of the main kernel processes on our InServ and runs on the master node. We only allow a single system manager process, and it runs only on the master node.
There are a couple of child processes attached to the sysmgr process, for example the pdscrubber process, which is responsible for chunklet relocation (servicemag process), and the tpdtcl process, which is responsible for running CLI commands on the InServ.
When sysmgr becomes unresponsive, it is usually safe to restart the system manager process, provided that no specific tasks or processes are running on the InServ (i.e. no tuning or servicemag process) and no IOCTL blocks are pending between the controller nodes.
Cause: Automation (Veeam) incorrectly issued snapshot delete requests out of order and attempted to delete RO snaps (normally hidden). This causes the two snaps to be merged from an exceptions table point of view, but the merge cannot be done because one of the snaps is stuck in "pending delete", which results in multiple node panics.
The snapshot removal process has been enhanced so that it no longer allows out-of-order snapshot removal while a snapshot is pending delete. The fix is included in the upcoming 3.3.1.MU1 with Patch 12. CURRENTLY, Patch 12 is NOT available; it is undergoing Software QA testing and is expected to be available soon (subject to change). (Update from LAB)
This is still ongoing with HPE support, so I'll update the post with additional details as I get them.