Long night (it's almost 11 pm now) - I shut down one of my affected VMs at 6:30 pm (a 500GB SQL database for an ERP app that I couldn't back up because I couldn't take a checkpoint) only to find my checkpoint chain was corrupt and the VM would not boot... After about 3 3/4 hours of darn near heart failure (who expects disk corruption rebooting a server - oh silly me - it was Hyper-V - I should have known), with the help of Veeam support (go Veeam!!!) I managed to get my checkpoint chain put back together without data loss, but I definitely lost at least 1 year off my life and added many, many grey hairs.
Anyways - I'm not here to rant about Hyper-V (I'll do that on my own blog), but what I am here to say is that I discovered the kernel System process (PID 4) was holding some of the VHDs for that VM and one other open, even after the VMs were powered off and the Hyper-V services were restarted. I found this out while trying to manually make a copy of my VHDs before trying to piece the checkpoint chain back together. I got an "access is denied" error because the files were open when I tried to move them (after making a copy to a 2nd SAN volume, I was just going to move the original files to a temp location to work on them). At that point I wanted to know what was holding the files open, so I ran two Sysinternals Suite tools - psexec and procexp64 (both of which are in the path of every Windows box I deploy). To do so I opened an administrative command prompt and ran:
psexec -i -d -s procexp64.exe
This launches Process Explorer as "NT AUTHORITY\SYSTEM". Then I hit CTRL+F ("Find Handle or DLL") and entered a wildcard search for the VHD name (e.g. CUST1*). This returns all the handles open on files named CUST1*. Sure enough, it was PID 4, System.
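If you prefer to script this rather than click through the GUI, the Sysinternals suite also ships a command-line equivalent, handle.exe/handle64.exe, which searches open handles for a name substring. A minimal sketch (the CUST1 search string is from my example above; run it from the same elevated prompt):

```powershell
# handle64.exe lists every process holding a handle whose path
# contains the search string - same result as Process Explorer's
# Find Handle dialog, but scriptable.
.\handle64.exe -accepteula CUST1
```

In my case the output showed the owning process as System (PID 4), which is what sent me down the RDX rabbit hole below.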
Now what is significant about this is that we've been having issues ejecting RDX cartridges at two other client locations after the backup completes (mind you, both those locations are VMware, not Hyper-V, and the VBR servers there are 2012 R2 and not 2016). Using the same steps above, I had already determined it was PID 4, System, that was holding the cartridges open, which prevents Windows from ejecting them - but I didn't have enough information to put 2 + 2 together. Most of my customers' setups are cookie cutter - so the first thing that popped into my head that was common was they were all HPE DL380s (one Gen9 and two Gen10). Two were fresh Windows loads, and one was a migration from a Gen9. They all had the current HPE Support Pack for ProLiant deployed to them. The next thing in common was the RDX drives (the checkpoint issues only occur for me during backup to RDX, not when going to the StoreOnce Catalyst share). So I checked whether we had loaded the HPE RDX Tools on those other two servers - yup - same version as on the Hyper-V host - HPE RDX Tools 1.59. A quick survey of all my clients reveals that the **ONLY** three locations that have the service RDXmon 1.51 installed and running (which comes from HPE RDX Tools 1.59 - go figure) are these three systems - the three systems that seem to be having file lock issues after dealing with the RDX drives. All the other sites either don't have RDX Tools installed (even if they do have an RDX drive) or have an older version of HPE RDX Tools (e.g. 1.56) installed.
So I stopped the RDXmon 1.51 service and the RDXSoftEjectService and set both to Disabled. We'll see what happens tomorrow, I guess (only 7 hours until my alarm goes off!)
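For anyone wanting to do the same from PowerShell instead of services.msc, a quick sketch (the service names are as they appear on my boxes - verify yours with Get-Service before running, since HPE may register them differently across RDX Tools versions):

```powershell
# Stop both HPE RDX services and keep them from coming back at boot
Stop-Service -Name RDXmon -Force
Stop-Service -Name RDXSoftEjectService -Force
Set-Service  -Name RDXmon -StartupType Disabled
Set-Service  -Name RDXSoftEjectService -StartupType Disabled
```

Disabling (rather than just stopping) matters here, because the whole point is to see whether the file locks are gone after tomorrow's backup run.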
BTW - one comment about my support experience tonight for you Veeam guys who monitor these threads. The engineer I was talking to was smart, polite, and seemed pretty confident that it was just a matter of stitching the checkpoints back together based on what I described to him, but I was very disappointed with the reference material I was given to do it with - a 3rd party blog and a separate YouTube video from someone else covering Hyper-V on 2008 using a 3rd party tool. I had really expected Veeam to have formal documentation on using Get-VHD to determine the parent path, Set-VHD to reconnect the chain, and Mount-VHD to verify it afterwards. After I went through what the engineer sent me, I did some additional research, and PowerShell was much simpler (and quicker) than the Hyper-V GUI and 3rd party tool I was directed to in the video. I guess on the flip side of the coin, next week maybe Veeam support can reference my more relevant blog after I write up my "HOWTO: Piece checkpoints back together again after failed merges" entry and post it!
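Until that blog entry exists, here's the gist of the PowerShell approach I ended up using, as a sketch. The file paths and names are hypothetical placeholders - your .avhdx chain will have its own names - and Set-VHD rewrites disk metadata, so only run this against the copies you made first:

```powershell
# Inspect a differencing disk and see where it thinks its parent is
Get-VHD -Path 'D:\VMs\CUST1\CUST1_disk1_ABC123.avhdx' |
    Select-Object Path, ParentPath, VhdType

# Re-point the broken .avhdx at the correct parent in the chain.
# -IgnoreIdMismatch is needed when a failed merge left the disk
# identifiers out of sync.
Set-VHD -Path 'D:\VMs\CUST1\CUST1_disk1_ABC123.avhdx' `
        -ParentPath 'D:\VMs\CUST1\CUST1_disk1.vhdx' -IgnoreIdMismatch

# Verify the repaired chain actually mounts, then detach it
Mount-VHD -Path 'D:\VMs\CUST1\CUST1_disk1_ABC123.avhdx' -ReadOnly
Dismount-VHD -Path 'D:\VMs\CUST1\CUST1_disk1_ABC123.avhdx'
```

Work from the leaf of the chain back toward the base disk, fixing one ParentPath at a time; once the whole chain mounts cleanly, the VM should boot.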