Nutanix AHV Orphaned Snapshots - Finding and Removal

Post by **Tisinger** » Dec 10, 2019 6:02 pm

Old case from another user: 03215438, referenced by my old case: 03553000, similar to my current case: 03903145.

I have had four instances in the last year where I needed to deal with snapshots that weren't getting deleted properly. In two of them, an old case and my current one, the problem was widespread enough that I wanted help. The other two times there was a VM or two with issues; we more or less guessed using file dates on the Nutanix side and got things working. My current case is stalled because the support rep has asked me to contact Nutanix (who have a standing policy of asking users to contact vendors for any snapshot issues outside of the Nutanix protection groups). I have pointed out that I got specific help from Veeam support on this before, so we'll see how it goes.

Since this is an ongoing issue for us, I am looking to develop some tools and procedures to help us figure out which snapshots *might* be problematic, based on which jobs are failing (and how).

So, I am looking for anyone to share:

A plain-English description of what each table within the SQLite database on the appliance holds and/or what updates that table

Any SQLite queries that search or enumerate the JSON data (rather than just dumping a column with the JSON text)

Any scripts/queries that can identify the snapshot UID associated with a job (I am thinking more of a list here, but I'll take what I can get)

Any curl procedures that use the API to gather information on specific Nutanix snapshots

Anything else you think might be helpful
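
For the JSON question above, one possible starting point is SQLite's `json_extract` function, which can filter on fields inside a JSON text column instead of dumping the whole column. The sketch below uses Python's `sqlite3` module against an in-memory database. The table name `job_sessions`, the column `details`, and the JSON keys `job_name` and `snapshot_uuid` are hypothetical placeholders, not the appliance's real schema, and the example assumes the local SQLite build includes the JSON functions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up table holding JSON text."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one JSON document per job session.
    conn.execute(
        "CREATE TABLE job_sessions (id INTEGER PRIMARY KEY, details TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_sessions (details) VALUES (?)",
        [
            ('{"job_name": "daily-vms", "snapshot_uuid": "aaaa-1111"}',),
            ('{"job_name": "weekly-db", "snapshot_uuid": "bbbb-2222"}',),
            ('{"job_name": "daily-vms", "snapshot_uuid": "cccc-3333"}',),
            ('{"job_name": "daily-vms"}',),  # session with no snapshot recorded
        ],
    )
    return conn


def snapshots_for_job(conn, job_name):
    """Return snapshot UUIDs from rows whose JSON names the given job."""
    rows = conn.execute(
        """
        SELECT json_extract(details, '$.snapshot_uuid')
        FROM job_sessions
        WHERE json_extract(details, '$.job_name') = ?
          AND json_extract(details, '$.snapshot_uuid') IS NOT NULL
        ORDER BY id
        """,
        (job_name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    for uuid in snapshots_for_job(demo, "daily-vms"):
        print(uuid)
```

Against the real appliance database, the same query shape would need the actual table, column, and key names, and is best run on a copy of the file rather than the live database.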

The previous time, support shared a script with me that I have since lost and can't get back off the FTP site for the case (as it has been closed for a while). So if there is anything support could share (or maybe a resource I haven't found), that would also be helpful.

If this is something that will put me at odds with Veeam, I need to know that right away, as that is not my intention. I doubt it will, since I was editing and running the script in the previous case, but one never knows.

Thanks!

Tisinger · Dec 18, 2019 4:12 pm

I just wanted to briefly update my query. It turns out that there are orphaned snapshots, and there are "stale" snapshots. The second category is more problematic and is behind the most recent issue I was facing. A "stale" snapshot is essentially the result of a hung snapshot operation (I guess usually a delete) that removed the snapshot but failed to update whatever data structure Nutanix uses to track these things within the cluster itself.
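
As a rough illustration of the distinction, the two categories can be treated as set differences over snapshot identifiers. This is only a sketch under stated assumptions: "stale" follows the description above (still listed in the cluster's tracking data, but the snapshot itself is gone), while "orphaned" is assumed here to mean a snapshot that still exists on the cluster but is no longer referenced by the backup appliance. The function and parameter names are hypothetical, and how each input list is gathered is left open.

```python
def classify_snapshots(cluster_listed, appliance_referenced, cluster_readable):
    """Split listed snapshot IDs into 'orphaned' and 'stale' candidates.

    cluster_listed:       IDs the cluster's tracking data still lists
    appliance_referenced: IDs the backup appliance still refers to
    cluster_readable:     IDs whose snapshot data can actually be found
    """
    listed = set(cluster_listed)
    # Assumed meaning of "stale": listed by the cluster, but nothing behind it.
    stale = listed - set(cluster_readable)
    # Assumed meaning of "orphaned": really exists, but no job refers to it.
    orphaned = (listed - stale) - set(appliance_referenced)
    return {"orphaned": sorted(orphaned), "stale": sorted(stale)}


if __name__ == "__main__":
    result = classify_snapshots(
        cluster_listed=["a", "b", "c", "d"],
        appliance_referenced=["a"],
        cluster_readable=["a", "b"],
    )
    print(result)
```

The output is a list of candidates to review, not a list that is safe to delete automatically.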

Veeam support was clear that this needed to be brought to Nutanix. Nutanix had a custom Python script they used to find and remove these "stale" snapshots. Upgrading to the most recent patch level (on either the long-term or short-term release schedule) fixes the root cause, but one still needs to clear out the stale snapshots already there to get Veeam running smoothly again. If anyone faces this issue, I hope this helps, and good luck!
