Agentless, cloud-native backup for Microsoft Azure
RobMiller86
Service Provider
Posts: 143
Liked: 25 times
Joined: Oct 28, 2019 7:10 pm
Full Name: Rob Miller
Contact:

Failed State in Azure - No Alarm

Post by RobMiller86 »

We have now seen a couple of occurrences where a VM enters a failed state in Azure. The VM is still up and responding, so our other monitoring doesn't detect any issues. But in Azure, the VM shows as being in a failed state and you have to click the button to reboot and redeploy.

When this happens, Veeam can't see the VM. If you look at the sources in the backup policy, the VM in question has a blank ID/Value column. As soon as you redeploy in Azure, the column is populated and Veeam can back it up.

We haven't seen any alarms for this condition in our VSPC. Shouldn't Veeam trigger an alarm when a VM simply drops off like that? Is this a known issue, or do we perhaps have something misconfigured? It seems like Veeam considers the VM deleted, so it ignores the condition. Unfortunately, I don't have an example right now to open a ticket, but I will do so when this occurs again. I'm just wondering if anyone else has witnessed this condition.
nielsengelen
Product Manager
Posts: 5660
Liked: 1189 times
Joined: Jul 15, 2013 11:09 am
Full Name: Niels Engelen
Contact:

Re: Failed State in Azure - No Alarm

Post by nielsengelen »

Hey Rob,

Did you already contact support for this and do you have a case ID? Looks like we'll need to do some more troubleshooting to understand what is causing this.
Personal blog: https://foonet.be
GitHub: https://github.com/nielsengelen
RobMiller86

Re: Failed State in Azure - No Alarm

Post by RobMiller86 »

Hi Niels,

Unfortunately, I do not. This was addressed by some other staff members over the past couple of months. I've been told it has happened 3 times across our client base. We are in the process of migrating a lot of customers over to Veeam. I will open a ticket the next time it's brought to my attention, and I have an example. I was just curious what the expected behavior of VBAZ is when an Azure VM enters a failed state like this, where VBAZ can no longer see the VM in Azure, but the VM is still up.
RobMiller86

Re: Failed State in Azure - No Alarm

Post by RobMiller86 »

I just found another one of these "The virtual machine is in a failed state" cases in Azure. Luckily for us, this deployment is still using our older method, where we manually added the VMs to the policy rather than adding them via tag. So the policy was failing, as it was manually configured with the VM and couldn't find it.

However, this is giving me more concern with our current method of adding VMs by tag. Can I get any confirmation from Veeam on what happens in the following scenario:

1. We are adding VMs to a policy with a specific tag. We then add the tag to the VMs we want to back up with the policy. We are not directly adding VMs to the policy, just the tag.
2. All VMs are discovered, backups begin.
3. Sometime later, a VM enters a failed state in Azure. The VM is still up and running, so other monitoring tools don’t detect the failure. However, in Azure, if you pull up the VM overview, up top it says "The last operation performed on this VM failed. The VM is still running. View error details". Then if you click that, in Azure it says "The virtual machine is in a failed state. The fabric operation failed. Reapplying the virtual machine may resolve the issue." Error code: InternalExecutionError. Provision state: Failed. If you click reapply, the VM is reprovisioned in Azure, rebooted, and all is well again.
4. When you pull up the Veeam backup policy in this state, before fixing it as described in step 3, the VM is still listed in the policy in the Name/Key column, but the ID/Value (the Azure resource ID) is blank.

Will VBAZ still fail a policy in this state? If it previously found the VM to protect via tag, and now it can't find the VM via tag, can I have 100% confirmation that it will now fail the policy instead of thinking the VM was just removed and no longer needs to be backed up? Like I said before, I don't have a current example to open a ticket, but I can't recreate this issue to test, and it's very hard to find unless you are doing manual reviews. I'm considering reverting and not adding VMs via tags due to this issue that I have witnessed myself twice, and someone else witnessed it once. These were on V5, not V6. So I'm not sure if this has improved or not. Is it not safe to add a VM via tag due to this?

I do know the policy fails in this Azure failed state if the VM was directly added by "Resource types: virtual machine [name or id]". But I have seen it not fail the policy if the VM was added by "Resource types: tag [key] [value]", with the tag added to the VM. I know it's tough as I can't provide a current example. But it's a big concern, as we could go months or longer in this state without realizing it and have no backups. Confirmation of exactly how VBAZ will handle this condition would be much appreciated, or I guess we just shouldn't use the tag method at all for peace of mind.
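
Until the behavior is confirmed, one way to catch this condition outside of Veeam is to check the provisioning state of the tagged VMs directly. The following is only a rough sketch, not anything Veeam provides: it assumes you have a list of VM objects in the shape the Azure CLI prints for `az vm list -o json`, with `name`, `tags`, and `provisioningState` fields. The function name, the `backup`/`daily` tag, and the sample VMs are all made up for illustration.

```python
import json


def find_failed_tagged_vms(vms, tag_key, tag_value):
    """Return names of VMs that carry the given tag but report a Failed provisioning state.

    `vms` is assumed to be a list of dicts shaped like `az vm list -o json` output.
    """
    failed = []
    for vm in vms:
        tags = vm.get("tags") or {}  # tags can be null for untagged VMs
        if tags.get(tag_key) != tag_value:
            continue  # not covered by the tag-based policy
        state = (vm.get("provisioningState") or "").lower()
        if state == "failed":
            failed.append(vm["name"])
    return failed


# Hypothetical sample data standing in for real CLI output.
sample = json.loads("""
[
  {"name": "vm-web-01", "provisioningState": "Succeeded", "tags": {"backup": "daily"}},
  {"name": "vm-app-02", "provisioningState": "Failed",    "tags": {"backup": "daily"}},
  {"name": "vm-dev-03", "provisioningState": "Failed",    "tags": null}
]
""")

print(find_failed_tagged_vms(sample, "backup", "daily"))
```

Run on a schedule, a check like this would flag a tagged VM in the Failed state regardless of whether the backup policy itself reports success.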
lyudmila.ezerskaya
Veeam Software
Posts: 114
Liked: 34 times
Joined: Oct 04, 2021 4:08 pm
Full Name: Lyudmila Ezerskaya
Contact:

Re: Failed State in Azure - No Alarm

Post by lyudmila.ezerskaya »

Hi Rob! We will investigate this issue and keep you updated on the results. Thank you!
lyudmila.ezerskaya

Re: Failed State in Azure - No Alarm

Post by lyudmila.ezerskaya »

Hi! By design, Veeam Backup for Microsoft Azure skips VMs with the Failed provisioning state during synchronization.

If a VM was manually added to the backup scope, the policy will attempt to process it during the backup session. However, since the VM cannot be reached, the policy will fail.

However, when VMs are added to the backup scope using tags, resource groups, or subscriptions, the backup scope adjusts dynamically with each synchronization with Azure. If a VM is in the Failed state during synchronization, it will be skipped and excluded from the backup scope. Since this VM is no longer included in the backup scope, the policy will not attempt to protect it and therefore will not fail.
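
The difference between the two cases can be illustrated with a small model. This is a simplified sketch of the behavior described above, not the product's actual code; the `Vm` class, `synchronize`, `run_policy`, and the `backup=yes` tag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    name: str
    provisioning_state: str = "Succeeded"
    tags: dict = field(default_factory=dict)


def synchronize(vms):
    """Model of synchronization: VMs in the Failed provisioning state are skipped."""
    return [vm for vm in vms if vm.provisioning_state != "Failed"]


def run_policy(visible, static_names=(), tag=None):
    """Return (status, scope) for one policy run.

    Manually added VMs that are not visible make the policy fail.
    Tag-based scope is rebuilt from visible VMs only, so a skipped VM
    silently drops out and the policy still succeeds.
    """
    visible_by_name = {vm.name: vm for vm in visible}
    scope, missing = [], []
    for name in static_names:
        (scope if name in visible_by_name else missing).append(name)
    if tag is not None:
        key, value = tag
        for vm in visible:
            if vm.tags.get(key) == value and vm.name not in scope:
                scope.append(vm.name)
    status = "Failed" if missing else "Success"
    return status, scope


vms = [
    Vm("vm1", tags={"backup": "yes"}),
    Vm("vm2", provisioning_state="Failed", tags={"backup": "yes"}),
]
visible = synchronize(vms)

print(run_policy(visible, static_names=["vm1", "vm2"]))  # manually added VMs
print(run_policy(visible, tag=("backup", "yes")))        # tag-based scope
```

In this model, the manually configured policy reports a failure because `vm2` cannot be found, while the tag-based policy reports success with only `vm1` in scope.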
RobMiller86

Re: Failed State in Azure - No Alarm

Post by RobMiller86 »

Yup. Just found a couple more. Failed state in Azure. No alarm at all from VSPC, and the policy shows fine in VBAZ. This is a huge problem. No one should be using tags to back up VMs if they actually care about the backups. Personally, I think VBAZ should handle this more intelligently. As it stands, the only way to be sure you are backing up VMs is to manually add them. Nothing else can or should be trusted. We will now convert all of our VBAZ policies to manually added VMs. A bit irritating, honestly. This is going to burn someone at some point.
Vitaliy S.
VP, Product Management
Posts: 27184
Liked: 2739 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Failed State in Azure - No Alarm

Post by Vitaliy S. »

Hi Rob,

In your case, I would recommend using the "VM without backup (RPO-based)" alarm in VSPC to be notified when a VM was skipped from processing for whatever reason.
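
The idea behind an RPO-based check can be sketched as follows. This is only an illustration of the concept, not how VSPC implements the alarm; the function name, the 24-hour RPO, and the sample data are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def vms_without_recent_backup(last_backup, now, rpo):
    """Return sorted names of VMs whose latest backup is missing or older than the RPO.

    `last_backup` maps VM name -> datetime of the newest restore point, or None.
    """
    return sorted(
        name
        for name, taken_at in last_backup.items()
        if taken_at is None or now - taken_at > rpo
    )


# Hypothetical sample data.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
rpo = timedelta(hours=24)
last_backup = {
    "vm1": now - timedelta(hours=2),  # backed up recently
    "vm2": now - timedelta(days=3),   # silently skipped for three days
    "vm3": None,                      # never backed up
}

print(vms_without_recent_backup(last_backup, now, rpo))
```

The point of this kind of check is that it looks at when each VM was last backed up rather than at the policy result, so a VM that was skipped would still be reported once its newest restore point ages past the RPO.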

As for the VB itself, we will discuss how our handling of "failed VMs" can be improved. For example, we can warn (as an optional setting) that the VM is no longer backed up, to ensure it is intentional.

Thanks!