Host-based backup of VMware vSphere VMs.
Frosty
Expert
Posts: 200
Liked: 43 times
Joined: Dec 22, 2009 9:00 pm
Full Name: Stephen Frost
Contact:

Error - running ping test

Post by Frosty »

This is such a basic problem that I'm almost embarrassed to be posting it. I have a set of SureBackup jobs that run on a scheduled basis each week. Most weeks they work 100%. Every now and again, one of the VMs in one of the jobs will fail. When that happens, I usually run a one-off SureBackup job, just targeting that VM or that group of VMs in the particular backup job. Nearly always works on the one-off run, so I know that the backup is fine and that it was a transient issue, perhaps a timeout due to performance/server load, that sort of thing.

Last weekend a VM failed in the regular SureBackup run. I configured a one-off SureBackup job. It failed with "Error <serverName> - running ping test(s)". Ran it again, same result. I double-checked that the original VM allows ping traffic. It does. Took a fresh backup image of the VM, then re-ran the SureBackup using that fresh backup file. Same result. Rebooted the original VM, took another fresh backup, re-ran SureBackup, same result. The VM has plenty of free disk space. I'm plumb out of ideas for what to troubleshoot next.

Any suggestions? This same VM has worked fine in the same SureBackup job for months and months.
Egor Yakovlev
Veeam Software
Posts: 2537
Liked: 683 times
Joined: Jun 14, 2013 9:30 am
Full Name: Egor Yakovlev
Location: Prague, Czech Republic
Contact:

Re: Error - running ping test

Post by Egor Yakovlev »

Hi Stephen,

Are other VMs in the same network passing SureBackup ping tests? We need to find what causes the lost pings: it's either the network settings on the Veeam server (I doubt those changed), the network settings on the Veeam Virtual Lab appliance (could be the reason), or the network settings of the machine inside the Virtual Lab.
Try editing the SureBackup job settings and ticking the checkbox [x] Keep application group running at the "Application Group" step. That keeps your SureBackup session alive after all tests and does not destroy the lab afterwards, so you can troubleshoot networking between the Veeam server and the machine in the lab, using "tracert -d your_vm_in_the_lab_masquerade_ip" to start with.

Thanks!
Andreas Neufert
VP, Product Management
Posts: 6749
Liked: 1408 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Error - running ping test

Post by Andreas Neufert »

There are many possible reasons why this is not working. Basically, SureBackup did its job and reported a potential issue that you would hit if you restored this VM.

I suggest the following:
1) Add the VM to an "Application Group" and start it in a SureBackup job with the option to leave the Application Group running. If this fails as well, disable the ping test in the application group and start the VM.
2) Have a look in the VM to see why it loses its IP address (MAC address binding on Linux systems? Outdated VMware Tools, which we use to detect the IP? Windows patch level? ...).
Repeat this until you have solved the issue in this test environment and you can ping the VM on its masquerade address. Then apply the needed changes/fixes to the production VM (and repeat)...
If it is just a timing issue and the VM responds on the IP just too late for the usual SureBackup timeout, you can configure the SureBackup job / Application Group to wait longer before testing for ping replies.
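
To check whether it really is just timing, one option is to leave the lab running and measure how long the masquerade address takes to answer. Below is a minimal Python sketch, assuming it is run from the Veeam backup server (so it uses the Windows `ping` flags) and that `192.0.2.10` stands in for the VM's masquerade IP. The function names are made up for this example and are not part of any Veeam tooling.

```python
import subprocess
import time


def ping_once(host, wait_ms=2000):
    """Send one ICMP echo request with the Windows ping command.

    Returns True when ping exits with code 0. Note that Windows ping can
    also exit 0 on a "Destination host unreachable" reply from a router,
    so treat a True result as a hint, not proof.
    """
    cmd = ["ping", "-n", "1", "-w", str(wait_ms), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for_ping(host, timeout_s=600, interval_s=5,
                  ping=ping_once, clock=time.monotonic, sleep=time.sleep):
    """Poll host until it answers a ping or timeout_s has elapsed.

    Returns the elapsed seconds at the first successful reply,
    or None if no reply arrived within timeout_s.
    The ping, clock and sleep parameters exist so the loop can be tested
    without touching the network.
    """
    start = clock()
    while True:
        if ping(host):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Calling `wait_for_ping("192.0.2.10", timeout_s=900)` right after the VM powers on in the lab gives a rough figure to compare against the timeouts configured in the job. If it returns `None` while the VM is clearly up and reachable from inside the lab, timing is probably not the cause.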

Re: Error - running ping test

Post by Frosty »

Thanks Egor, Andreas,

I hadn't thought of adding the VM into the Application Group. This morning I added a second domain controller to the application group (just in case that helped; it didn't). Then I added the VM that was failing the ping test into the Application Group, turned off the ping test component, then set the Virtual Lab to keep the application group running. After everything started up and settled down, I was able to log in to the DCs and I could successfully ping the VM that was failing the ping test. In that VM itself, the firewall was running the Domain profile.

Since other VMs in the SureBackup job in the same network subnet were NOT failing, I can now reasonably conclude that this is just a timing issue and that the VM needs a bit more time to set itself up properly to respond to pings. It may just be the Network Location Awareness service being slow to respond on that VM.

I will try extending the timeouts by a couple of minutes and see if that helps.

Cheers,
Steve

Re: Error - running ping test

Post by Frosty »

Well, that's disappointing. When I include the VM in the Application Group and leave everything running at the end of the job, I can ping the VM from the DCs inside the SureBackup job (i.e. I log in to the DC console, then ping from there) and I can ping the proxy address of the VM from the Veeam Backup Server. But as soon as I take the VM out of the application group and, as per normal, just have it as a VM included in the SureBackup job, it fails the ping test again.

I have extended the Timeouts significantly and this hasn't fixed it. I've checked and VMware Tools is up-to-date. VM is getting a non-Domain firewall profile inside the SureBackup job and ping is therefore blocked.

I'll wait and see how next weekend's job goes and if it is still failing I'll open a ticket.
haslund
Veeam Software
Posts: 839
Liked: 149 times
Joined: Feb 16, 2012 7:35 am
Full Name: Rasmus Haslund
Location: Denmark
Contact:

Re: Error - running ping test

Post by haslund »

@Frosty just to make sure I understand correctly, you have a SureBackup job with an application group that contains a domain controller. The SureBackup job is linked to a backup job that contains a virtual machine. If the job runs then the linked job VM is getting a non-domain firewall profile. If you include it in the application group then it gets a domain profile?

A few questions:
1. How many virtual machines are there in the linked backup job?
2. How many virtual machines is the SureBackup job configured to test concurrently? If >1, could you lower this to 1 and run the job again (with the VM not in the application group)?
Rasmus Haslund | Twitter: @haslund | Blog: https://rasmushaslund.com

Re: Error - running ping test

Post by Andreas Neufert »

Now I see the issue Frosty,

It is related to the Windows Firewall. You can ping within the lab because that is seen as domain traffic (fewer default firewall restrictions).
Communication from the "masquerade" subnet is seen by Windows as traffic from outside the domain, so no ping is allowed.

For the PING test you need to allow PING (ICMP) for the other network profiles as well.
It is one of the points described here:
https://www.veeam.com/kb1067
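
For reference, a rule like that can be added from an elevated prompt with `netsh advfirewall firewall add rule`. Below is a small Python sketch that only builds and prints the command, so it can be reviewed before anyone runs it. The rule name and the optional `remote_ip` restriction are placeholder assumptions for illustration and are not taken from the KB article.

```python
VALID_PROFILES = ("domain", "private", "public")


def build_icmp_echo_rule(name="Allow-ICMPv4-Echo-In",
                         profiles=VALID_PROFILES,
                         remote_ip=None):
    """Return the netsh argument list for an inbound ICMPv4 echo-request allow rule.

    name      -- hypothetical rule name (no spaces, to avoid quoting questions)
    profiles  -- firewall profiles the rule should apply to
    remote_ip -- optional source restriction, e.g. a subnet in CIDR notation
    """
    if not profiles:
        raise ValueError("at least one firewall profile is required")
    unknown = [p for p in profiles if p not in VALID_PROFILES]
    if unknown:
        raise ValueError(f"unknown firewall profile(s): {unknown}")

    args = [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        "dir=in",
        "action=allow",
        "protocol=icmpv4:8,any",  # ICMPv4 type 8 = echo request, any code
        f"profile={','.join(profiles)}",
    ]
    if remote_ip is not None:
        args.append(f"remoteip={remote_ip}")
    return args


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(build_icmp_echo_rule()))
```

Passing `profiles=("private", "public")` limits the rule to the non-Domain profiles, and `remote_ip` can narrow it to known source subnets if opening ping to every source feels too broad.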

Re: Error - running ping test

Post by Frosty »

@haslund -- you understand correctly -- SureBackup job, application group now with 2 x DCs (first marked as authoritative restore), linked to a single backup job. I am pretty sure that the problem is firewall profile-related, but cannot be 100% certain. Yes, if I include the VM in the application group instead, when everything starts up, eventually I can ping the VM from the DCs in the application group and when I log in to the DCs and to the VM, I see a Domain firewall profile. To answer your specific questions: (1) there is only one VM in the backup job; (2) so I think this makes Q2 a non-issue.

@Andreas -- what I don't understand is why this wasn't an issue last week and the week before that and so on? Plus this is the same way I set up all our VMs here, with Windows Firewall turned ON and then an exception for Ping traffic inbound for the Domain profile (but not for Private or Public).

Windows Firewall rules are one of the things that I checked early in my troubleshooting, comparing this VM to other VMs from a Windows Firewall p.o.v. I am happy to accept that as the explanation I guess, but it isn't congruent with the data unless I am missing something else? I had read the KB article you mentioned a few days back:

https://www.veeam.com/kb1067

In that article, in relation to firewall rules and ICMP traffic, it says the following:
"This can either happen because the VM blocks ICMP normally or because it’s within the isolated environment and the firewall has changed to the Public profile due to it not being able to communicate with a Domain Controller in the isolated network."
So my question would be: why would the firewall change to the Public profile due to it not being able to communicate with a domain controller in the isolated network? I ask because this hasn't been a problem in the past ... and in the main SureBackup job, which has 20+ VMs and 10-ish backup jobs linked, all the other VMs ran okay and didn't have a problem communicating with the DCs in the isolated network ... it was only this one VM that gave problems.

It is probably something dodgy going on just in that VM with the Network Location Awareness and/or the timing of how it all starts up. But if it can happen to this one VM, it can happen to any of them, and if that's the case I am going to have to modify firewall rules on all my domain-joined VMs. Not the end of the world, but a minor pain.

Re: Error - running ping test

Post by Andreas Neufert »

I can only tell you that in nearly none of my tests or customer situations was the Domain profile triggered reliably. Maybe it was a Windows Update, a hardware replacement, ... whatever.

Re: Error - running ping test

Post by haslund »

@Frosty: Which roles did you assign to the domain controller in the application group? What is the application initialization timeout configured to for the application VM?

Re: Error - running ping test

Post by Frosty »

@Andreas -- I think I will allow Ping on all firewall profiles on all VMs then. Thanks for the feedback.

@haslund -- the 2 x domain controllers in the application group are set up as follows: first DC is set for Authoritative Restore, second DC is set for Non-Authoritative Restore, both DCs are also DNS Server and Global Catalog. Startup options for DCs are Memory = 100%, Maximum Allowed Boot Time = 1500sec, Application Initialization Timeout = 600sec. Boot Verification = VM Heartbeat and also VM responds to ping.

For the application VM which is failing the ping test: no Role is selected (so no test scripts), Startup options are Memory = 100%, Maximum Allowed Boot Time = 900sec, Application Initialization Timeout = 180sec. Boot Verification = VM Heartbeat and also VM responds to ping. I have tried increasing these values for startup options, but it didn't fix the problem. The application VM starts the ping test immediately after the VM Heartbeat is successfully detected, but after about 3 minutes it fails. Even if I set the Application Initialization Timeout = 600sec, it still fails the ping test within approx 4 minutes after the heartbeat is detected.

I'm going to wait and see whether it behaves properly when the normal run happens on Sunday evening. If it doesn't, I will open up ICMP ping traffic on all firewall profiles (I might constrain this to be just for our internal subnets).

Re: Error - running ping test

Post by Andreas Neufert »

What @haslund wanted to say here is that for domain controllers there is special logic that places them into restore mode, which in the end boots the server multiple times before it is up and running. The default VM boot time in the SureBackup job / Application Group might not be enough. We suggest adding the domain controllers to the Application Group selected in the SureBackup job and selecting the application checks for them (Global Catalog, DNS, ...). That way we change the boot timing automatically. You can see this by selecting the different roles and checking the time settings.

Re: Error - running ping test

Post by Frosty »

Thanks Andreas, yes, I have those tests enabled in the Application Group for my DCs and they always boot and pass the tests successfully.
meggerz
Influencer
Posts: 12
Liked: never
Joined: Aug 25, 2016 7:10 pm
Full Name: Megan Gee
Contact:

Re: Error - running ping test

Post by meggerz »

Hi,

I had the same issue and it turned out to be that the VM in question doesn't start up on the domain firewall profile and IPV4 echo requests are blocked on the private firewall. If you allow an exception for IPV4 echo requests through this firewall it will solve the issue.

I tried delaying my NLA service and all sorts of other things, but in the end allowing the exception worked.

Have a good day!

Re: Error - running ping test

Post by Frosty »

Thanks all,
I've now updated all my VMs to allow Ping through the Public and Private firewall profiles (not just Domain). I would estimate that about 60% needed modifying. I've subsequently re-run the SureBackup job which was failing (on just one VM) and it worked fine. So confirmed that this resolved my issue.
Cheers,
Steve

Re: Error - running ping test

Post by haslund »

@Frosty: Do the domain controllers happen to be in different subnets than the application VMs?

Re: Error - running ping test

Post by Frosty »

No, they're in the same subnet, all part of our main LAN subnet.