-
- Veteran
- Posts: 377
- Liked: 86 times
- Joined: Mar 17, 2015 9:50 pm
- Full Name: Aemilianus Kehler
- Contact:
Latest AD restore broken
Hey all,
I'm wondering if anyone has an idea that could help me out. I generally keep a test environment that mirrors production, which I build by restoring my production servers from Veeam backups. This generally gets me two things:
1) Developers and other admins can test changes without affecting production servers, and can do so during production hours.
2) It ensures my backups are working as intended.
So it seems #2 is a great example here. After numerous failed updates on one of our workflow servers, which integrates quite heavily with SQL and SharePoint, I decided to rebuild the test environment. (Generally I'd just restore the DB the workflow uses and then restore the workflow server front end, but this time I decided to rebuild completely.) That's something I've done several times before without issues; this time was different.
So I did my usual: restored the 3 DCs I have for my primary forest/domain, logged into each, and verified I could ping the others (I didn't do further AD tests, as generally if they can communicate, their services came up without issue). Then I connected my first member server, and to my dismay it stated it couldn't locate any logon servers.
So I logged on as a local admin and noticed the NIC settings had all reset. Searching on this, I quickly found the known VMware VMXNET3 / Windows client template bug, so I installed the MS hotfix, did a fresh backup and restored, only to find the same error... Logging in again as local admin, I found the NIC this time was perfectly fine and configured like it was supposed to be. Since I could ping my DCs by name, I decided to try the rejoin-domain trick, which reported no DCs available to service the request. That made me feel that AD was actually broken...
So I logged into my PDC, and to my dismay the replication test reported a failure due to DNS (uhhh what?)...
Checking DNS, the console failed to connect to the local DNS service, with the error logs showing Error 4xx: can't bind socket to IP address (the local address, not loopback). Uhhh OK? Any attempt to restart the service would fail or hang until I changed the DNS setting on its NIC to point to itself instead of the neighboring DC.
Then "net stop dns" and "net start dns" would work. (At least, this worked the first time; on the second test it didn't, and I had to actually delete the NIC, re-add it and reconfigure the IP address.) I had to do this on both 2008 R2 DCs; the 2012 DC didn't seem to be affected.
After enough time and clearing the event logs, I'd eventually get an all-clean "repadmin /showrepl" and "repadmin /syncall"...
Yet to my dismay the member server STILL couldn't log on, with the same error....
Checking dcdiag, all DCs were complaining that no GCs were available and that the PDC role was unavailable. (At this point it was like 3 AM and I was getting furious...)
I checked everything I could find online about NTP and DNS and everything looked fine, so I shut 'em down and went to bed...
Next morning I restored again and tried again: same errors (that was how I noticed the two different DNS fixes)... Since I didn't remember these issues from the last time I ran through this, I decided to boot up older copies of my AD by simply pointing to an older backup point in Veeam.
Sure enough, those AD restores came up clean. At most I'd get a message stating that the computer trust relationship was broken (to be expected with a newer member server and an older AD), which was easily fixed with "netdom resetpwd" and worked as expected.
So, anyone? What happened??!?! Why is my most current backup of AD broken??
The only change I can think of is that I upgraded from 9.0 to 9.5, but that was so smooth and trouble-free. This can't be the cause, can it?!
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Latest AD restore broken
Hello, please provide Veeam support case ID for this issue, as requested when you click New Topic. Thank you!
-
- Veteran
- Posts: 377
- Liked: 86 times
- Joined: Mar 17, 2015 9:50 pm
- Full Name: Aemilianus Kehler
- Contact:
Re: Latest AD restore broken
Contacting Support this morning. Will provide details as requested. Thanks
-
- Expert
- Posts: 114
- Liked: 25 times
- Joined: Dec 09, 2012 3:50 am
- Full Name: Jim Millard
- Contact:
Re: Latest AD restore broken
For what it's worth, I had a similar experience today with a Server 2016 restore. The original VM was messed up thanks to a Windows update that didn't want to install correctly; after trying several troubleshooting steps, I ended up giving up and restoring from a known-good backup.
Symptomatically, the VM seemed to restore properly, but it stayed in AD Restore Mode instead of automatically restarting after doing a non-authoritative restore.
Manually restarting the VM brought it out of restore mode, but I still had issues with replication & security. Ended up having to force replication from a known-good DC and restarting the guest one more time.
In previous versions (of both VBR and Windows), the restore of a DC was effortless & automatic. I'm not sure if this is related to the new version of Windows or VBR. I don't/won't have a support case, however, because this environment is backed by an NFR copy of VBR, and I ultimately fixed the problem manually.
-
- Veteran
- Posts: 377
- Liked: 86 times
- Joined: Mar 17, 2015 9:50 pm
- Full Name: Aemilianus Kehler
- Contact:
Re: Latest AD restore broken
Thanks for the additional info, Jim.
In my case I was fighting enough with the latest restores. The thing I don't get is why my old ones worked without intervention, but with my latest one I have to intervene.
Oddly enough, on my second run the DNS services did bind to their respective local IPs just fine, but the DCs all appeared confused. First things first...
1) Verified the DNS service is up and running on all DCs (DNSCMD on Core, or the DNS snap-in on Desktop Experience).
2) Ensured all DCs were replicating fine (Repadmin /showrepl or Repadmin /showreps). (Sites and Services will not open or load while the primary domain is seen as "down".)
3) At this point you may notice that netlogon and SYSVOL are not operational. Along with this, you'll see the following while doing a dcdiag:
"Starting test: FsmoCheck
Warning: DcGetDcName(GC_SERVER_REQUIRED) call failed, error 1355
A Global Catalog Server could not be located - All GC's are down.
PDC Name: \\xx.xxxxx.xxxx ( servername and domain are correct)
Locator Flags: 0xe00001fd
Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
A Time Server could not be located.
The server holding the PDC role is down.
Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
A Good Time Server could not be located.
Warning: DcGetDcName(KDC_REQUIRED) call failed, error 1355
A KDC could not be located - All the KDCs are down."
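For anyone comparing several DCs, the output above can be boiled down to just the failed locator calls. This is only a rough sketch in Python: the function name and the sample text are made up for illustration, and the regex assumes dcdiag prints the warnings in exactly the form quoted above. Error 1355 is the Windows code for "The specified domain either does not exist or could not be contacted".

```python
import re

# Matches dcdiag warning lines of the form quoted in this post, e.g.:
#   Warning: DcGetDcName(GC_SERVER_REQUIRED) call failed, error 1355
FAILED_CALL = re.compile(
    r"DcGetDcName\((?P<flag>[A-Z_]+)\) call failed, error (?P<code>\d+)"
)


def failed_locator_calls(dcdiag_output):
    """Return {locator flag: error code} for each failed DcGetDcName call."""
    return {
        match.group("flag"): int(match.group("code"))
        for match in FAILED_CALL.finditer(dcdiag_output)
    }


# Sample text copied from the FsmoCheck output in this post (trimmed).
SAMPLE = """\
Starting test: FsmoCheck
Warning: DcGetDcName(GC_SERVER_REQUIRED) call failed, error 1355
A Global Catalog Server could not be located - All GC's are down.
Warning: DcGetDcName(TIME_SERVER) call failed, error 1355
A Time Server could not be located.
Warning: DcGetDcName(GOOD_TIME_SERVER_PREFERRED) call failed, error 1355
A Good Time Server could not be located.
Warning: DcGetDcName(KDC_REQUIRED) call failed, error 1355
A KDC could not be located - All the KDCs are down.
"""

if __name__ == "__main__":
    for flag, code in failed_locator_calls(SAMPLE).items():
        print(f"{flag}: error {code}")
```

Saving the dcdiag output from each DC to a text file and feeding it through the same function makes it easy to see whether every DC fails the same locator roles.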
In this case, I managed to get my AD back up by doing an authoritative restore on the PDC:
"The fix from Microsoft Support was to copy the two folders (policies/scripts) back from C:\windows\sysvol\sysvol\domain.local\NtFrs_PreExisting___See_EventLog to the C:\windows\sysvol\sysvol\domain.local\ folder, then stop the NTFRS service, then set the BurFlags key to D4, which does an Authoritative restore." Source: https://community.spiceworks.com/topic/ ... 2k3-domain
This is also covered by Andrew from Veeam on his Veeam Blog post here: https://www.veeam.com/blog/how-to-recov ... ction.html
This KB covers it as well: https://www.veeam.com/kb2119. I just don't get why I didn't have to do this before...?
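To make the order of the quoted steps explicit, here is a rough Python sketch that only builds the command strings and does not run anything. Treat it as an assumption-laden illustration rather than a procedure: the function name is hypothetical, "domain.local" is the placeholder from the quote, robocopy is my choice for the copy step, and the final "net start ntfrs" is implied rather than stated in the quote (BurFlags is read when the service starts). It only applies where SYSVOL is still replicated by FRS; DFSR-replicated SYSVOL uses a different procedure.

```python
# Registry key NTFRS reads BurFlags from when the service starts.
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)


def authoritative_sysvol_commands(domain, sysvol_root=r"C:\windows\sysvol\sysvol"):
    """Build (but do not execute) the commands for the quoted D4 fix."""
    domain_dir = f"{sysvol_root}\\{domain}"
    pre_existing = f"{domain_dir}\\NtFrs_PreExisting___See_EventLog"
    steps = []
    # 1. Copy the two folders back out of the pre-existing folder.
    for folder in ("Policies", "Scripts"):
        steps.append(
            f'robocopy "{pre_existing}\\{folder}" "{domain_dir}\\{folder}" /E /COPYALL'
        )
    # 2. Stop the File Replication Service.
    steps.append("net stop ntfrs")
    # 3. Set BurFlags to D4 (hex), written here as decimal 212.
    steps.append(
        f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD /d {0xD4} /f'
    )
    # 4. Assumed final step: start the service so the flag is processed.
    steps.append("net start ntfrs")
    return steps


if __name__ == "__main__":
    for command in authoritative_sysvol_commands("domain.local"):
        print(command)
```

Printing the list first and reviewing it against the KB and blog post before touching a DC is the whole point of keeping this as a dry run.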
Yay, I was able to get all my other servers and services running once AD was back up. While I wait for Veeam support to get back to me, can anyone suggest what I might change in my restore procedure to avoid all this manual overhead? Don't get me wrong, it was a great DR test case and I'm better off for it.... but it was still a tad scary when I'm relying on my backups to work.
-
- Veteran
- Posts: 377
- Liked: 86 times
- Joined: Mar 17, 2015 9:50 pm
- Full Name: Aemilianus Kehler
- Contact:
Re: Latest AD restore broken
Case # 02562515
is being closed because even though something CLEARLY changed, it changed to what is considered normal behavior. I really liked the non-normal behavior, though, as it allowed me to spin up my test environment that much faster...
-
- Veteran
- Posts: 377
- Liked: 86 times
- Joined: Mar 17, 2015 9:50 pm
- Full Name: Aemilianus Kehler
- Contact:
Re: Latest AD restore broken
I found I'm having the same issue with a brand new replication job I just tested with AAP enabled... :S Pretty annoying to have to do an authoritative restore manually whenever I enable AAP on my DCs. Is this really by design?
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Latest AD restore broken
Authoritative restore is not generally required (except for situations described in the referenced KB and blog post). Are you saying automatic restore didn't work in your case?