We have three (and in a few weeks four) DCs: 1x physical and 2x virtual. I am using VSS backup to ensure that the backup of the domain controllers is consistent. The problem I am experiencing is that when I run the backup, I get this warning in my event log:
The DFS Replication service is stopping communication with partner DC03 for replication group Domain System Volume due to an error. The service will retry the connection periodically.
Additional Information:
Error: 9036 (Paused for backup or restore)
Connection ID: 365E6FF3-8654-4A0D-970E-A43AB77F1DDB
Replication Group ID: 7117FDA8-DCE7-4A68-ADAE-E1BFC177A986
The DFS Replication service encountered an error communicating with partner DC03 for replication group Domain System Volume.
Partner DNS address: DC03.--.--
Optional data if available:
Partner WINS Address: DC03
Partner IP Address: --
The service will retry the connection periodically.
Additional Information:
Error: 9036 (Paused for backup or restore)
Connection ID: 365E6FF3-8654-4A0D-970E-A43AB77F1DDB
Replication Group ID: 7117FDA8-DCE7-4A68-ADAE-E1BFC177A986
It seems that there is some kind of lag occurring when I run my backup.
Is anyone else having this kind of problem? Can something be done? I think this error is not that bad, because replication works fine after the backup, but it is spamming my event logs and SCOM raises an error every time it occurs.
I believe the reason this message keeps appearing in SCOM is that your backup job is configured to use VSS. While the DC is in the frozen state, DFS Replication is paused and does not process requests from its replication partners. This is required to ensure application consistency while the snapshot is being taken.
Besides, if you take a closer look at the event log, you will notice that the communication between your DCs is stopped because of this:
Error: 9036 (Paused for backup or restore)
This makes a lot of sense, and I can understand why it is happening (I also noticed the message about "Paused for backup or restore"). The problem is that I do not understand why this is supposed to be an error. I think I may just have to live with this issue.
If you're able to parse the event text in your monitoring system, you can exclude this event whenever the text contains the word "backup". This should allow you to keep monitoring your environment the way you do now.
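As a rough illustration of that filtering idea, here is a minimal Python sketch. It assumes your monitoring pipeline can hand you the event description as plain text. The names `BACKUP_PAUSE`, `is_backup_pause`, and `filter_alerts` are made up for this example and are not part of SCOM or any other product. It matches on the full "Error: 9036 (Paused for backup or restore)" string from the events quoted above rather than on the bare word "backup", so unrelated events that merely mention a backup are not suppressed.

```python
import re

# Hypothetical pattern: matches the DFSR error line quoted in the events above.
BACKUP_PAUSE = re.compile(
    r"Error:\s*9036\s*\(Paused for backup or restore\)",
    re.IGNORECASE,
)


def is_backup_pause(event_text: str) -> bool:
    """Return True if the event text reports DFSR error 9036."""
    return BACKUP_PAUSE.search(event_text) is not None


def filter_alerts(events):
    """Return only the event texts that should still raise an alert."""
    return [text for text in events if not is_backup_pause(text)]


if __name__ == "__main__":
    sample = [
        "Additional Information:\nError: 9036 (Paused for backup or restore)",
        "Some other replication failure",
    ]
    # Only the second event is left to alert on.
    print(filter_alerts(sample))
```

You would still want a separate check that replication resumes once the backup window is over, since this filter hides the pause itself.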