It's too late now, but the following happened today at a client's site, and we still don't know whether we did something wrong or whether this behavior was unexpected. Simple setup:
- A continuous replication job was configured from HV-01 to HV-02.
- Unfortunately, HV-01 suffered a hardware failure and repeatedly crashed. Since the VBR console was installed on HV-01, we were unable to perform a planned failover.
- To keep operations running, we manually powered on the replica VMs on HV-02 (in effect, a manual failover). This worked, and the company continued working for 30 hours.
The problem occurred once HV-01 was repaired and powered back on:
- As soon as HV-01 came online after the hardware repair, the replication job immediately started again and, within seconds and without any confirmation, overwrote the running VMs on HV-02 with the old state from HV-01.
- It had already happened before we could even log in to HV-01 and open the Veeam console for the first time.
- This completely discarded the state and therefore all work performed on the replica VMs during the unplanned failover period. Unfortunately, 30 hours of work were lost.
- Is this behavior expected, and did we do something fatally wrong?
- Shouldn’t Veeam recognize that the replica VMs had been manually started and that their state had changed, and above all avoid overwriting them without confirmation?
- Is there a best practice for preventing data loss in such a situation?
Joshua