Interested to hear people's thoughts on the following:
For aggressive RTO and RPO, oftentimes we deploy highly available, highly replicated, highly automated failover. Think for instance about a stretched synchronous cluster for VMs. In theory that would achieve very aggressive RTO/RPO (sometimes 0).
And of course, you'd still back up those VMs with Veeam.
But then you think... my Veeam backups need a tertiary copy on tape, taken off site - for archive and for air-gapping.
So, what RTO/RPO in this scenario have we actually achieved?
From a design standpoint we have DESIGNED the platform for RTO/RPO 0 in site failover scenarios.
But if something goes wrong with the entire compute platform we have to run full restores of all VMs from Veeam backup. As good as Veeam is, that could be 8 hours (say). So, say RPO 12 hours and RTO 8 hours.
And if you've been ransomwared, and your Veeam backups are also encrypted, you may need to retrieve tapes, restore the tape data to Veeam, and then run a further 8 hours of Veeam restore.
How do we express this to the business in terms of RTO/RPO? Do we say it has an RTO/RPO of < 1 hour (but qualify this based on scenario)? Or do we say the worst-case scenario could be a 24-hour-plus recovery, so we call it out as an RTO of 24 hours?
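To make the two options concrete, here is a rough Python sketch that puts the scenarios above side by side. The 0, 8 and 12 hour figures are the ones quoted above. The tape retrieval time, the tape-to-Veeam restore time and the tape RPO are purely my assumptions for illustration (picked so the total lands on the 24-hour figure), and the class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    rto_hours: float  # time to get service back
    rpo_hours: float  # maximum data loss window


# Figure taken from the post.
VEEAM_RESTORE_HOURS = 8.0
# Assumed figures, NOT from the post; replace with your own measurements.
TAPE_RETRIEVAL_HOURS = 8.0   # courier / vault retrieval
TAPE_TO_VEEAM_HOURS = 8.0    # restoring tape data back into the repository
TAPE_RPO_HOURS = 24.0        # age of the newest off-site tape copy

SCENARIOS = [
    Scenario("site failover (stretched cluster)", 0.0, 0.0),
    Scenario("compute platform loss (Veeam restore)", VEEAM_RESTORE_HOURS, 12.0),
    Scenario(
        "ransomware, backups encrypted (tape restore)",
        TAPE_RETRIEVAL_HOURS + TAPE_TO_VEEAM_HOURS + VEEAM_RESTORE_HOURS,
        TAPE_RPO_HOURS,
    ),
]


def worst_case(scenarios):
    """Return (max RTO, max RPO) across all scenarios, in hours."""
    return (
        max(s.rto_hours for s in scenarios),
        max(s.rpo_hours for s in scenarios),
    )


def report(scenarios):
    """Build a per-scenario summary, one line per scenario."""
    return [
        f"{s.name}: RTO {s.rto_hours:g}h, RPO {s.rpo_hours:g}h"
        for s in scenarios
    ]


if __name__ == "__main__":
    for line in report(SCENARIOS):
        print(line)
    rto, rpo = worst_case(SCENARIOS)
    print(f"worst case: RTO {rto:g}h, RPO {rpo:g}h")
```

The per-scenario lines correspond to the "qualify by scenario" option, and the `worst_case` pair corresponds to quoting a single 24-hour number. Which of the two the business should actually be given is still the open question.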