ysm wrote:WAN acceleration will essentially increase the processing time of each backup job, and can only transfer each backup/replication job over the pipe one by one. While it decreases the amount of data sent over the pipe, the total time taken to complete the backup/replication job might not be faster, is that correct?
A qualified "yes." It may NOT take longer than doing it without acceleration, especially once the seed is done, but there is a risk.
ysm wrote:4h to backup/replicate is indeed a bit long and I am not sure if the management can accept that.
Individual VMs may (and probably will) take less time, but if management is upset with a 4h transfer yet satisfied with a 24h RPO, there's a disconnect between their tolerance and their understanding. A 4h transfer time is really only a concern when you're looking at an RPO of <4h (or, as some would argue, <8h, because the currently-transferring restore point can't be considered valid; that makes the last successfully transferred restore point *at least* 4h old while the current one is being sent). For that reason, you can instead send multiple restore points each day; each restore point will be smaller because less change will have accumulated, so the transfer time will be scaled down similarly.
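To make that arithmetic concrete, here's a rough sketch (Python, with the 4h/24h figures from above; the model itself is my simplification, not a Veeam formula) of how effective worst-case RPO relates to transfer time and the number of restore points sent per day:

```python
# Rough RPO arithmetic. Assumes the daily change set transfers in a
# fixed total time, and splitting into N sends scales each send down
# proportionally (as described above).
def effective_rpo_hours(daily_transfer_hours, points_per_day):
    """Worst-case age of the newest *completed* restore point.

    While a restore point is in flight it can't be considered valid,
    so the last usable point is (interval + transfer time) old by the
    time the next one finishes.
    """
    interval = 24 / points_per_day                    # hours between sends
    transfer = daily_transfer_hours / points_per_day  # each send is smaller
    return interval + transfer

print(effective_rpo_hours(4, 1))  # one send/day
print(effective_rpo_hours(4, 4))  # four sends/day
```

With one 4h send per day the worst case is 28h; four sends per day brings it down to 7h.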
ysm wrote:I am aware of the spinning up of VM directly from backup feature, but I am not entirely sure the requirements required to do that in terms of storage and compute. Based on your reply, it seems to me, for "instant restore", we will need to make use of the local storage of the host on top of the shared storage, is that right? If that is the case, then it would seems to me direct replication of VM makes a bit more sense in terms of resource requirements. We do intend to put a shared storage in our DR site and multiple hosts to support the spin up of probably at least 50% of the production VMs in the case of DR.
Compute for either Instant Restore or Replica is identical: it's the compute (CPU+Memory) required by the VM when running. While Instant Restore is running from the backup restore point, some "writable" storage--it can be local or shared--is required to act as a "delta disk" in conjunction with the restore point; the maximum required would be 100% of the source disk, but only in a worst-case scenario where every single block of the virtual disk is written to while running as an Instant Restore. Of course, migrating from Instant Restore into "production" (aka Full VM) does require sufficient datastore capacity to hold the VM.
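A quick sketch of that delta-disk sizing logic (hypothetical disk sizes and change fractions, just to illustrate the worst-case bound described above):

```python
# Instant Restore delta-disk sizing sketch. Numbers are illustrative;
# the only hard bound is the worst case of 100% of the source disk.
def delta_disk_worst_case_gb(source_disk_gb, write_fraction=1.0):
    """Writable scratch space needed while a VM runs from backup.

    The delta disk stores only blocks written during Instant Restore,
    so the worst case (every block rewritten) equals the full source
    disk; a typical short-lived restore touches far less.
    """
    return source_disk_gb * write_fraction

print(delta_disk_worst_case_gb(500))        # absolute worst case
print(delta_disk_worst_case_gb(500, 0.05))  # if ~5% of blocks are written
```

Migrating to a full VM afterward still needs the full 500 GB of datastore capacity regardless of how small the delta disk stayed.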
Replication, holding a copy of the full VM--plus snapshots representing recovery points--can use local, but shared would be preferred so you don't have to directly manage both space and compute requirements: with shared, you only have to worry about compute, and if DRS is available, even that's minimal. The VMs you consider necessary to spin up during a DR event should be considered for replication; that will be the fastest failover, and sending multiple replicas each day will decrease your RPO from 24h to something appreciably lower.
ysm wrote:As for the tape library matter, the main consideration that we had is if we put our tape library at production site, is there a need to have a tape library at DR site as well for purpose like: seeding the initial DR setup, restore from tape under some circumstances? That would mean that we would need two tape drives, one at each location. That would no doubt increase cost. What's your take on this to address these issues?
Personally, I don't like relying on tape for anything other than archive; if I'm relying on a drive at the DR site, it's because not only Plan A failed, but Plan B (and possibly Plan C) failed as well. If you're getting replicas and backup copies to the DR site, then you're covered for an "instantaneous" disaster: you meet or exceed a 24h RPO, and your infrastructure should provide a <1h RTO for 50% of your infrastructure--ie, the most business-critical functions. Keep in mind that one of the most compelling reasons for tape as primary backup media--$$/TB--has pretty well been superseded by well-designed backup repositories built from spinning disk and deduplicating file systems. However, I have run into organizations that insist on having current backups on tape, even when using dedupe storage; in that case, we've convinced them that a bare drive (no autoloader) is an acceptable accessory at the DR site for reading tapes in a "plan D" emergency, while a fully automated tape library (sometimes with 2 or more drives) makes sense at the production site, closest to the source data. Yes, that scenario does impose the requirement of a second physical host for the DR-site tape drive, but that was significantly less expensive than a host+autoloader at both sites.
ysm wrote:To avoid the shoe-shining issue of tape, can we conduct the writing to tape after the replication/backup jobs are done?
Your tape drive's documentation should state its minimum streaming performance; that's the rate at which data must be fed to the tape in order to avoid shoeshine. I don't know of any Veeam-compatible (ie LTO3 or newer) drives whose minimum streaming speed is below the maximum throughput of your stated WAN capacity, so feeding the drive across the WAN will shoeshine regardless of what other activity is happening. The only reasonable way to use a tape library at DR would be to replicate 100% of your VMs, then back up the VM replicas (not the original VMs) to a repository in DR, then copy that backup to tape. In that scenario, 100% of the backup activity occurs at the DR site, but it will also lag behind production by at least the time between replication passes.
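Here's the streaming check in miniature (Python; the link speed, repository throughput, and minimum streaming rate are all hypothetical placeholders -- check your drive's spec sheet for the real figure):

```python
# Shoeshine check: can a given feed source keep a tape drive streaming?
# All speeds below are illustrative, not vendor specs.
def will_shoeshine(feed_mb_s, drive_min_stream_mb_s):
    """True if the feed can't sustain the drive's minimum streaming rate."""
    return feed_mb_s < drive_min_stream_mb_s

wan_mb_s = 100 / 8     # e.g. a 100 Mbit/s WAN link = 12.5 MB/s
local_repo_mb_s = 300  # a local disk repository at the DR site
min_stream_mb_s = 40   # assumed minimum streaming rate for the drive

print(will_shoeshine(wan_mb_s, min_stream_mb_s))         # WAN feed shoeshines
print(will_shoeshine(local_repo_mb_s, min_stream_mb_s))  # local repo streams fine
```

This is why the workable design feeds tape from a local repository at the DR site rather than straight across the pipe.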
When considering tape, you must remember: Veeam will NOT back up directly to tape; you must first back up to disk, then copy the backup to tape. Further, you can only send a first-line (primary) backup to tape; you cannot send a backup copy to tape.