I spoke to a technician at Veeam a couple of days ago and asked him what happens if the backup storage experiences bit rot or similar corruption. In short, I asked how Veeam deals with the situation where, say, a very large VMDK for a file server has been backed up and there is some unnoticed corruption in the data areas of that file (the backup copy of the VMDK, i.e. Veeam's files).
More specifically, let's say the VM has a 2 TB VMDK with an NTFS volume in it. The volume holds a lot of data, as it belongs to a file server. Now let's assume that some corruption occurs somewhere in this backed-up VMDK (i.e. in the backup storage): a byte or two are corrupted at the "place" where an important file on the file server is stored. What means do I have to detect this if I am using Veeam to back up this VMDK? I am not sure whether Veeam does any checksumming in its backup storage.
The technician said the following: what you have to work with is essentially SureBackup or, as we specifically talked about, the Instant Restore way of testing it. He summed it up as "You can fire up the VM right from the backup storage, and what we do/Veeam does is to simply check if all parts of the VM mounts and can be used successfully. You can also run some additional checks inside the VM via scripting". This all matches my understanding, so nothing weird so far.
However, when I asked him to confirm that Veeam won't "scan" or checksum the entire VMDK in the backup storage to find potential block/bit corruption further down the VMDK (i.e. in the pure data areas, which no ESXi/VM OS/application/whatever reads until the VM actually runs and someone asks for the file that is stored at that specific place in the VMDK/filesystem), he didn't give a clear answer. He kept insisting that "if we can mount it and it starts up successfully, then everything is alright", which sounds very off to me.
I don't know the internals of VMDK or how Veeam works, but isn't it true that just firing up the VM using, for example, Instant VM Recovery and seeing that it runs does not prove there is no corruption in the data areas of the corresponding VMDKs? Booting only reads the blocks the OS needs to start. Without checksumming and scanning the entire backed-up data, I fail to see how Veeam would be able to determine that the backups are indeed intact.
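To illustrate what I mean by "checksumming and scanning the entire backed-up data", here is a minimal Python sketch. It is not how Veeam works (I don't know its internals); it is only a generic illustration, and the function names, the 1 MiB block size and the choice of SHA-256 are my own assumptions. The idea: record one digest per fixed-size block when the file is written, then re-read every block later and compare.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; an arbitrary choice for this sketch


def block_checksums(path, block_size=BLOCK_SIZE):
    """Return a list of SHA-256 hex digests, one per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def find_corrupt_blocks(path, expected, block_size=BLOCK_SIZE):
    """Re-read the whole file and return the indices of blocks whose digest differs."""
    actual = block_checksums(path, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Blocks that are missing or extra at the end also count as mismatches.
    shorter, longer = sorted((len(actual), len(expected)))
    bad.extend(range(shorter, longer))
    return bad
```

The point is that `find_corrupt_blocks` has to read every byte of the file. A flipped byte in a block that nothing reads during boot is only found by this kind of full pass, never by a check that touches just the blocks needed to start the VM.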
The indirect question, apart from the above, is of course your opinion on what means there are to make sure that Veeam backups are indeed healthy. Is there something one can do using Veeam, or is it entirely up to features in the underlying storage to detect corruption (or, alternatively, doing a full check on the data inside the VM when doing test restores)? One could always run it atop ZFS, but there are definitely a lot of people who don't do that.
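As for the "full check on the data inside the VM" option, this is roughly what I picture: a script that hashes every file under a directory, to be run once on the production file server around backup time and once more inside the test-restored VM, with the two results compared. This is only a sketch under my own assumptions (hypothetical function names, SHA-256, plain relative paths as keys), not a Veeam feature.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash one file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_manifests(expected, actual):
    """Return (missing, changed): paths absent from actual, and paths whose digest differs."""
    missing = sorted(set(expected) - set(actual))
    changed = sorted(p for p in expected if p in actual and expected[p] != actual[p])
    return missing, changed
```

One obvious weakness: files that legitimately changed between building the manifest and taking the backup would show up as "changed" too, so the manifest would have to be built from the same point in time as the backup (for example from a snapshot).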
Please let me know if I need to clarify. Thanks!
