
Today a repository is a space in which to build backup files, full and incremental. What if we never built those files, but only saved the CBT blocks as they were read from the source? Each block would go in its own file, which could be named after its hash value. A block that is already in the repository would not be saved again, and a block no longer in use would be deleted by a cleanup job. There would be no need for forward/reverse incremental chains, or indeed any transformations at all. Metadata would be saved separately, and VMDKs could be reconstructed from the blocks when needed. I know CBT blocks vary in size, but even today we have to re-read the whole disk when the size changes, so that wouldn't be much different. If a backup was interrupted and had to be retried, the retry would benefit from the blocks of the previous run already being in the repository, and any blocks left unused would be found and deleted by the cleanup job.
This repository would also replace the SOBR, since it wouldn't be limited to a single server. The database would keep track of what is where, and a job would be able to write to multiple servers in the same repository, which is a nice performance bonus. For added resilience we could define policies requiring each block to be saved in two or three copies (on different servers) as long as there is disk space. That policy could also be configured to sacrifice extra copies rather than let the repository fill up, and to show a warning in the console that it's time to add another server.
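To make the storage model concrete, here is a minimal in-memory sketch in Python. It is not how Veeam or any other product stores data: `BlockStore` and its methods are hypothetical, SHA-256 is an assumed choice of hash, the `blocks` dictionary stands in for the one-file-per-block storage, `restore_points` stands in for the metadata database, and blocks are assumed to be fixed-size and addressed by their index on the disk.

```python
import hashlib


class BlockStore:
    """Hypothetical in-memory model of a content-addressed repository.

    `blocks` stands in for one file per block (file name = hash);
    `restore_points` stands in for the metadata database.
    """

    def __init__(self):
        self.blocks = {}          # SHA-256 hex digest -> block contents
        self.restore_points = {}  # restore point id -> ordered list of block hashes

    def put(self, data: bytes) -> str:
        """Store a block unless an identical one already exists; return its hash."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:  # deduplication: identical data is stored once
            self.blocks[digest] = data
        return digest

    def backup_full(self, point_id: str, disk_blocks: list) -> None:
        """Full backup: record every block of the disk, in order."""
        self.restore_points[point_id] = [self.put(b) for b in disk_blocks]

    def backup_incremental(self, point_id: str, base_id: str, changed: dict) -> None:
        """Incremental backup from CBT: `changed` maps block index -> new contents.

        The new restore point is the base point's hash list with the changed
        entries swapped in; no existing block is merged or rewritten.
        """
        hashes = list(self.restore_points[base_id])
        for index, data in changed.items():
            hashes[index] = self.put(data)
        self.restore_points[point_id] = hashes

    def reconstruct(self, point_id: str) -> bytes:
        """Rebuild the flat disk image of a restore point from its blocks."""
        return b"".join(self.blocks[h] for h in self.restore_points[point_id])

    def delete_point(self, point_id: str) -> None:
        """Retention: forget a restore point; its blocks stay until cleanup."""
        del self.restore_points[point_id]

    def cleanup(self) -> int:
        """Cleanup job (mark and sweep): delete blocks no restore point references."""
        live = {h for hashes in self.restore_points.values() for h in hashes}
        dead = [h for h in self.blocks if h not in live]
        for h in dead:
            del self.blocks[h]
        return len(dead)
```

For example, a full backup of the blocks `[b"AAAA", b"BBBB", b"AAAA"]` stores only two blocks; an incremental in which block 1 changed to `b"CCCC"` adds one more; and after the first restore point is deleted, `cleanup()` removes only the `b"BBBB"` block. A retried backup simply calls `put` again and skips blocks already written by the interrupted run. One design point this sketch glosses over: a real cleanup job must not sweep blocks that an in-progress backup has written but not yet recorded in a restore point.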
Pros:
No transformations, so better performance and less risk of corruption.
No giant vbk files to handle.
"Automatic" block deduplication over the whole repository (even if only for same-sized blocks).
Multiple copies without copy jobs.
Easy scaling!
Cons:
Larger database to keep track of all blocks.
Loss of the database means loss of the backup data (but this is also true for some other backup products).
Impossible to just copy a vbk out of a repository and import it somewhere else.
Compressing/encrypting all blocks separately will be less efficient.
Things like instant recovery would have to work differently, but perhaps not much.
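The copy policy behind "Multiple copies without copy jobs" and "Easy scaling!" could be modeled roughly as follows. This is a hypothetical sketch: `Repository`, capacities counted in blocks, the most-free-space placement rule and the choice of which surplus copy to drop are all assumptions made for illustration, and the `placement` dictionary stands in for the database that tracks what is where.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical multi-server repository with a per-block copy policy."""
    capacity: dict                                 # server name -> capacity in blocks (assumed unit)
    copies: int = 2                                # desired number of copies per block
    placement: dict = field(default_factory=dict)  # block hash -> set of server names
    alerts: list = field(default_factory=list)     # warnings for the console

    def free(self, server: str) -> int:
        used = sum(server in holders for holders in self.placement.values())
        return self.capacity[server] - used

    def _sacrifice_extra_copy(self):
        """Drop one surplus copy of some block stored more than once.

        Returns the server that gained free space, or None if every block
        is already down to a single copy.
        """
        for block_hash, holders in self.placement.items():
            if len(holders) > 1:
                server = sorted(holders)[0]
                holders.discard(server)
                self.alerts.append(f"dropped extra copy of {block_hash} on {server}; add another server")
                return server
        return None

    def store(self, block_hash: str) -> set:
        """Place up to `copies` copies of a block, each on a different server."""
        holders = self.placement.setdefault(block_hash, set())
        while len(holders) < self.copies:
            candidates = [s for s in self.capacity if s not in holders and self.free(s) > 0]
            if candidates:
                holders.add(max(candidates, key=self.free))  # most free space first
            elif holders:
                # At least one copy exists: accept fewer copies instead of failing.
                self.alerts.append(f"{block_hash} has only {len(holders)} copies; add another server")
                break
            elif self._sacrifice_extra_copy() is None:
                # No surplus copies left anywhere: the repository really is full.
                del self.placement[block_hash]
                raise RuntimeError("repository full: add another server")
        return holders
```

With two servers holding two blocks each and `copies=2`, the first two blocks get two copies each. Storing a third block drops one surplus copy and records console alerts instead of refusing the write. `store` raises only once every block is down to a single copy.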
How about that?