We're using VBO 365 to back up our Exchange Online mailboxes, and I've recently been digging into its retention settings so they mirror the retention we have set in Office 365. Our retention is set to 2 years, with the idea that it is legal retention: 2 years, no more, no less. When I noticed the 2017 repository not shrinking at all, however, I worked with support for a while and eventually learned some things about how the software evidently works that surprise me, and I'm still not completely sure this is really the way it's intended to work.
So what I learned boils down to:
- Item level retention keys off of date modified rather than date created
- The last modified date of an item isn't visible to the admin, which makes troubleshooting retention issues more painful than it needs to be
- There's no cleanup done on the database for a given year; items in it that fall out of retention are only marked as white space to be overwritten by subsequent backups
- There will never be subsequent backup jobs to that database because VBO 365 divides its repository by year, so by then it's actively backing up to a different year's database
- A yearly database won't fall off until every single item in it falls out of retention
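To make the interaction between these points concrete, here is a toy Python model of the behavior as support described it. It is only a sketch and not VBO 365's actual logic: the names `Item`, `RETENTION`, and `database_report`, the item sizes, and the dates are all made up for illustration, and 2 years is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: 2-year item-level retention, approximated as 730 days.
RETENTION = timedelta(days=2 * 365)


@dataclass
class Item:
    size_bytes: int
    last_modified: date  # retention is measured from this date, not date created


def is_expired(item: Item, today: date) -> bool:
    """An item is out of retention once its last modified date is older than RETENTION."""
    return today - item.last_modified > RETENTION


def database_report(items: list[Item], file_size_bytes: int, today: date) -> dict:
    """Summarize one yearly database: expired items free no disk space,
    and the file can only be dropped once no live items remain."""
    live_bytes = sum(i.size_bytes for i in items if not is_expired(i, today))
    return {
        "droppable": live_bytes == 0,
        "live_bytes": live_bytes,
        "white_space_bytes": file_size_bytes - live_bytes,
    }


# Hypothetical 2017 database: lots of mail last modified in March,
# plus three small items last modified in mid-December.
bulk = [Item(4 * 1024**2, date(2017, 3, 1)) for _ in range(50_000)]
stragglers = [Item(4096, date(2017, 12, 15)) for _ in range(3)]
items = bulk + stragglers
file_size = sum(i.size_bytes for i in items)  # the file never shrinks

report = database_report(items, file_size, today=date(2019, 6, 1))
later_report = database_report(items, file_size, today=date(2020, 1, 1))
print(report)
print(later_report)
```

In this made-up example, three 4 KB items keep a roughly 195 GiB file on disk in mid-2019, and the database only becomes droppable once the last of them ages out.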
Taken together, these create a scenario where a handful of emails can keep the database for a given year in place longer than expected. These emails may only be a few kilobytes, but they can effectively take up hundreds of gigabytes of disk space, because items that fall out of retention are only marked as white space, and that white space will never be overwritten since active backup jobs are writing to a different year's database by that point.
Even though retention is 2 years, items can persist for much longer than that because it's based on the last modified date. Because our retention policy in Exchange Online is 2 years (based on date received) and VBO 365's retention is set to 2 years (based on date modified), I can potentially have items that are just under 4 years old in the repository, which runs contrary to the idea of retention from a legal standpoint. This also magnifies the space issue: multiple years' worth of databases persist and take up hundreds of gigabytes of space only to house a few kilobytes of emails, because a database isn't dropped until every single item in it falls out of retention. It's as if the design of the repositories and the design of the item-level retention feature are at odds with each other, because the mechanism intended to reclaim space from expired items is broken by the way the repository is set up.
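The "just under 4 years" figure can be checked with a bit of date arithmetic. The Python sketch below assumes a worst case where an item is modified one day before the Exchange Online policy would remove it; the dates and the helper name `max_age_in_backup` are hypothetical, and 2 years is approximated as 730 days.

```python
from datetime import date, timedelta

# Assumption: both retention periods are 2 years, approximated as 730 days.
EXCHANGE_RETENTION = timedelta(days=2 * 365)  # counted from date received
BACKUP_RETENTION = timedelta(days=2 * 365)    # counted from date last modified


def max_age_in_backup(received: date, last_modified: date) -> timedelta:
    """Age of the item (since it was received) on the day the backup copy expires."""
    backup_expiry = last_modified + BACKUP_RETENTION
    return backup_expiry - received


received = date(2017, 1, 10)
# Worst case: modified one day before the Exchange Online policy removes it.
last_modified = received + EXCHANGE_RETENTION - timedelta(days=1)

age = max_age_in_backup(received, last_modified)
print(age.days)  # 1459, one day short of 4 * 365 = 1460
```

An item that is never modified after it is received comes out at 730 days, which is the behavior we expected from a 2-year policy.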
Support indicated this was normal, but it's still kind of hard for me to believe. We only have a few dozen employees, and that's translating to potentially a few hundred gigabytes of wasted space both on the server that runs the software and on backup storage. If we had hundreds of employees it could be terabytes, and it seems a little bizarre to me that the software wouldn't do some sort of periodic cleanup of the database to reclaim the empty space and avoid this. Is this really how the software is supposed to function, or does it sound like there's some sort of issue with our installation?