gingerdazza
Expert
Posts: 206
Liked: 14 times
Joined: Jul 23, 2013 9:14 am
Full Name: Dazza

RTO and RPO for O365

Post by gingerdazza »

I'm wondering if anyone has examples of how to define RTO/RPO SLAs for O365. The RPO is relatively easy, but the RTO surely depends on the scenario: restoring hundreds of mailboxes to O365 after a ransomware attack may take days, whereas a relatively small subset of data might only take an hour. Does anyone have thoughts on how to express this in terms of an SLA, or, even better, some real-world SLA examples?
Regnor
VeeaMVP
Posts: 1007
Liked: 314 times
Joined: Jan 31, 2011 11:17 am
Full Name: Max

Re: RTO and RPO for O365

Post by Regnor » 1 person likes this post

For an email service, in my opinion, the RTO describes the time to restore the service itself, i.e. how long it takes until you are able to send and receive emails again.
The data itself isn't necessarily needed directly after restoration of the service, so a restore can occur over a longer timeframe.
So for Exchange Online, you would need to define different scenarios and apply an RTO to each of them.
For example:
- Ransomware attack: users can still work, and the restore can run for a few days without problems.
- You delete/lose all Azure AD users including the mailboxes: how long will it take you to recreate those? (A Veeam 365 restore will only work if you have valid targets/mailboxes.)

Re: RTO and RPO for O365

Post by gingerdazza »

Thanks for your input...
In the cloud shared responsibility model though, surely the service-level RTO belongs largely to the service provider, which for Exchange Online would be Microsoft?

So, you think that RTO for O365 should be expressed as a range of scenario-based RTOs? For example:
- individual items: RTO < 1 hour (factoring in Service Desk ticket time!)
- larger individual containers (e.g. mailboxes): RTO < 4 hours
- enterprise-wide data recovery (i.e. mass recoveries): RTO < 4 days

...kind of thing?

Wish I could see some real-world examples of O365 RTO/RPO/SLAs.

Would appreciate the opinions and insight of others too.

Re: RTO and RPO for O365

Post by gingerdazza »

....no other thoughts from anyone on this subject?

Re: RTO and RPO for O365

Post by Regnor » 1 person likes this post

Unfortunately I don't have any real-world SLAs for O365, but I would handle it on a scenario basis, just like you've said.
This applies to the RTO of any service; it will always depend on the scenario/case:
restore of a single item vs. full container vs. full service vs. complete outage ...
Mike Resseler
Product Manager
Posts: 8191
Liked: 1322 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium

Re: RTO and RPO for O365

Post by Mike Resseler » 1 person likes this post

@gingerdazza

I think you are indeed on the right track by working with different types of SLAs depending on the "task".

Do note that you could also agree on 1 specific item in case of a large disaster:

https://helpcenter.veeam.com/docs/vbo36 ... tml?ver=50

The last checkbox on that page lets you agree on a certain number of days that you want to restore first. So imagine you need to restore 100 mailboxes: you would restore 10 days' worth of data for all those mailboxes first, before restoring everything else. This is a very helpful feature in a mass-restore scenario.
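
To illustrate the "restore a window of days first" idea, here is a minimal planning sketch in Python. It does not call any Veeam API; it only shows how a mass restore could be split into a priority phase and a backlog phase. The function name, the `(item_id, date)` tuple layout and the sample dates are all hypothetical, and it assumes the days restored first are the most recent ones.

```python
from datetime import date, timedelta


def split_restore_phases(items, today, recent_days):
    """Split (item_id, item_date) tuples into two restore phases.

    Phase 1: items dated within the last `recent_days` days (restored first).
    Phase 2: everything older (restored afterwards).
    Assumption: "days to restore first" means the most recent days.
    """
    cutoff = today - timedelta(days=recent_days)
    recent = [item for item in items if item[1] >= cutoff]
    older = [item for item in items if item[1] < cutoff]
    return recent, older


# Hypothetical sample data: three mail items with their received dates.
sample_items = [
    ("mail-a", date(2021, 3, 30)),
    ("mail-b", date(2021, 3, 1)),
    ("mail-c", date(2021, 3, 21)),
]

phase_one, phase_two = split_restore_phases(sample_items, date(2021, 3, 31), 10)
print([item_id for item_id, _ in phase_one])  # ['mail-a', 'mail-c']
print([item_id for item_id, _ in phase_two])  # ['mail-b']
```

With a split like this, an SLA could quote one RTO for phase 1 (users can work with recent mail again) and a looser one for phase 2.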

Re: RTO and RPO for O365

Post by gingerdazza »

Thanks Regnor/Mike. Appreciate your insight.

Re: RTO and RPO for O365

Post by gingerdazza »

OK, so here is a draft SLA for RTO and RPO that I've composed. Would welcome critique on it.

Resiliency/Availability
• Service availability for M365 services is as per the Microsoft 365 Service Level Agreement. The M365 cloud service is designed to be resilient and highly available, but Microsoft provides no contractual recovery time objective (RTO) or recovery point objective (RPO) for data.
• In the event of an outage to the M365 Exchange Online service (only), we would utilise <Product> Continuity Mode, which will continue to provide access to email services for the duration of any outage to Exchange Online.
• The SLAs provided for M365 data recovery will only cover M365 data that is protectable. Protectable data details are complex and subject to change as the M365 SaaS service evolves, but current protectable data is Exchange Online (all data), OneDrive*, Teams*, SharePoint Online*, M365 Groups*, and Project Online*. (*Caveats, limitations and exclusions apply.)
• Within the terms of this SLA, “minor” data loss is defined as any volume of M365 data below 2 GB (examples: single mailboxes, small volumes of data such as files, folders, conversations, etc.).
• Within the terms of this SLA, “major” data loss is defined as any volume of M365 data of 2 GB or above.
• Within the terms of this SLA, “catastrophic” data loss is defined as a very rare/unlikely event which results in comprehensive or complete organisational data loss across M365 services.
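
The three tiers above can be expressed as a small classification rule, which makes the boundary cases explicit. This is only a sketch: the names `LossTier`, `classify_loss` and the `organisation_wide` flag are hypothetical, and treating a loss of exactly 2 GB as "major" is an assumption of this sketch.

```python
from enum import Enum


class LossTier(Enum):
    MINOR = "minor"
    MAJOR = "major"
    CATASTROPHIC = "catastrophic"


# Threshold taken from the draft SLA.
MINOR_LIMIT_GB = 2.0


def classify_loss(lost_gb: float, organisation_wide: bool = False) -> LossTier:
    """Map a data loss event to an SLA tier.

    `organisation_wide` is a hypothetical flag for comprehensive or complete
    loss across M365 services; it overrides the size-based tiers.
    Assumption: exactly 2 GB counts as "major".
    """
    if lost_gb < 0:
        raise ValueError("lost_gb must not be negative")
    if organisation_wide:
        return LossTier.CATASTROPHIC
    if lost_gb < MINOR_LIMIT_GB:
        return LossTier.MINOR
    return LossTier.MAJOR


print(classify_loss(0.5).value)                            # minor
print(classify_loss(2.0).value)                            # major
print(classify_loss(500.0, organisation_wide=True).value)  # catastrophic
```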


Recovery/DR
• In the event of data loss, data recovery relies on M365 service availability as a prerequisite before recovery can begin, which may impact RTO SLAs.
• Where “minor” data loss has occurred, the achievable RTO will be 4 hours. {I’m accounting for the time it takes to log the ticket, get it assigned, and restore. In reality it’s a 10-minute affair}
• Where “major” organisational data loss has occurred, the achievable RTO could be as high as 72 hours. {Here I’m considering incredibly rare mass data loss events}
• Where “catastrophic” organisational data loss has occurred, and Microsoft are unable to facilitate a local data recovery natively, no particular RTO SLA is provided; for guidance purposes, a complete recovery after such an event may take more than 1 week. Improved RTO SLAs could be provided in this scenario for prioritised data if this data is identified in advance of such an outage.
• Protectable M365 data within Teams, Exchange Online, SharePoint Online, M365 Groups, and Project Online will have an RPO of 12 hours. {We run backups 4 x a day but I like to allow for failures}
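
The 12-hour RPO follows from simple arithmetic on the backup schedule: four runs a day means a 6-hour interval, and tolerating one failed run doubles the worst-case gap. A minimal sketch of that calculation, assuming evenly spaced backups and ignoring how long a backup job itself takes (the function name is hypothetical):

```python
def worst_case_rpo_hours(backups_per_day: int, tolerated_failures: int = 0) -> float:
    """Worst-case age of the newest good backup, in hours.

    Assumptions: backups are evenly spaced over 24 hours, and up to
    `tolerated_failures` consecutive runs may fail before the next success.
    """
    if backups_per_day <= 0:
        raise ValueError("backups_per_day must be positive")
    if tolerated_failures < 0:
        raise ValueError("tolerated_failures must not be negative")
    interval_hours = 24 / backups_per_day
    return interval_hours * (tolerated_failures + 1)


print(worst_case_rpo_hours(4))     # 6.0  (every run succeeds)
print(worst_case_rpo_hours(4, 1))  # 12.0 (one failed run tolerated)
```

So a 12-hour RPO with four backups a day leaves headroom for exactly one consecutive failed run; two failures in a row would breach it.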

Re: RTO and RPO for O365

Post by Mike Resseler »

I have read it 3 times now (I must say, you have the capability to write legal documents ;-)). To me it looks good and achievable. The final "advice" I would give you is to test regularly to see whether you can actually meet that 72 hours. (I don't think the 4 hours is an issue.) Fill one test mailbox with fake emails and do an entire recovery to see how long it takes. The result will vary, as it depends on how much you get throttled.
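
A test restore like this can be turned into a rough check against the 72-hour target. The sketch below extrapolates linearly from one measured test restore and applies a safety factor for throttling. The function names, the factor of 2 and all sample figures are hypothetical, and a linear extrapolation ignores parallel restore sessions, so treat the result as a rough guide only.

```python
def estimate_restore_hours(test_gb: float, test_hours: float,
                           total_gb: float, throttle_margin: float = 2.0) -> float:
    """Extrapolate total restore time from one measured test restore.

    `throttle_margin` is a hypothetical safety factor (>= 1) to allow for
    heavier throttling during a real mass restore.
    """
    if test_gb <= 0 or test_hours <= 0:
        raise ValueError("test measurements must be positive")
    if throttle_margin < 1:
        raise ValueError("throttle_margin must be at least 1")
    throughput_gb_per_hour = test_gb / test_hours
    return total_gb / throughput_gb_per_hour * throttle_margin


def fits_rto(estimated_hours: float, rto_hours: float) -> bool:
    """Return True if the estimate is within the agreed RTO."""
    return estimated_hours <= rto_hours


# Hypothetical figures: a 5 GB test mailbox restored in 30 minutes,
# extrapolated to a 300 GB mass restore.
estimate = estimate_restore_hours(test_gb=5, test_hours=0.5, total_gb=300)
print(estimate)                # 60.0
print(fits_rto(estimate, 72))  # True
```

Re-running the test periodically and feeding in the new measurements shows whether the 72-hour figure is still realistic as data volumes grow.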