
Ransomware discovery time and retention times

Post by JaySt »

I had some discussion on the matter of retention times in the context of protection against ransomware/insider attacks. Setting up a hardened repository and making data immutable for a certain amount of time is great stuff to implement. But how long is long enough?
I can remember some research or reports stating there is an "average" discovery time for ransomware / encrypted data, but I can't seem to find the articles anymore.
What would people here recommend for retention times aimed specifically at protecting against these attacks?
I'm thinking about something around 14 - 31 days of immutable daily backups at minimum. Anything shorter has a higher chance of being outlasted by an attacker's patience, for example: they could simply wait for the data to expire on the repo before taking action.
Or are you seeing pretty short detection times in your experience, with the most recent backup used for recovery most of the time?

Re: Ransomware discovery time and retention times

Post by soncscy »

Interesting question, but personally I think it's an unnecessary one, since we're now at a rare point where every step of the backup's lifetime can be immutable. So I would propose: keep it immutable for however long you need to get to the next immutable step.

Whether you go tape or S3 for your immutable needs, my point of view would be that the on-disk copy __needs__ to be immutable long enough to get the data to your secondary (also immutable) location, and long enough that you get your archival points.

I would personally avoid committing to specific time-frames, since the immutability settings are easily discovered, and it's easy enough for a ransomware attacker to devise an attack that disables immutability, waits until you're no longer immutable, and then hits you when you're vulnerable on your primaries; so from my POV, you need to keep yourself immutable long enough to get your archival/secondary immutable backups created.
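If you do want to put a rough number on "long enough", here's a back-of-the-envelope way to think about it. All numbers below are made-up assumptions, not recommendations:

```python
# Back-of-the-envelope sketch: how long primary backups should stay immutable.
# Every value here is an assumption; plug in your own copy cadence and durations.

copy_interval_days = 1        # how often the copy to the next immutable tier runs
worst_case_copy_days = 2      # worst-case time for that copy/offload to complete
safety_margin_days = 4        # buffer for failed jobs, long weekends, missed tape exports

min_immutable_days = copy_interval_days + worst_case_copy_days + safety_margin_days
print(f"Keep primary backups immutable for at least {min_immutable_days} days")
```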

I guess my assumptions are the following:

1. Eventually, any datacenter will be compromised
2. Eventually, the local on-disk backups will be compromised (e.g., disable immutability, format the drive at the OS level as root, etc)
3. The only possible way to protect would be air-gapped backups or compliance-immutable backups on S3

So, knowing this, the best option as I see it is not to worry about the time you keep things immutable on the primary, as that will be violated eventually; instead, focus on how fast you can get your data to a truly immutable storage. If you cannot afford S3, or a fast enough pipe to get to S3, then rotated drives or tapes are a must; tape in particular is worth the investment!

Re: Ransomware discovery time and retention times

Post by JaySt »

OK, but that way you would link primary and secondary backups, and even archival requirements, together as all being required for proper protection against ransomware/insider attacks.
It will be a big debate what counts as "truly" immutable storage. Looking at the environments I deal with right now, any immutability is a "win" compared to how it's done now.

In a lot of cases I see, especially with Veeam v11, customers want to bolt immutable storage onto their existing data protection environment. This is fine and properly motivated. But retention times come up pretty quickly when choosing any type of immutable storage. If it were tapes going to an offline vault on a daily basis, the question would be how long those tapes stay in the vault before returning to the library. I see customers with daily tape jobs who don't do daily offline handling, especially during these COVID times. Enter v11 hardened repositories, for example: things will be easier, but they must be sized properly etc., and the same questions apply.

Re: Ransomware discovery time and retention times

Post by soncscy » 4 people like this post

> OK, but that way you would link primary and secondary backups, and even archival requirements, together as all being required for proper protection against ransomware/insider attacks.
> It will be a big debate what counts as "truly" immutable storage. Looking at the environments I deal with right now, any immutability is a "win" compared to how it's done now.
Wall of text incoming as I sip my morning coffee ;) This is all just game theory of course, please don't take it as gospel, but it's all stuff I've considered and experienced with clients.

Sorry, but it's not possible to answer your question directly because different systems have different setups. You need to think through your budget, the size of the data you're working with, and the discipline of the staff working with it. From what you're telling me, I have concerns about your clients actually following procedure :/ This isn't a judgement on you of course, but my experience with my clients is that lack of discipline more than anything is the biggest risk, not hackers or malicious insiders.

But, since you asked, I personally have qualms about S3 providers since I'm a bit old-fashioned and I see this as renting someone else's storage, and now I have to consider their demands for my disaster recovery; I really don't want to be at risk of being locked out of my backups during a disaster just because someone forgot to pay the AWS bill or because some other provider is having network issues, lied about capacity, etc.

Theoretical Setup (budget-less)

If I imagine a perfect setup, I want the following:

1. 14 days of Primary Backups to an Immutable Repository built on XFS to use block cloning in a SOBR; 14 days is chosen because we need to use forward incremental anyway, and block cloning basically makes this a forever chain size-wise. If storage permits it, add GFS on the primaries so that S3 potentially has archival points.
2. 7 days of Immediate Backup Copies of the Primary Backups to rotatable drives (non-GFS); the goal of this copy is to have a short-term series of drives I can just unplug and plug in at the main data center site as necessary.
3. 14 days of simple retention (GFS as desired) via an Immediate Backup Copy to another off-site Immutable Repository SOBR; on a weekly basis, plug in a normally non-connected storage device, move the archival points to it via rsync (see the sketch after this list), and disconnect it.
4. Tape-out from either the Backup Copy from point 3 or the source primary (I don't like over-working the primary backup though). Tapes must be exported daily after use and must not sit in a slot. There must never be a tape with backup data in a slot unless you're performing restores. Also, it must be real tape -- VTL solutions are not applicable here (see Considerations).
5. On both SOBRs, offload to any S3 with proper immutability, using both Copy and Move mode, moving only chains older than 7 days.
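For point 3, here's a minimal sketch of what that weekly rsync step could look like. The paths are hypothetical and it only copies and reports; the operator still verifies the result before disconnecting the device:

```python
# Minimal sketch of the weekly archival-point copy from point 3.
# SOURCE and DESTINATION are hypothetical paths; the offline disk is assumed
# to be attached and mounted only for the duration of this run.

import subprocess
import sys

SOURCE = "/backups/offsite-sobr/archive/"     # hypothetical folder holding the archival points
DESTINATION = "/mnt/offline-disk/archive/"    # hypothetical mount point of the rotated storage

def copy_archive_points() -> int:
    """Run rsync (-a keeps attributes, --partial lets an interrupted copy resume)."""
    result = subprocess.run(
        ["rsync", "-a", "--partial", "--info=progress2", SOURCE, DESTINATION],
        check=False,
    )
    return result.returncode

if __name__ == "__main__":
    rc = copy_archive_points()
    if rc != 0:
        print(f"rsync exited with code {rc} -- do NOT disconnect the disk yet", file=sys.stderr)
    sys.exit(rc)
```

After a clean run you unmount and physically disconnect the disk, which is the whole point of the exercise.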

Considerations
  • Lots of storage needed here, so this has trouble scaling I would say, at least for the rotated drives. For more than 8 TB of backups, such drives probably need to be replaced by a storage array that gets physically disconnected from the network. Remote deadman switches are a thing and should do just fine here. WOL is unreliable even to this day (you never know how vendors will implement it...) and software solutions aren't valid because of the points in the attack analysis below.
  • S3 at any scale basically necessitates a gigabit or higher pipe; trying to move more than a TB over < 1 Gbit is too risky and you end up fighting your connection just to stay protected; if you cannot dedicate a 1 Gbit line, S3 is probably not valid for you (AWS Snowball Edge and Azure Databox can help mitigate this, but then you need to check the size of your increments -- see the rough math after this list)
  • Don't underestimate tape; while it has a large up-front cost, you own the device, the cost per GiB of storage is fantastic, the vendors typically have great service plans, and tapes are perfect because by default they are air-gapped. VTLs do not qualify, as they're software solutions that exist as part of a bigger storage; when the storage is overtaken, so is the VTL. Don't buy into the vendor hype, VTLs are not useful here!!! (Also, it's time to be done with AWS/Azure VTL -- they are no longer useful now that there is a path in Veeam to Glacier, so do not consider these as options anymore)
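On the pipe sizing point from the second bullet, some rough math. The increment size and the ~70% effective link utilisation are assumptions; adjust them to your environment:

```python
# Rough offload-time estimate for the pipe-sizing consideration above.
# Increment size and link efficiency are assumptions, not measurements.

def transfer_hours(increment_gib: float, link_mbit: float, efficiency: float = 0.7) -> float:
    """Approximate hours to push increment_gib over a link_mbit line at the given efficiency."""
    gib_per_hour = (link_mbit * efficiency / 8) * 3600 / 1024  # Mbit/s -> ~MiB/s -> ~GiB/h
    return increment_gib / gib_per_hour

for link in (100, 500, 1000):  # Mbit/s
    print(f"{link:>4} Mbit/s: {transfer_hours(500, link):5.1f} h for a 500 GiB daily increment")
```

If the daily increment doesn't comfortably finish within your backup window at real-world throughput, that's the sign you need a bigger pipe or a Snowball Edge / Databox style seeding.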
Threat Factors

> ransomware/insider attacks

Let's discuss these both separately as they're completely different attack vectors and you need different approaches.

Ransomware

I will provide several premise points that I feel are inarguable and must be considered when designing:

1. Ransomware will happen, a given site will be broken into
2. Ransomware attackers will get full access to everything; assume they have admin/root to any system, so any software solution is compromised
3. Ransomware attackers will wait until they're able to apply the highest amount of pressure to ensure a payment
4. Ransomware attackers have every incentive to attack everything

Let's take these one by one. The first point is mostly to make sure that my clients understand they are going to be attacked eventually, and that there is no such thing as a perfectly secured site (not even an offline one -- Stuxnet proved to us years ago that a dedicated attacker will get in, so it's easier to just assume all sites will be compromised).

The second point sounds like a direct attack against the Immutable Repo idea, but this is spelled out by Veeam in a few places I've seen, so we know it's a heavy compromise point; however, knowing that only a single account needs access, we can set up a few safeguards on these repositories to alert us as soon as we're compromised. Because we know which account __should__ have access, and that this account should not have root access, let's disable basically all access to this machine: no SSH access, the only accounts that should exist are root and our dedicated account (I'd maybe even create a few dummy accounts to act as alarms/honeypots), and set up as much alerting on the server as possible whenever these accounts log in.

And I mean everything should light up: dozens of pager alerts/notifications/SMS -- if any login is detected, it must be seen. Similarly, any account activity from the backup service account must alert if it happens outside of the backup window. This may make restores a bit annoying, as someone will get paged for every restore, but again, how serious do you want to be? And since one obvious attack vector is using the chattr command to remove the immutable flag, we report on this as well and run some other tooling to watch for the flag being removed.
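To make the chattr part a bit more concrete, here's a rough sketch of such a watchdog. The repository path is hypothetical, and keep in mind that the newest points, and anything already past its immutability window, will legitimately not carry the flag, so a real version would also filter by file age:

```python
# Rough sketch of an immutable-flag watchdog for a hardened repository.
# BACKUP_ROOT is a hypothetical path; hook alert() into your real paging/SIEM.
# Note: recent points and points past their immutability window legitimately
# lack the flag, so a real version would also check file age before alerting.

import subprocess
from pathlib import Path

BACKUP_ROOT = Path("/backups/hardened-repo")   # hypothetical repository path

def is_immutable(path: Path) -> bool:
    """Return True if lsattr shows the 'i' (immutable) attribute for the file."""
    out = subprocess.run(["lsattr", str(path)], capture_output=True, text=True)
    if out.returncode != 0 or not out.stdout:
        return False
    return "i" in out.stdout.split()[0]

def alert(message: str) -> None:
    """Stand-in for the 'everything lights up' part: pager, SMS, SIEM, ..."""
    print(f"ALERT: {message}")

def main() -> None:
    for pattern in ("*.vbk", "*.vib"):
        for backup_file in BACKUP_ROOT.rglob(pattern):
            if not is_immutable(backup_file):
                alert(f"immutable flag missing on {backup_file}")

if __name__ == "__main__":
    main()
```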

None of this will __protect you__ from the attacker, but that's not our goal here; the goal is that we want to know as soon as possible when we're potentially attacked and to have a disaster plan ready if that happens. Disconnect all air-gappable storage immediately and go into full lockdown.

The third and fourth points we can kind of combine: with proper discipline on our air-gapped backups, we can do our best to reduce this attack surface as much as possible. Tapes must be exported daily; as above, any tape with backup data must never stay in a slot, and rotated drives must be physically disconnected as soon as they're done receiving their copy.

Ransomware attackers are going to wait for a slip-up, or for a point where they can catch a mistake; if no such opportunity comes up, they'll just attack when it's most convenient for them and the damage factor is highest. But they can be very patient, I suppose.

So, for ransomware attackers, we must simply assume that the primary backups __WILL BE COMPROMISED__; as such, I want to design a strategy where it's assumed we lose them, hence multiple backup copies and daily tape-out of the backups. The immediate copy from the offloads gives us another alternative for protection, so we now have 3 copies: one on immutable storage (S3), two on air-gapped backups. Our risk factor for the former is cost and the provider itself failing in some way; our risk factor for the latter is physical failure and lack of discipline from our staff. Since Backup Copies can also move transaction logs now, even for our databases, we should have extremely good RPO/RTO for our most critical servers.

Insider attacks

This actually gets really tricky to be honest, as it's just too unpredictable. The problem with malicious insiders (intentional or not) is that they have no real attack pattern or incentive except to ruin as much as possible, again either intentionally or otherwise. The redundancy that protects against ransomware attackers benefits us here as well, but I'd add a few new assumptions about a malicious insider:

1. They will know all of our processes; whatever our organization collectively knows, assume the malicious Insider knows
2. The malicious insider has physical access to any device in the org
3. The malicious insider has no goal besides as much destruction as possible

The first point is extremely tough; as mentioned before, discipline is one of the biggest failure points to consider, and a malicious insider will know our processes and personnel and will undoubtedly be able to convince them to violate procedure (e.g., the insider will convince the responsible person to let them handle the rotated drive exchange and the tape-out, will convince the Infrastructure Admin to let them into the data center, will get a privileged account on S3, etc.). Also, they will know about the reporting on our primary repo and will likely determine that the best attack is a physical one outside of the OS, meaning our reporting and honeypots are useless.

Point two means that subterfuge is very probable; it's very convenient and easy to break tape hardware, break USB buses, and quickly put us in a place where we're stuck with just primary backups. The good news is that off-siting the backups mitigates this, but the RPO/RTO is impacted. S3 also helps here, but not if the malicious user gets access; not that they can delete the backups (at least they shouldn't be able to unless there's an exploit), but they can attack in other ways, like malicious behavior that gets the account shut down. As such, S3 accounts must be VERY restricted and granted on an as-needed basis only. Preferably, we should have a dedicated S3 account used exclusively for backups, with the credentials programmatically rotated via a password manager frequently, so that not even the S3 admin knows which password is in use. This is still an attack vector for our malicious insider (and for ransomware attackers too, though while possible in theory, I'm not sure I've ever seen it used). Account and billing warnings/controls should again help us here.
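For the credential rotation bit, here's a rough sketch of what it could look like against AWS IAM with boto3. The user name is hypothetical, and this assumes the dedicated offload user holds only one active key (AWS caps you at two per user); run it from a locked-down admin context and hand the new secret straight to the password manager / backup server:

```python
# Rough sketch of rotating the dedicated offload user's access key with boto3.
# IAM_USER is hypothetical; assumes the user currently has a single active key
# (AWS allows at most two per user), so the new key is created before the old
# one is removed and the backup server only needs a brief credential update.

import boto3

IAM_USER = "veeam-offload-only"   # hypothetical dedicated backup-offload user

def rotate_access_key(user_name: str) -> dict:
    """Create a fresh access key, then delete the previous one(s)."""
    iam = boto3.client("iam")
    old_keys = iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    for key in old_keys:
        iam.delete_access_key(UserName=user_name, AccessKeyId=key["AccessKeyId"])
    return new_key  # contains AccessKeyId and SecretAccessKey -- store it securely

if __name__ == "__main__":
    key = rotate_access_key(IAM_USER)
    print(f"New access key {key['AccessKeyId']} created; update the backup server now")
```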

Physical access is the scary part for me, as we can only __react__ to it; we cannot proactively protect against it easily. Even with cages in the data center, the first point again assumes that a dedicated malicious insider will either manipulate their way into the cages or, if they're dedicated enough, just cause mass physical damage (e.g., a sledgehammer, crude explosives/fire, or even just destruction with a car or something).

A malicious insider can get __VERY__ far in, and because their goal is as much damage as possible, we're heavily reliant on our off-site backups. We MUST stress heavy discipline on the off-siting and rotation as much as possible. Tape safes, different people responsible for different rotated drives, and S3.



So, far too many words for a Sunday morning (afternoon now as I finish writing), but hopefully some food for thought.

Re: Ransomware discovery time and retention times

Post by JaySt »

Wow, that's more than I could hope for! Thanks for this!
> From what you're telling me, I have concerns about your clients actually following procedure :/ This isn't a judgement on you of course, but my experience with my clients is that lack of discipline more than anything is the biggest risk, not hackers or malicious insiders
I agree. But I also see it as a fact of life that needs to be taken into the equation for a business. A lot of customers have invested in tape libraries. Leaving tapes in slots etc. is something I just don't see being handled better in the future. Human intervention and handling is a risk. Having a fully automated, reliable process that reaches a more acceptable (but still waaaay better) level of protection is more what I would be looking for, if you ask me. I think things like S3 immutability / Veeam hardened repositories are a "big leap for mankind" within a lot of organizations.
I see your ideal protection strategy as one that can be set as a goal, but a goal that's (or will be) rarely reached in practice.

Thanks a lot for your insights! Enjoyed reading it.

Re: Ransomware discovery time and retention times

Post by soncscy » 2 people like this post

You're very welcome, I'm glad you liked it :)

> I see your ideal protection strategy as one that can be set as a goal, but a goal that's (or will be) rarely reached in practice.

Oh, I agree completely, but I hope it at least helps to show which elements of the setup address the concerns around ransomware and malicious insiders, so readers can take away what they can reasonably accomplish and understand where their weaknesses are.