christiankelly
Service Provider
Posts: 128
Liked: 11 times
Joined: May 06, 2012 6:22 pm
Full Name: Christian Kelly

RAM requirements and stability of Instant Restore in v10?

Post by christiankelly »

Has anyone here done any large production "instant restores" with v10? I know a lot of changes were made to "speed things up using memory", and we've had a failure on a large server that has caused a lot of downtime for one of our clients. I'll preface this by saying that emergency support has been AMAZING!! I've had at least 20 calls with them over the last 48 hours, and they've been super responsive in trying to get things working. At this point, we're just trying to get the data/server restored and we haven't looked at the underlying issues yet, but I have a hunch that it's v10-related, so I wanted to see if others have had success.

We updated our "fleet" of Veeam servers to v10 in the last week, so this is the first time we're doing an instant restore on v10. On v9.x we've done 50+ in the last year and they always worked flawlessly; this is the first time we've had any issues/failures.

It started on Friday at 3:00 am, when an ESXi host failed with 10,000+ ms latency to the array, so all the VMs were unresponsive. Thankfully we have Veeam, so we started recovering VMs to the alternate host and scrambled to get a new host in place to hold the largest server, which was a 2.5 TB file server. Once the new host was in place, I did an instant restore with the plan to sVMotion the data Friday night.

The first issue started right after we booted the VM: after about 10 minutes it went unresponsive, and it looked to be an underlying issue with the NFS storage mount, so I killed the IR and kicked off another one. Things seemed to be OK, and the client was fully operational by 9:00 am. Yeah, Veeam!! One thing I did notice throughout Friday was that the memory usage on the Veeam server was very high. I'll say here that our Veeam servers/repositories are not large from a memory/CPU standpoint, but that has never been an issue before. This server has 4 cores / 8 GB of RAM / 8 TB of NTFS local storage.

After hours on Friday, I kicked off an sVMotion, and when I checked in around 5:00 am Saturday, things really started to go sideways. The Storage vMotion had failed and the IR jobs were in a "mount failed" state. The VM was frozen with no access to the datastore holding the base disks. I called VM support and started down a long path of trying to get the VMs restored, which also failed many times due to a number of odd errors that couldn't be directly explained; possibly more memory issues, as the IR job was still running but not active.

At this point, we've been trying to restore the 2.5 TB of data, and then the plan is to connect the snapshot files and boot it up so the client doesn't lose Friday's changes. The server has been down for about 32 hours. Thankfully it's the weekend, but normally we would have been able to migrate with no downtime.

Anyway, it's possible that this is some kind of unique local issue, and I'm sure we will get to the bottom of it with support, but I wanted to put feelers out: has anyone done larger production instant restores with v10 without issues? I'm getting nervous about our 80+ Veeam v10 servers' ability to do IR in the short term until I know what the underlying issue is.

Thanks,
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: Stability of Instant Restore in v10?

Post by PetrM »

Hi Christian,

At first sight, this seems to be environment-specific; I would say that it really is some kind of unique local issue.

It would make sense to run a couple of test restores to see if the same issue reoccurs, and to ask our support team for an RCA.
Understanding the root cause usually makes it possible to take appropriate preventive actions when required.

Thanks!
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Stability of Instant Restore in v10?

Post by Gostev »

christiankelly wrote: May 10, 2020 9:57 pm
One thing I did notice throughout Friday was that the memory usage on the Veeam server was very high. I'll say here our Veeam servers/repositories are not large from a memory / CPU standpoint but have never been an issue before. This server has 4core / 8GB of RAM / 8 TB of NTFS local storage.
That's your issue right there, Christian: lack of RAM.

Assuming you have an all-in-one backup server, you're waaaaay below our system requirements (for any Veeam version), and this impacts the v10 instant recovery engine in particular. Keep in mind that our next-gen IR engine puts all the RAM it expects to be available (according to the System Requirements) to good use to accelerate IR VM I/O performance, whereas the legacy IR engine used a very small RAM cache.

Note: v10 does NOT change system requirements compared to v9, as we tuned our new IR engine to fit into the existing system requirements.

Thanks!
christiankelly
Service Provider
Posts: 128
Liked: 11 times
Joined: May 06, 2012 6:22 pm
Full Name: Christian Kelly

Re: Stability of Instant Restore in v10?

Post by christiankelly » 1 person likes this post

Hi Gostev,

That's my fear about these new v10 features. Is there any way to switch IR to legacy mode? These backup servers are small local appliances that hold a few days of backups before sending them to the cloud.

We have always used these servers for IR with no issues. I would hope that v10 would just disable the RAM caching on servers with smaller amounts of RAM and run slower, rather than destabilizing the server?

Anyway, I guess we'll see once we start the RCA on this server.

Thanks,
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Stability of Instant Restore in v10?

Post by Gostev » 1 person likes this post

Well, you cannot really revert fully, but you can set the IRReadCachePerDiskMB (DWORD) registry value to 32 to reduce RAM consumption during Instant VM Recovery roughly down to the pre-v10 level.
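To get a feel for what a value of 32 means, here is a rough Python sketch of the cache footprint. It is not Veeam's actual memory accounting: it assumes, based only on the value's name, that IRReadCachePerDiskMB is a per-disk cache size in MB, so the total grows linearly with the number of virtual disks being served. The function names are hypothetical, and the thread does not say which registry key the value lives under.

```python
# Hypothetical estimate of the IR read-cache footprint.
# ASSUMPTION: IRReadCachePerDiskMB is a per-disk cache size in MB, so the
# total cache grows linearly with the number of virtual disks served.


def ir_cache_footprint_mb(disk_count: int, cache_per_disk_mb: int = 32) -> int:
    """Approximate RAM (MB) used by the IR read cache for one restored VM."""
    if disk_count < 0 or cache_per_disk_mb < 0:
        raise ValueError("disk_count and cache_per_disk_mb must be non-negative")
    return disk_count * cache_per_disk_mb


def total_ir_cache_mb(disks_per_vm: list[int], cache_per_disk_mb: int = 32) -> int:
    """Sum the estimated cache footprint across VMs restored concurrently."""
    return sum(ir_cache_footprint_mb(d, cache_per_disk_mb) for d in disks_per_vm)


if __name__ == "__main__":
    # One VM with 4 virtual disks, cache capped at 32 MB per disk
    print(ir_cache_footprint_mb(4))       # 128
    # Three VMs with 1, 2 and 4 disks restored at the same time
    print(total_ir_cache_mb([1, 2, 4]))   # 224
```

Under that assumption, even several concurrent restores stay in the low hundreds of MB, which is the point of the workaround; the trade-off is the untested I/O performance mentioned below.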

Two important things to note though:

1. We never tested the new IR engine's performance and stability with such a small RAM cache; we only experimented extensively with bigger values. Purely in theory, from what I know, a tiny cache size should not cause any stability issues.

2. The much bigger issue you should be worrying about is that your backup servers in general use an unsupported hardware configuration (not meeting minimum system requirements). Lack of RAM may affect too many things, and one day you could potentially be refused support on these grounds alone, regardless of which Veeam version you use.
christiankelly wrote: May 11, 2020 4:11 am
I would hope that v10 would just disable the RAM usage for servers with smaller amounts and run slower rather than destabilizing the server?
What you're essentially suggesting here is that Veeam R&D should spend cycles implementing and testing special logic to make our product behave differently on unsupported backup servers that do not meet minimum system requirements. IMHO, that is a very strange expectation to have of any software vendor. Obviously, I may be biased :D
christiankelly
Service Provider
Posts: 128
Liked: 11 times
Joined: May 06, 2012 6:22 pm
Full Name: Christian Kelly
Contact:

Re: Stability of Instant Restore in v10?

Post by christiankelly » 1 person likes this post

Got it, and I agree time shouldn't be wasted on that. I wasn't aware we were running in an unsupported configuration. I looked at the requirements page and I don't see an "all-in-one" requirement, so I have put together what looks to be the minimum for a small appliance running no more than 1 job concurrently.

Does this look about right for minimum requirements?
Backup Server: 4.5GB (Only 1 job needed)
Console: 2GB
Proxy: 2.2GB (Only 1 job needed)
Repository: 4GB

So it looks like 16 GB would be the minimum RAM needed for a small "all-in-one" server appliance protecting a small number of hosts/servers.

I know 32 GB of RAM would be better, but we have a large number of small servers out there, and getting them to 16 is going to be much easier than 32. I just want to make sure I'm reading it right and that this would be supported going forward.
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Stability of Instant Restore in v10?

Post by Gostev » 1 person likes this post

Actually, the repository needs 8 GB RAM:
"4 GB RAM, plus up to 2 GB RAM (32-bit OS) or up to 4 GB RAM (64-bit OS) for each concurrently processed machine"

However, 16 GB RAM should be just right for your case, because the RAM for OS servicing can be subtracted from each additional role in all-in-one installations.
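For anyone sizing a similar appliance, here is a rough Python sketch of that arithmetic, using Christian's per-role figures with the repository corrected to 8 GB (4 GB base plus 4 GB for one concurrently processed machine on a 64-bit OS). It assumes, as Gostev explains later in the thread, that every standalone requirement includes 2 GB for OS servicing that only needs to be counted once on a shared server. The helper names are illustrative and not part of any Veeam tool.

```python
# Rough all-in-one RAM estimate using the figures quoted in this thread.
# ASSUMPTION: each standalone requirement includes 2 GB for OS servicing,
# which is counted only once when all roles share one server.

OS_SERVICING_GB = 2.0


def repository_ram_gb(concurrent_machines: int, per_machine_gb: float = 4.0) -> float:
    """4 GB base plus up to 4 GB (64-bit OS) per concurrently processed machine."""
    return 4.0 + per_machine_gb * concurrent_machines


def all_in_one_ram_gb(role_requirements_gb: dict[str, float]) -> float:
    """Sum the standalone requirements, then drop the duplicated OS servicing RAM."""
    total = sum(role_requirements_gb.values())
    extra_roles = max(len(role_requirements_gb) - 1, 0)
    return total - OS_SERVICING_GB * extra_roles


roles = {
    "backup_server": 4.5,                 # 1 concurrent job
    "console": 2.0,
    "proxy": 2.2,                         # 1 concurrent job
    "repository": repository_ram_gb(1),   # 8 GB for one concurrent machine
}

if __name__ == "__main__":
    naive = sum(roles.values())
    shared = all_in_one_ram_gb(roles)
    print(f"naive sum: {naive:.1f} GB, all-in-one: {shared:.1f} GB")
    # naive sum: 16.7 GB, all-in-one: 10.7 GB
```

The deduction is an optimistic lower bound, which is why rounding up to a standard 16 GB configuration, rather than stopping at roughly 11 GB, still leaves sensible headroom.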
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

[MERGED] next gen SureBackup maybe causing memory issues

Post by mcz »

Hi everyone,

I think everyone loves the new generation of SureBackup, where a new cache delivers much higher I/O throughput than ever before. However, I've now run into memory issues multiple times during SureBackup jobs (or at least I suppose they are memory issues), where the backup server eventually stops responding properly. You can ping the VM, but you cannot RDP or do anything else; your session is just stuck, and the VM is half alive and half dead (VMware Tools may or may not be running; I've seen both). The last time it happened was just 10 minutes ago, and my vSphere metrics showed a big increase in active memory, plus high CPU and disk usage (which would point to the pagefile). Finally, after I was able to connect back to the VM, my Windows user session was dropped, and the SureBackup job failed due to an unexpected error.

The thing is that it's very hard to calculate or predict the RAM usage of such an IR task; depending on VM activity, it could be anywhere from very low to very high.
  • Can I see how much memory the cache is using per VM?
  • Are there any registry keys to limit memory usage, or something similar?
  • What is the default setup of such an IR task? Would it use as much memory as needed, or is there a built-in limit beyond which Veeam won't extend the cache size?
Thanks for letting me know!
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: RAM requirements and stability of Instant Restore in v10?

Post by Gostev »

Hi, Michael. The recommendation is to ensure the mount server RAM meets our System Requirements for backup repositories (4 GB for each concurrently processed machine). All other approaches are basically wandering into "unsupported" territory. Thanks!
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

Re: RAM requirements and stability of Instant Restore in v10?

Post by mcz »

Hi Anton,

thanks for the merge and the quick response. The thing is that I was running on 16 GB RAM, now increased to 25 GB! So way more than the minimum requirements. The problem is that when the VM "dies", you can't really check the metrics within the OS. Any idea how I could monitor that via the hypervisor? Because at the moment it's no more than an assumption. I should also mention that these troubles always occur when a replication job is running in parallel...
PetrM
Veeam Software
Posts: 3264
Liked: 528 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: RAM requirements and stability of Instant Restore in v10?

Post by PetrM » 1 person likes this post

Hi Michael,

What about vCenter performance charts or Veeam ONE memory performance charts?
Maybe you could try rescheduling the jobs so that replication and SureBackup jobs don't run simultaneously, and check how it works?

Since we're talking about a technical issue, I'd recommend raising a support request and sharing the support case ID with us.

Thanks!
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: RAM requirements and stability of Instant Restore in v10?

Post by Gostev » 1 person likes this post

mcz wrote: Jul 14, 2020 2:47 pm
The thing is that I was running on 16 GB RAM, now increased to 25 GB! So way more than the minimum requirements.
You noted you're using a SureBackup job. This typically runs multiple VMs concurrently, so you need to make sure you have enough RAM for that level of concurrency. Also, make sure you're adding RAM on the correct server (the mount server).

Although you also noted in your first post that what's "dying" is your backup server, which tells me you possibly have multiple Veeam components running on it. This only multiplies the issue, because all of them require RAM, and 16/25 GB just does not seem adequate unless we're talking about a very small environment, such as the ones the OP supports. And something tells me that's not your case :D
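To make the concurrency point concrete, here is a small Python sketch that applies the repository-style guideline quoted earlier in the thread (4 GB base plus 4 GB per concurrently processed machine) to a mount server running SureBackup. The `other_components_gb` parameter is a hypothetical stand-in for whatever other Veeam roles or software share the box; the thread gives no figure for it, so the 8 GB used below is purely illustrative.

```python
# Sketch: how many VMs can a SureBackup job run concurrently on a mount server
# while staying within the guideline quoted in this thread
# (4 GB base + 4 GB per concurrently processed machine)?
# ASSUMPTION: other_components_gb is a hypothetical allowance for other roles
# or software sharing the same server.


def required_mount_ram_gb(concurrent_vms: int, base_gb: float = 4.0,
                          per_vm_gb: float = 4.0) -> float:
    """RAM the guideline calls for at a given SureBackup concurrency."""
    return base_gb + per_vm_gb * concurrent_vms


def max_concurrent_vms(installed_gb: float, other_components_gb: float = 0.0,
                       base_gb: float = 4.0, per_vm_gb: float = 4.0) -> int:
    """Largest concurrency that fits in the installed RAM after other roles."""
    available = installed_gb - other_components_gb - base_gb
    if available < per_vm_gb:
        return 0
    return int(available // per_vm_gb)


if __name__ == "__main__":
    print(required_mount_ram_gb(5))     # 24.0
    print(max_concurrent_vms(25))       # 5, dedicated mount server
    print(max_concurrent_vms(25, 8))    # 3, hypothetical 8 GB for other roles
```

The same 25 GB that looks generous for a dedicated mount server leaves room for only a few concurrent SureBackup VMs once other components take their share.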
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

Re: RAM requirements and stability of Instant Restore in v10?

Post by mcz »

OK, thanks for the helpful input so far. It's very interesting that the backup server ran so well for quite a long time, because not so long ago we were running it with 10 GB. When I check the memory usage in Veeam ONE, it looks like it's at the limit during those specific times when multiple jobs run simultaneously, so on the one hand it's very surprising.

I guess it's quite hard to manually calculate how much RAM each component and process needs, so what do you suggest? What would be a good approach to find out how much memory is needed?
foggy
Veeam Software
Posts: 21073
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: RAM requirements and stability of Instant Restore in v10?

Post by foggy » 1 person likes this post

Hi Michael, just sum up all the RAM requirements for the components running on a single server. That would be a fair estimation of how much RAM is needed.
nitramd
Veteran
Posts: 297
Liked: 85 times
Joined: Feb 16, 2017 8:05 pm

Re: RAM requirements and stability of Instant Restore in v10?

Post by nitramd »

Foggy is correct: just sum the RAM requirements. I'd suggest that after summing, you add 10% to 30% more RAM, just in case. Don't forget to monitor RAM usage periodically.

After you've performed this task a number of times, you'll get pretty good at it. Eventually, you'll be able to use the SWAG method for estimation.
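The "sum the requirements, then add 10% to 30%" rule of thumb is easy to script. This is a minimal Python sketch; the per-component figures passed in below are hypothetical placeholders, and real values should come from the System Requirements page for your Veeam version and concurrency.

```python
# Sketch of the rule of thumb from this thread: sum the per-component RAM
# requirements, then add 10-30% headroom. Input figures are placeholders.


def ram_estimate_gb(requirements_gb: list[float], low: float = 0.10,
                    high: float = 0.30) -> tuple[float, float]:
    """Return the (low, high) RAM range after adding headroom to the sum."""
    if not 0 <= low <= high:
        raise ValueError("expected 0 <= low <= high")
    base = sum(requirements_gb)
    return base * (1 + low), base * (1 + high)


if __name__ == "__main__":
    lo, hi = ram_estimate_gb([8.0, 4.0, 8.0])   # hypothetical roles, 20 GB total
    print(f"plan for {lo:.1f}-{hi:.1f} GB")     # plan for 22.0-26.0 GB
```

Rerunning the estimate whenever concurrency, retention, or machine sizes change is the scripted equivalent of the periodic monitoring suggested above.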

Good luck.
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria
Contact:

Re: RAM requirements and stability of Instant Restore in v10?

Post by mcz » 1 person likes this post

Thanks guys, this is exactly what I needed! Now that I've looked at the documentation, I understand why Anton called it "not adequate". Actually, I wonder how I ever got by with only 10 GB... Thanks!
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: RAM requirements and stability of Instant Restore in v10?

Post by Gostev » 1 person likes this post

Well, our system requirements are very "generous" indeed. For example:

1. Each one assumes a standalone component installation and includes 2 GB for OS servicing. This can be subtracted from all additional components sharing the same server.

2. For repositories, requirements are the most generous, because there are tons of dependencies, like machine size, number of disks and backup chain length. So while we require 4 GB for each concurrently processed machine, most installations will be fine with half as much RAM, even considering the RAM caching of the v10 instant recovery engine (2 GB RAM is actually enough to do an IR of a machine with 4 disks).

However, they are generous for a reason. Over many years, lack of RAM has remained in the top 10 of all support issues we have to deal with. Even when people do assign sufficient RAM originally, they later change concurrency or retention policies, create bigger machines to back up, or even simply install other software on the backup server, completely forgetting to add extra RAM. Because of that, we want everyone to have that "buffer" in their backup infrastructure servers.

In fact, v10 noticeably reduced actual RAM requirements for most components (except for the mount server, due to the new IR engine). Nevertheless, we kept system requirements unchanged, because again, sooner or later most customers do something that makes them run out of RAM, so the more headroom they have originally, the longer they will go without issues! And RAM is too cheap these days anyway.
mcz
Veeam Legend
Posts: 851
Liked: 180 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria
Contact:

Re: RAM requirements and stability of Instant Restore in v10?

Post by mcz »

Very good explanation, Anton, and I totally agree! Thanks for letting us know.