9.5/ReFS/Server 2016 Memory Consumption

Post by **Gostev** » Jul 06, 2017 9:20 pm

Rustam, but the limit is PER BLOCK not per volume, right? If yes, then I am with @nmdange here - don't see how it can ever be reached, realistically (outside of extreme cases)...

Post by **rkovhaev** » Jul 06, 2017 10:15 pm

Yes, it is per ReFS block (file region).

If a VIB file is relatively big (per-VM disabled) and has a good inline dedup ratio (inline dedup enabled), then it should be possible to hit this ReFS limit relatively quickly.
The data mover executes the patch/merge command at the per-FIB (file in backup) level.
When inline dedup is enabled, the blocks of multiple FIBs inside a VIB can reuse the same VIB storage blocks.
Because of that, when merging multiple FIBs from a VIB into a VBK, we pass the same VIB source offset to DeviceIoControl() for different FIBs. This is how we hit the limit.

We can also run into this ReFS limit during other synthetic operations, for example the creation of a synthetic full (the compileFIB command is also executed at the per-FIB level). In this case the VBK must have a good inline dedup ratio, so that we pass the same source offset to DeviceIoControl() repeatedly during creation of the new synthetic full.
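To make the mechanism above concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the `clone_fibs` function, the `BlockRefLimitError` exception, and the `LIMIT` value are hypothetical names invented for illustration, and the limit is deliberately tiny rather than the real ReFS value. It does not call DeviceIoControl(); it only counts how many times each source offset is referenced when cloning is done one FIB at a time.

```python
from collections import Counter


class BlockRefLimitError(Exception):
    """Toy stand-in for ERROR_BLOCK_TOO_MANY_REFERENCES."""


def clone_fibs(fibs, max_refs_per_block):
    """Simulate cloning each FIB's blocks, one FIB at a time.

    fibs maps a FIB name to the source offsets (inside the VIB) it uses.
    Returns the per-offset reference counts after all clones succeed.
    """
    refs = Counter()
    for name, offsets in fibs.items():
        for offset in offsets:
            if refs[offset] >= max_refs_per_block:
                raise BlockRefLimitError(
                    f"{name}: offset {offset} already has {refs[offset]} references"
                )
            refs[offset] += 1  # one more file region mapped to this block
    return refs


LIMIT = 3  # hypothetical, tiny on purpose; not the real ReFS limit

# No dedup: every FIB has its own storage block.
unique = {f"fib{i}": [i * 4096] for i in range(5)}
# Good dedup: all five FIBs reuse the storage block at offset 0.
deduped = {f"fib{i}": [0] for i in range(5)}

print(max(clone_fibs(unique, LIMIT).values()))
try:
    clone_fibs(deduped, LIMIT)
except BlockRefLimitError as exc:
    print("limit hit:", exc)
```

In this model the limit is per block, so the total number of clones on the volume does not matter. Only the number of FIBs that share one storage block does, which is why a good inline dedup ratio is the trigger.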

Post by **doggatas** » Jul 07, 2017 12:01 am

Is there an official statement from Veeam regarding this combination of technologies? This is quite worrying. Since moving to Veeam from Backup Exec ~5 years ago, I've never felt so comfortable with our backups. However, we've just redone our backup environment, which includes using these combinations of technologies. We've had our backups running successfully for the previous 6 months and I suspect this is because we're not at the scale of some of the posters in this thread.

We have the following forever forward backup jobs (retention of 31):

Exchange in its own job (one server): VBK = 2TB, VIBs = ~70GB
Fileserver in its own job: VBK = 2TB, VIBs = 40GB
A single job for all other servers that need a daily backup (12 servers): VBK = 1.4TB, VIBs = 150GB. I could potentially split this job out and have our Domino servers in a separate job

We also have 2 backup copy jobs for each backup job above using the following settings:

Backup Copy with 2 retention points and 12 Monthly retentions to a repository in the same data centre as the main jobs
Backup Copy with 31 retention points and 12 Monthly retentions to a repository in a different data centre
Backup Copy to Tape - Full VBKs every day

We haven't had any issues (yet) with the jobs with regard to fast clone and merges, but I have noticed the Veeam repository servers freezing when deleting source VBKs after we've moved jobs to different repositories. I've also noticed that when deleting VBKs (when it doesn't hang), it takes quite a while for the volume to show the updated usage stats. For example, if the volume has 5TB free and we delete a 1TB VBK, it takes ~3 minutes for that extra free space to show in the volume stats.

I suspect we may get this error when it comes time for Veeam to delete the 1st monthly VBK from our GFS jobs. Based on our retention of 12 for the monthly, this won't happen for another 6 months.

I should copy and paste this post into a support case so I can get an official statement from Veeam.

Do I copy everything to tape***, rebuild the servers with Server 2012 R2 and use NTFS? It feels like I should. I don't like sitting on time bombs. I want the same confidence that I had less than an hour ago (as I said, we haven't had any issues yet, but reading this thread makes me worry).

***In fact, is there a way to copy all VBKs and all VIBs to a tape using Veeam?

Post by **rkovhaev** » Jul 08, 2017 2:33 am

David, with your setup I don't think you will run into the ERROR_BLOCK_TOO_MANY_REFERENCES issue.
You can always temporarily disable ReFS fast clone and let the job do the merge with ReadFile()/WriteFile() instead of DeviceIoControl(), and then re-enable ReFS fast clone.
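The difference between the two merge paths can be sketched in Python. This is an illustration only, not how the Veeam data mover is implemented: `copy_range` and its `clone` parameter are hypothetical, and the fallback here happens automatically inside one function, whereas the suggestion above is to switch fast clone off and on by hand. The clone callable stands in for the DeviceIoControl() path; the loop stands in for plain ReadFile()/WriteFile().

```python
import io


def copy_range(src, dst, src_off, dst_off, length, clone=None, chunk=1024 * 1024):
    """Copy a byte range, trying a clone callable first, then plain read/write.

    clone is a hypothetical callable (src, dst, src_off, dst_off, length)
    that raises OSError when the file system refuses the clone.
    Returns "cloned" or "copied" to show which path was taken.
    """
    if clone is not None:
        try:
            clone(src, dst, src_off, dst_off, length)
            return "cloned"
        except OSError:
            pass  # e.g. reference limit reached; fall through to a real copy

    src.seek(src_off)
    dst.seek(dst_off)
    remaining = length
    while remaining > 0:
        data = src.read(min(chunk, remaining))
        if not data:
            raise EOFError("source ended before the requested range was copied")
        dst.write(data)  # physically duplicates the data, so it costs space
        remaining -= len(data)
    return "copied"


def failing_clone(src, dst, src_off, dst_off, length):
    """Simulates a clone request rejected by the file system."""
    raise OSError("block has reached the maximum reference count")


source = io.BytesIO(b"abcdefgh")
target = io.BytesIO()
path_taken = copy_range(source, target, 2, 0, 4, clone=failing_clone)
print(path_taken, target.getvalue())
```

The read/write path always works, but it writes the data again instead of sharing blocks, so the merge takes longer and the result occupies additional space on the repository.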

Post by **doggatas** » Jul 08, 2017 6:00 am

Great, and thanks for the reply.

Regards,
David

Post by **RGijsen** » Jul 28, 2017 6:25 pm

We have about the same environment as David.
1 job for Exchange (2 mailbox hosts and an edge server, about 1.6TB)
1 job for fileserver (2TB)
1 job for the other VMs, about 20 (about 1TB)

The fileserver job ran into the ERROR_BLOCK_TOO_MANY_REFERENCES issue today. This is really, REALLY worrying. Will Veeam fix this? I know the root cause is yet another MS issue, but 'turn off fast clone, let your job merge and re-enable it' is certainly not a workable solution.

Post by **EthanStark** » Jul 28, 2017 8:12 pm

To provide greater resiliency for its metadata, the Resilient File System (ReFS) in Windows Server 2016 uses allocate-on-write semantics for all metadata updates. This means that ReFS never makes in-place updates to metadata. Instead, it makes all writes to newly allocated regions.

However, allocating-on-write causes ReFS to issue more metadata I/O to new regions of the volume than write-in-place file systems do. Additionally, ReFS uses block caching logic to cache its metadata in RAM. This is not as resource-efficient as file caching logic.

Together, the ReFS block caching logic and allocate-on-write semantics cause ReFS metadata streams to be large. ReFS uses the cache manager to create the metadata streams, and the cache manager lazily unmaps inactive views. In some situations, this lazy unmapping causes the active working set on the server to grow. This creates memory pressure that can cause poor performance.

This issue is addressed in cumulative update 4013429 that was released on March 14, 2017. The update introduces three tunable registry parameters. (See the "Workaround" section.)

Cumulative update 4013429 is available through Windows Update. You can also download it directly through the Microsoft Update Catalog.

Ethan Stark

Post by **OmiFreak** » Aug 08, 2017 3:35 pm

We ran into the "ERROR_BLOCK_TOO_MANY_REFERENCES" error with one of our file server backup copy jobs this week.
Anton Gostev himself "forced" us to use ReFS 3.1 together with Veeam, especially with big jobs, because of the stability and the space savings with GFS.
Now, after migrating TBs of backup data to ReFS, we are left in the lurch. Disabling fast clone is not a solution for us; we do not have the space for that.
Veeam, please come up with a solution for that!

Bernd

Post by **tsightler** » Aug 08, 2017 3:50 pm

Please open a case and explain that disabling fast clone is not acceptable for your use case. Also, please post your case number for reference.

Post by **OmiFreak** » Aug 09, 2017 6:18 am

My case number: 02273876

Post by **cip2013** » Aug 11, 2017 9:06 pm

Not to get too far off track from the original post, but we were also seeing high memory and high CPU utilization when running our backups to ReFS volumes. It was so bad that it would hang the server at the end of every job and prevent future jobs from running. The strange thing is that it would happen only after the job was complete. You couldn't even get access to the console through the iDRAC. It would require a hard reset after every job ran. After working with support for about a week and trying several different things, we decided to scrap the volume and format it with NTFS. Since then we have not had any issues with the backups (3 days). I was looking forward to being able to use ReFS, but I need my backups to be reliable. I think it will be a long time before we try ReFS again.

Post by **JVA@Alsic** » Aug 16, 2017 2:55 pm

Any news on this issue?
We are experiencing the same kind of problems.

Memory consumption is extremely high when performing some tasks.

Windows successfully diagnosed a low virtual memory condition.
The following programs consumed the most virtual memory: VeeamAgent.exe (6376) consumed 7836950528 bytes, VeeamAgent.exe (10612) consumed 7574495232 bytes, and VeeamAgent.exe (1744) consumed 1968599040 bytes.
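The byte counts in the event text above are easier to judge in GiB. The following Python snippet is a small sketch that parses the quoted message and converts the figures; the regular expression and the helper names `parse_consumers` and `to_gib` are assumptions made for this example and are not part of any Windows or Veeam tooling.

```python
import re

# Event text copied from the message quoted above.
EVENT_TEXT = (
    "The following programs consumed the most virtual memory: "
    "VeeamAgent.exe (6376) consumed 7836950528 bytes, "
    "VeeamAgent.exe (10612) consumed 7574495232 bytes, "
    "and VeeamAgent.exe (1744) consumed 1968599040 bytes."
)

# Matches "<process> (<pid>) consumed <n> bytes".
PATTERN = re.compile(r"(\S+) \((\d+)\) consumed (\d+) bytes")


def parse_consumers(text):
    """Return a list of (process name, PID, bytes) tuples found in text."""
    return [(name, int(pid), int(size)) for name, pid, size in PATTERN.findall(text)]


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


consumers = parse_consumers(EVENT_TEXT)
for name, pid, size in consumers:
    print(f"{name} (PID {pid}): {to_gib(size):.2f} GiB")

total_gib = to_gib(sum(size for _, _, size in consumers))
print(f"total: {total_gib:.2f} GiB")
```

The three agents together account for a little over 16 GiB of virtual memory, so the size of the repository server's RAM and the number of concurrent tasks matter when reading this event.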

Backup Copy Jobs fail while merging backup files.

Failed to merge full backup file Error: Agent: Failed to process method {Transform.Patch}: A file system block being referenced has already reached the maximum reference count and can't be referenced any further.

The previously mentioned Windows Server 2016 update KB4013429 is already installed and does not solve the problem.
A case for the block referencing issue has been opened (Case ID 02281995).

Post by **Gostev** » Aug 17, 2017 6:47 am

How much RAM do you have on the repository server, and how many concurrent jobs are you running? The above memory consumption by Veeam data movers does not really depend on the file system used.

Post by **Gostev** » Aug 21, 2017 12:59 pm

Just to update on one of the issues mentioned in this thread: the "TOO_MANY_REFERENCES" error is almost certainly caused by a bug on our side (fast cloning logic clashing with built-in deduplication). We will be trying a hot fix on the few affected customers to confirm. If you don't have a support case open on this specific issue, please open one to get the fix. Thanks!

Post by **andy51585** » Sep 16, 2017 3:54 pm

We just ran into this issue last night: "A file system block being referenced has already reached the maximum reference count". Opened a case with support. Case# 023131118.

This appears to be affecting only one of the jobs writing to this specific repository at this time. Backup size is approx. 6 TB. This is a newer backup job and we're only at week 4 with 90-day retention (daily incrementals and weekly synthetic full).

Post by **tsightler** » Sep 16, 2017 7:20 pm

andy51585 wrote: We just ran into this issue last night: "A file system block being referenced has already reached the maximum reference count". Opened a case with support. Case# 023131118.

I believe support has a private fix for this available as long as you are on 9.5 U2 and they verify that this is the issue. I'm pretty sure this issue will be addressed in U3 once it is released.

Post by **TrevorBell** » Nov 15, 2017 6:28 am

Interesting thread, and I'm having the same issue myself (support ticket raised). This only started happening after our Win2016 server became unresponsive on Saturday morning, 11/11/17, and needed a reboot. Since then, only 2 jobs of over 1TB each, each backing up a single VM, are affected. The main production backup of 30 VMs is totally fine.

I could reformat to 64K, but seeing as the issue is also creeping in for some users with that block size, I have asked for the private fix to see if it resolves the issue and will report back.

Is anyone seeing the error below in the Windows Event Viewer System log, or any other ReFS errors?

The file system detected a checksum error and was not able to correct it. The name of the file or folder is "Block Reference Count Table".

Thanks

Trev..

Post by **TrevorBell** » Nov 15, 2017 7:04 am

Support supplied the fix within 40 minutes. It's now applied and everything is working as expected.

Thanks

Trev

Post by **Delo123** » Nov 15, 2017 8:33 am

Better to switch to 64K anyway; better safe than sorry...
