Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by Gostev »

Rustam, but the limit is PER BLOCK not per volume, right? If yes, then I am with @nmdange here - don't see how it can ever be reached, realistically (outside of extreme cases)...
rkovhaev
Veeam Software
Posts: 39
Liked: 21 times
Joined: May 17, 2010 6:49 pm
Full Name: Rustam
Location: hockey night in canada
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by rkovhaev »

Yes, it is per ReFS block (file region).

If the VIB file is relatively big (per-VM disabled) and has a good inline dedup ratio (inline dedup enabled), then it should be possible to hit this ReFS limit relatively quickly.
The data mover executes the patch/merge command at the per-FIB (file in backup) level.
When inline dedup is enabled, blocks of multiple FIBs inside a VIB can reuse the same VIB storage blocks.
Because of that, when merging multiple FIBs from the VIB into the VBK, we pass the same VIB source offset to DeviceIoControl() for different FIBs. This is how we hit the limit.

We can also run into this ReFS limit during other synthetic operations, for example when creating a synthetic full (the compileFIB command is also executed at the per-FIB level). In this case the VBK must have a good inline dedup ratio, so we pass the same source offset to DeviceIoControl() repeatedly while creating the new synthetic full.
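
To make the mechanism above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the limit value, the `Volume` class and the `merge_fibs` helper are all made up for illustration, and nothing here calls the real DeviceIoControl() or reflects the actual data mover code. It only shows why deduplicated FIBs that share one VIB block push that block's reference count up much faster than FIBs with unique blocks.

```python
from collections import Counter

# Hypothetical per-block limit; the real ReFS value is not stated in this thread.
MAX_REFS_PER_BLOCK = 4


class BlockTooManyReferences(Exception):
    """Stand-in for ERROR_BLOCK_TOO_MANY_REFERENCES in this toy model."""


class Volume:
    """Toy volume that counts how often each source block has been cloned."""

    def __init__(self, max_refs=MAX_REFS_PER_BLOCK):
        self.max_refs = max_refs
        self.refs = Counter()

    def clone_block(self, source_offset):
        # Every clone of the same source offset adds one reference to that block.
        if self.refs[source_offset] >= self.max_refs:
            raise BlockTooManyReferences(source_offset)
        self.refs[source_offset] += 1


def merge_fibs(volume, fibs):
    """Merge each FIB on its own, mirroring the per-FIB patch/merge command."""
    for fib_blocks in fibs:
        for source_offset in fib_blocks:
            volume.clone_block(source_offset)


# Without dedup, every FIB owns distinct VIB blocks, so no block is cloned twice.
unique_fibs = [[fib * 100 + i for i in range(3)] for fib in range(6)]
# With inline dedup, several FIBs point at the same VIB block (offset 0 here).
deduped_fibs = [[0, fib * 100 + 1] for fib in range(6)]

merge_fibs(Volume(), unique_fibs)  # completes without hitting the limit
try:
    merge_fibs(Volume(), deduped_fibs)
except BlockTooManyReferences as exc:
    print("limit hit at source offset", exc)
```

In this model the merge of the deduplicated FIBs fails on the shared offset once it has been cloned `MAX_REFS_PER_BLOCK` times, while the same number of FIBs with unique blocks merges cleanly.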
doggatas
Enthusiast
Posts: 38
Liked: 2 times
Joined: Jul 24, 2012 1:15 am
Full Name: David O
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by doggatas »

Is there an official statement from Veeam regarding this combination of technologies? This is quite worrying. Since moving to Veeam from Backup Exec ~5 years ago, I've never felt so comfortable with our backups. However, we've just redone our backup environment, which includes using these combinations of technologies. We've had our backups running successfully for the previous 6 months and I suspect this is because we're not at the scale of some of the posters in this thread.

We have the following forever forward backup jobs (31 retention points):
  • Exchange in its own job (one server): VBK = 2TB, VIBs = ~70GB
  • Fileserver in its own job: VBK = 2TB, VIBs = 40GB
  • A single job for all other servers that need a daily backup (12 servers): VBK = 1.4TB, VIBs = 150GB. I could potentially split this job out and have our Domino servers in a separate job
We also have 2 backup copy jobs for each backup job above using the following settings:
  • Backup Copy with 2 retention points and 12 Monthly retentions to a repository in the same data centre as the main jobs
  • Backup Copy with 31 retention points and 12 Monthly retentions to a repository in a different data centre
  • Backup Copy to Tape - Full VBKs every day
We haven't had any issues (yet) with the jobs with regard to fast clone and merges, but I have noticed the Veeam repository servers freezing when deleting source VBKs after we've moved jobs to different repositories. I've also noticed that when deleting VBKs (when it doesn't hang), it takes quite a while for the volume to show the updated usage stats. E.g. if the volume has 5TB free and we delete a 1TB VBK, it takes ~3 minutes for that extra free space to show in the volume stats.

I suspect we may get this error when it comes time for Veeam to delete the 1st monthly VBK from our GFS jobs. Based on our retention of 12 monthlies, this won't happen for another 6 months.

I should copy and paste this post into a support case so I can get an official statement from Veeam.

Do I copy everything to tape***, rebuild the servers to server 2012R2 and use NTFS? It feels like I should. I don't like sitting on time bombs. I want that same confidence that I had only less than 1 hour ago (as I said, haven't had any issues, yet. But reading this thread makes me worry).

***In fact, is there a way to copy all vbks and all vibs to a tape using Veeam?
rkovhaev
Veeam Software
Posts: 39
Liked: 21 times
Joined: May 17, 2010 6:49 pm
Full Name: Rustam
Location: hockey night in canada
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by rkovhaev »

David, with your setup I don't think you will run into the ERROR_BLOCK_TOO_MANY_REFERENCES issue.
You can always temporarily disable ReFS fast clone and let the job do the merge with ReadFile()/WriteFile() instead of DeviceIoControl(), and then re-enable ReFS fast clone afterwards.
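
As a rough illustration of the difference between the two paths, the Python sketch below merges one file region either through a clone step or through a plain read/write copy, depending on a flag. Everything in it is an assumption for illustration: `merge_region`, `copy_region`, the `fast_clone` flag and the always-failing clone callable are hypothetical names, and the real product setting and Win32 calls are not shown.

```python
import os
import tempfile


class BlockTooManyReferences(OSError):
    """Hypothetical stand-in for ERROR_BLOCK_TOO_MANY_REFERENCES."""


def copy_region(src_path, dst_path, offset, length, chunk=1 << 20):
    """Plain read/write copy of one region (the ReadFile()/WriteFile() path)."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            data = src.read(min(chunk, remaining))
            if not data:
                break
            dst.write(data)
            remaining -= len(data)


def merge_region(src_path, dst_path, offset, length, clone, fast_clone=True):
    """Use the clone callable when fast clone is on, otherwise copy the bytes."""
    if fast_clone:
        clone(src_path, dst_path, offset, length)
        return "cloned"
    copy_region(src_path, dst_path, offset, length)
    return "copied"


def clone_at_limit(src_path, dst_path, offset, length):
    # Hypothetical clone step for a block that already reached its reference limit.
    raise BlockTooManyReferences("block reference limit reached")


with tempfile.TemporaryDirectory() as tmp:
    vib = os.path.join(tmp, "increment.vib")
    vbk = os.path.join(tmp, "full.vbk")
    with open(vib, "wb") as f:
        f.write(b"A" * 8 + b"B" * 8)
    with open(vbk, "wb") as f:
        f.write(b"\0" * 16)
    # Fast clone temporarily disabled: the region is copied byte by byte.
    result = merge_region(vib, vbk, 8, 8, clone_at_limit, fast_clone=False)
    with open(vbk, "rb") as f:
        merged = f.read()
```

The copy path never adds a block reference, which is why it gets past the limit, but it also writes real data, so it costs the I/O and the disk space that fast clone would have saved.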
doggatas
Enthusiast
Posts: 38
Liked: 2 times
Joined: Jul 24, 2012 1:15 am
Full Name: David O
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by doggatas »

Great, and thanks for the reply.

Regards,
David
RGijsen
Expert
Posts: 127
Liked: 29 times
Joined: Oct 10, 2014 2:06 pm
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by RGijsen »

We have about the same environment as David.
1 job for Exchange (2 mailbox hosts and an edge server, about 1.6TB)
1 job for fileserver (2TB)
1 job for the other VMs, about 20 (about 1TB)

The fileserver job ran into the ERROR_BLOCK_TOO_MANY_REFERENCES issue today. This is really, REALLY worrying. Will Veeam fix this? I know the base is yet another MS issue, although 'turn off fast clone, let your job merge and re-enable it' is certainly not a workable solution.
EthanStark
Lurker
Posts: 1
Liked: never
Joined: Jul 28, 2017 8:08 pm
Full Name: Ethan Stark
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by EthanStark »

To provide greater resiliency for its metadata, the Resilient File System (ReFS) in Windows Server 2016 uses allocate-on-write semantics for all metadata updates. This means that ReFS never makes in-place updates to metadata. Instead, it makes all writes to newly allocated regions.

However, allocating-on-write causes ReFS to issue more metadata I/O to new regions of the volume than write-in-place file systems do. Additionally, ReFS uses block caching logic to cache its metadata in RAM. This is not as resource-efficient as file caching logic.

Together, the ReFS block caching logic and allocate-on-write semantics cause ReFS metadata streams to be large. ReFS uses the cache manager to create the metadata streams, and the cache manager lazily unmaps inactive views. In some situations, this lazy unmapping causes the active working set on the server to grow. This creates memory pressure that can cause poor performance.

This issue is addressed in cumulative update 4013429 that was released on March 14, 2017. The update introduces three tunable registry parameters. (See the "Workaround" section.)

Cumulative update 4013429 is available through Windows Update. You can also download it directly through the Microsoft Update Catalog.

Ethan Stark
OmiFreak
Enthusiast
Posts: 28
Liked: 9 times
Joined: Apr 26, 2011 4:11 pm
Full Name: Bernd Flatz
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by OmiFreak »

We ran into the "ERROR_BLOCK_TOO_MANY_REFERENCES" error with one of our file server backup copy jobs this week.
Anton Gostev himself "forced" us to use ReFS 3.1 together with Veeam, especially with big jobs, because of the stability and the space savings with GFS.
Now, after migrating TBs of backup data to ReFS, we are left in the lurch. Disabling fast clone is not a solution for us; we do not have the space for that.
Veeam, please come up with a solution for this!

Bernd
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by tsightler »

Please open a case and explain that disabling fast clone is not acceptable for your use case. Also, please post your case number for reference.
OmiFreak
Enthusiast
Posts: 28
Liked: 9 times
Joined: Apr 26, 2011 4:11 pm
Full Name: Bernd Flatz
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by OmiFreak »

My case number: 02273876
cip2013
Enthusiast
Posts: 29
Liked: 1 time
Joined: May 06, 2015 9:36 pm
Location: USA
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by cip2013 »

Not to get too far off track from the original post, but we were also seeing high memory and high CPU utilization when running our backups to ReFS volumes. It was so bad that it would hang the server at the end of every job and prevent future jobs from running. The strange thing is that it would happen only after the job was complete. You couldn't even get access to the console through the iDRAC. It would require a hard reset after every job ran. After working with support for about a week and trying several different things, we decided to scrap the volume and format it with NTFS. Since then we have not had any issues with the backups (3 days). I was looking forward to being able to use ReFS, but I need my backups to be reliable. I think it will be a long time before we try ReFS again.
JVA@Alsic
Novice
Posts: 5
Liked: never
Joined: Dec 29, 2014 10:00 am
Full Name: Jeroen Van Acker
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by JVA@Alsic »

Any news on this issue?
We are experiencing the same kind of problems.
  • Memory consumption is extremely high when performing some tasks.

    Windows successfully diagnosed a low virtual memory condition.
    The following programs consumed the most virtual memory: VeeamAgent.exe (6376) consumed 7836950528 bytes, VeeamAgent.exe (10612) consumed 7574495232 bytes, and VeeamAgent.exe (1744) consumed 1968599040 bytes.
  • Backup Copy Jobs fail while merging backup files.

    Failed to merge full backup file Error: Agent: Failed to process method {Transform.Patch}: A file system block being referenced has already reached the maximum reference count and can't be referenced any further. 
The previously mentioned update Windows Server 2016 KB4013429 is already installed and does not help to solve the problem.
Case for the block referencing has been made (Case ID 02281995)
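
For anyone who wants to put the byte counts from the event above into perspective, here is a small Python sketch that parses the message and converts the figures to GiB. The regular expression and variable names are my own; only the event text itself comes from the log quoted above.

```python
import re

# Event text copied from the low virtual memory diagnosis quoted above.
event_text = (
    "VeeamAgent.exe (6376) consumed 7836950528 bytes, "
    "VeeamAgent.exe (10612) consumed 7574495232 bytes, and "
    "VeeamAgent.exe (1744) consumed 1968599040 bytes."
)

# Capture process name, PID and byte count for each listed process.
pattern = re.compile(r"(\S+) \((\d+)\) consumed (\d+) bytes")
usage = [(name, int(pid), int(size)) for name, pid, size in pattern.findall(event_text)]
total = sum(size for _, _, size in usage)

for name, pid, size in usage:
    print(f"{name} (PID {pid}): {size / 2**30:.2f} GiB")
print(f"total: {total / 2**30:.2f} GiB")
```

The three agents together account for a little over 16 GiB of virtual memory, which is the number to compare against the RAM installed on the repository server.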
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by Gostev »

How much RAM do you have on the repository server, and how many concurrent jobs are you running? The above memory consumption by Veeam data movers does not really depend on the file system used.
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by Gostev »

Just to update on one of the issues mentioned in this thread: the "TOO_MANY_REFERENCES" error is almost certainly caused by a bug on our side (fast cloning logic clashing with built-in deduplication). We will be trying a hot fix on the few affected customers to confirm. If you don't have a support case open on this specific issue, please open one to get it. Thanks!
andy51585
Novice
Posts: 8
Liked: never
Joined: Jul 28, 2014 9:13 pm
Full Name: Andrew
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by andy51585 »

We just ran into this issue last night: "A file system block being referenced has already reached the maximum reference count". Opened a case with support. Case# 023131118.

This appears to be affecting only one of the jobs writing to this specific repository at this time. Backup size is approx. 6 TB. This is a newer backup job and we're only at week 4 of a 90-day retention (daily incrementals and weekly synthetic full).
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by tsightler »

andy51585 wrote: We just ran into this issue last night: "A file system block being referenced has already reached the maximum reference count". Opened a case with support. Case# 023131118.
I believe support has a private fix for this available as long as you are on 9.5 U2 and they verify that this is the issue. I'm pretty sure this issue will be addressed in U3 once it is released.
TrevorBell
Veteran
Posts: 357
Liked: 17 times
Joined: Feb 13, 2009 10:13 am
Full Name: Trevor Bell
Location: Worcester UK
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by TrevorBell »

Interesting thread, and I'm having the same issue myself; support ticket raised. This only started happening after the Win2016 server became unresponsive on Saturday morning (11/11/17) and needed a reboot. Since then, only 2 jobs of over 1TB each, each backing up a single VM, are affected. The main production backup of 30 VMs is totally fine.

I could reformat to 64K, but seeing as the issue is creeping in for some users with that block size too, I have asked for the private fix to see if it fixes the issue and will report back.

Is anyone seeing the error below in the Windows Event Viewer System log, or any other ReFS errors?

The file system detected a checksum error and was not able to correct it. The name of the file or folder is "Block Reference Count Table".

Thanks

Trev..
TrevorBell
Veteran
Posts: 357
Liked: 17 times
Joined: Feb 13, 2009 10:13 am
Full Name: Trevor Bell
Location: Worcester UK
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by TrevorBell »

Support supplied the fix within 40 minutes; it's now applied and all is working as expected.

Thanks

Trev
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: 9.5/ReFS/Server 2016 Memory Consumption

Post by Delo123 »

Better switch to 64K anyway, better be safe than sorry....