Odd Issue with ReFS Randomly/Suddenly Increasing Utilization

Post by **Novox** » Dec 13, 2023 2:52 pm

I'm using a ReFS volume (64 KB cluster size) on a DAS as a Veeam repository on a Windows Server virtual machine, for the "Fast Clone" functionality.

Most of the time everything works fine. However, once in a while, the space used on my ReFS Repo volume will suddenly jump very high.

No backup jobs are occurring when this happens and the server is only running as a VBR backup server. If I browse ReFS logs from Event Viewer via "Applications and Services Logs" -> "Microsoft" -> "Windows" -> "ReFS" -> "Operational" I see some interesting events.

I'll receive an ReFS Informational message Event ID 142, at 10:53am for instance, containing the following:

Summary of disk space usage, since last event:

           Available clusters: 5389072 (328.93 GB)
           Reserved clusters: 33876 (2.07 GB)
           Metadata clusters: 30836 (1.89 GB)
           Used clusters: 2801904 (171.02 GB)
           
           Volume size: 499.94 GB
           Bytes per cluster: 65536
           
           Volume correlation ID: {a1804b5a-c0c4-4803-bd78-782caa2c06cd}
           Volume name: V:
           Device name: \Device\HarddiskVolume6
           Space ID: {00000000-0000-0000-0000-000000000000}
           
           Elapsed seconds: 3644

Followed by a few "IO Latency Summary" Informationals (Event ID 170), that mention the following "IO types":

IO type: Close

IO type: Cleanup

IO type: Open volume for DASD

IO type: Open file

Followed by another ReFS "Summary of disk space usage" (Event ID 142) at 11:54am with the following:

Summary of disk space usage, since last event:

           Available clusters: 182325 (11.13 GB)
           Reserved clusters: 33884 (2.07 GB)
           Metadata clusters: 30479 (1.87 GB)
           Used clusters: 8008651 (488.81 GB)
           
           Volume size: 499.94 GB
           Bytes per cluster: 65536
           
           Volume correlation ID: {a1804b5a-c0c4-4803-bd78-782caa2c06cd}
           Volume name: V:
           Device name: \Device\HarddiskVolume6
           Space ID: {00000000-0000-0000-0000-000000000000}
           
           Elapsed seconds: 3644

The "Used Clusters" (space) has jumped from 171.02 GB to 488.81 GB within about an hour, again, with no VBR jobs running and no user-initiated file operations (that I know of) on the V: drive.
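As a sanity check on the two Event ID 142 summaries above, here is a minimal Python sketch that parses the cluster counts and computes the change between them. The helper names (`parse_clusters`, `clusters_to_gib`) are made up for illustration, and it assumes the "GB" figures in the events are binary gigabytes (2^30 bytes), which matches the logged values to within 0.01.

```python
import re

BYTES_PER_CLUSTER = 65536  # "Bytes per cluster" as reported in both events
GIB = 1024 ** 3

# Cluster lines copied from the 10:53am and 11:54am Event ID 142 messages.
EVENT_1053 = """
Available clusters: 5389072 (328.93 GB)
Reserved clusters: 33876 (2.07 GB)
Metadata clusters: 30836 (1.89 GB)
Used clusters: 2801904 (171.02 GB)
"""

EVENT_1154 = """
Available clusters: 182325 (11.13 GB)
Reserved clusters: 33884 (2.07 GB)
Metadata clusters: 30479 (1.87 GB)
Used clusters: 8008651 (488.81 GB)
"""


def parse_clusters(event_text):
    """Return {'available': n, 'reserved': n, ...} from an event body."""
    pattern = re.compile(r"^\s*(\w+) clusters:\s*(\d+)", re.MULTILINE)
    return {name.lower(): int(count) for name, count in pattern.findall(event_text)}


def clusters_to_gib(clusters, bytes_per_cluster=BYTES_PER_CLUSTER):
    """Convert a cluster count to binary gigabytes."""
    return clusters * bytes_per_cluster / GIB


before = parse_clusters(EVENT_1053)
after = parse_clusters(EVENT_1154)
delta = {name: after[name] - before[name] for name in before}

for name, change in delta.items():
    print(f"{name:>9}: {change:+10d} clusters ({clusters_to_gib(change):+8.2f} GiB)")
```

The used-cluster count grows by exactly the number of clusters that the available count loses (5,206,747 clusters, roughly 317.79 GB), while the reserved and metadata counts barely move.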

Perhaps notably, in the "System" Log, at 11:10am:

The Optimize drives service entered the running state.

And in the "Application" Log, at 11:41am, I see the following "Defrag" Informationals Event ID 258:

The storage optimizer successfully completed defragmentation on Veeam (V:)

The storage optimizer successfully completed slab consolidation on Veeam (V:)

Has anyone else experienced this? What might be happening here? The server is running as a VMware VM on ESXi 8. The Physical Media Type for the V: drive is SSD, and the "Optimize Drives" app indicates that the Logical Media Type is "Thin provisioned drive." Could ReFS defragmentation and/or slab consolidation be causing the space utilized to jump so dramatically? The only thing I can think of is that the Drive Optimizer is doing something that expands the ReFS fast-cloned blocks out into individually referenced blocks.

Post by **Mildur** » Dec 13, 2023 3:03 pm

Hi Novox

I'm not certain about the root cause. Unfortunately, we can't delve into specific environmental issues through a forum post, as we don't have access to your environment. Please consider opening a case with our customer support team for a thorough log analysis. If there's a backup session or Veeam process linked to these space hikes, the logs should provide insights.

Best,
Fabian

Post by **Novox** » Dec 13, 2023 3:07 pm

I found other forum posts, such as this one (veeam-backup-replication-f2/refs-to-def ... 43401.html), that contain responses like:

I've stated it in other threads but I'll repeat it.
With defragmentation you will lose block clone space savings. Therefore only defragment if you have one long chain per vm/job. Synthetic fulls will get rehydrated.
Also you'll have Veeam job failures during analysis phase as defrag will lock filesystem metadata (per folder it seems). On large file systems, analysis can take days.
You can do per-file defrag with Sysinternals contig.exe (shorter metadata lock times) but I'm not sure it's worth the effort.

(from @DonZoomik)

Defragmentation loses block clone space savings, and Synthetic fulls will get rehydrated?!

Post by **Novox** » Dec 13, 2023 3:18 pm

I would still love human input, but I may have AI'd my way out of this?

Me:

I'm working with Veeam, a system for backing up files. My ReFS volume suddenly increases in size when a defragmentation job is run. In a forum post, someone mentioned, "With defragmentation you will lose block clone space savings. Therefore only defragment if you have one long chain per vm/job. Synthetic fulls will get rehydrated." Can you please explain this to me?

A popular AI...

Certainly, Bill. It appears you're encountering an issue with your ReFS volume in conjunction with Veeam's defragmentation process. Let's break down the information from the forum post you mentioned:

Block Clone Space Savings: ReFS (Resilient File System) in Windows Server has a feature called block cloning. This feature allows multiple files or even different parts of the same file to share the same physical data blocks on disk, thereby saving storage space. When you defragment an ReFS volume, it rearranges and optimizes the data on the disk. During this process, the file system might lose the benefits of block cloning because the data blocks are no longer contiguous or shared among multiple files.

One Long Chain per VM/Job: The forum post suggests that you should only defragment an ReFS volume if you have one long data chain per virtual machine (VM) or backup job. A "long chain" in this context means that the data blocks for a specific VM or job are stored contiguously on the disk. If you have multiple shorter chains, defragmenting may not be as beneficial because it can potentially disrupt the block cloning space savings.

Synthetic Fulls Rehydration: In Veeam, synthetic full backups are created by consolidating the incremental backups into a single full backup file. The term "rehydration" means that the data that was previously block-cloned or deduplicated might need to be expanded or "rehydrated" during this consolidation process. This could result in an increase in storage space usage.

So, in summary, the advice from the forum post suggests that defragmenting an ReFS volume should be done with caution, especially if you have multiple short data chains or if you rely heavily on block cloning and deduplication for storage efficiency. It's important to weigh the benefits of defragmentation against the potential loss of space savings and rehydration of synthetic full backups.

Please note that the information provided here is based on the details you provided, and it's always a good idea to consult with your IT team or Veeam support for specific guidance tailored to your setup.

Me:

Can you please elaborate on, "During this process, the file system might lose the benefits of block cloning because the data blocks are no longer contiguous or shared among multiple files."

Same AI:

Certainly, Bill. Let's delve deeper into how the ReFS file system and block cloning work and why defragmentation might impact this process:

Block Cloning in ReFS:

ReFS is a modern file system developed by Microsoft, and it has several advanced features designed to improve data integrity and storage efficiency.
One of these features is block cloning, which allows multiple files or parts of the same file to share the same physical data blocks on the disk.
When you create or modify a file, ReFS identifies identical blocks of data and only stores them once on disk, using references to those blocks for other files. This results in storage space savings because duplicate data doesn't need to be stored redundantly.

Impact of Defragmentation:

Defragmentation is a process that optimizes the layout of data on a disk by rearranging fragmented files and placing related data blocks closer together.
During the defragmentation process, data blocks that were previously scattered or fragmented across the disk are moved to create contiguous regions of free space. This can involve reshuffling the location of data blocks to make files more organized and sequential.

Loss of Block Cloning Benefits:

When ReFS data blocks are rearranged during defragmentation, the original mapping of which blocks belong to which files may be disrupted.
As a result, the file system may no longer recognize the opportunity for block cloning. The blocks that were once shared among multiple files might become separated, and duplicated data might be reintroduced.
Imagine a scenario where two files initially shared the same data blocks because they contained identical information. After defragmentation, these files may no longer share those blocks, and the storage space efficiency achieved through block cloning is reduced. Instead, the data might be stored redundantly for each file, consuming more space.

In summary, defragmentation, while beneficial for optimizing disk performance and data access, can have unintended consequences in file systems like ReFS that rely on block cloning for space savings. It can disrupt the organization of data blocks and potentially lead to increased storage space usage due to the loss of the benefits of block cloning. Therefore, it's crucial to carefully consider whether and when to defragment ReFS volumes, especially in environments where storage efficiency is a primary concern.

As the post suggests, I will disable defragmentation of my ReFS volume and see what happens.
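For reference, one coarse way to do this from a script is to disable the built-in Windows scheduled optimization task with `schtasks`. The sketch below is a hedged illustration rather than what was actually done here: it turns off scheduled optimization for every volume on the server, not just V: (excluding a single drive is normally done through the schedule settings in the "Optimize Drives" app). The function names are made up for the example, and it defaults to a dry run that only prints the command.

```python
import subprocess
import sys

# Built-in Windows scheduled task that launches the storage optimizer.
TASK_NAME = r"\Microsoft\Windows\Defrag\ScheduledDefrag"


def build_schtasks_command(action):
    """Return the schtasks command line for the 'query' or 'disable' action."""
    if action == "query":
        return ["schtasks", "/Query", "/TN", TASK_NAME]
    if action == "disable":
        return ["schtasks", "/Change", "/TN", TASK_NAME, "/Disable"]
    raise ValueError(f"unsupported action: {action}")


def run(action, execute=False):
    """Print the command (dry run) or execute it on Windows."""
    command = build_schtasks_command(action)
    if not execute:
        print("dry run:", " ".join(command))
        return None
    if sys.platform != "win32":
        raise RuntimeError("schtasks is only available on Windows")
    # Needs an elevated prompt; raises CalledProcessError if schtasks fails.
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Dry run by default: shows what would be executed without changing anything.
run("query")
run("disable")
```

Passing `execute=True` from an elevated session on the backup server would actually run the commands; re-enabling the task later uses the same `/Change` call with `/Enable`.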

Post by **Novox** » Dec 13, 2023 3:26 pm

...Not to mention that defragmenting an SSD is useless (right?). Windows Server doesn't seem to be able to detect the physical media type in a VM guest/abstraction scenario...

Post by **Novox** » Dec 13, 2023 5:38 pm

Also, from Anton (@gostev), "You should treat ReFS volume where block cloning was used as a sort of deduplication storage. You can never have it NOT "heavily fragmented" due to the nature of the process (different files sharing the same physical block). As such, there's absolutely no point to perform defragmentation - just like there's no point in defragmenting a deduplicating storage. In fact, the only way to actually defragment any given file would require "inflating" one by recreating the file without leveraging block cloning." (veeam-backup-replication-f2/file-mainte ... 41602.html)

So... seems like the Windows Server defragmentation enabled for ReFS is definitely the culprit.
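Gostev's point about "inflating" a file can be illustrated with a toy model. The sketch below is not how ReFS is implemented; it only assumes a volume where clusters are reference-counted and a "defragmented" file is rewritten into fresh contiguous clusters of its own. The class, method, and file names are hypothetical.

```python
from collections import Counter


class ToyVolume:
    """Toy model of a volume with reference-counted clusters (not real ReFS)."""

    def __init__(self):
        self.refcount = Counter()  # physical cluster -> number of file references
        self.files = {}            # file name -> list of physical clusters
        self.next_free = 0         # next never-used physical cluster

    def _allocate(self, count):
        clusters = list(range(self.next_free, self.next_free + count))
        self.next_free += count
        return clusters

    def write_file(self, name, count):
        """Write a new file that owns `count` freshly allocated clusters."""
        clusters = self._allocate(count)
        self.files[name] = clusters
        self.refcount.update(clusters)

    def clone_file(self, src, dst, changed=()):
        """Block-clone src to dst; only `changed` logical blocks get new clusters."""
        clusters = list(self.files[src])
        for index in changed:
            clusters[index] = self._allocate(1)[0]
        self.files[dst] = clusters
        self.refcount.update(clusters)

    def rewrite_contiguously(self, name):
        """'Defragment' one file by copying it into fresh contiguous clusters."""
        old = self.files[name]
        new = self._allocate(len(old))
        self.refcount.subtract(old)  # drop this file's references to shared clusters
        self.refcount.update(new)
        self.files[name] = new

    def used_clusters(self):
        return sum(1 for refs in self.refcount.values() if refs > 0)


vol = ToyVolume()
vol.write_file("full_1.vbk", 100)
vol.clone_file("full_1.vbk", "full_2.vbk", changed=[3, 50, 97])
print(vol.used_clusters())  # 103: two 100-block files share 97 clusters

vol.rewrite_contiguously("full_2.vbk")
print(vol.used_clusters())  # 200: the second file no longer shares anything
```

In this model the second file is perfectly contiguous after the rewrite, but only because it gave up every shared cluster, which is the same trade-off described in the quote.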

Post by **DonZoomik** » Dec 13, 2023 10:06 pm

You're right, defrag should be disabled on block cloning ReFS volumes. Defrag on SSD is not totally pointless but effects are minor and on virtualized ReFS even counterproductive (no TRIM/UNMAP support, so you’ll expand underlying disk to full size).

To add to Gostev's point, the defrag engine only looks at data contiguity from a single file's perspective. It would be really hard to even work out an "average" contiguous block placement across shared VBKs, as physical blocks most probably don't have backlinks to the files referencing them, so all logical file blocks would have to be mapped to physical blocks during the scan. This is not trivial at all (discussed in the service provider forums). Also, about fragmentation: VIB merges make fragmentation worse and worse over time. You start with a nice contiguous VBK, but then some blocks get mapped to parts of a VIB located somewhere else on the disk. Repeat that daily over several years and, depending on your change rate, it can get pretty ugly, and you'll end up with the read performance of dedupe appliances. If you're not aware, that can be pretty awful without specialized optimizations (reading data in on-disk order rather than in logical file order, etc.).
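The way merges fragment a full backup over time can be sketched with a small simulation. This is a toy model under stated assumptions, not Veeam's or ReFS's actual allocation behavior: each day a random set of blocks changes, the changed blocks are written to new space after everything else on disk, and the merge simply repoints the full backup's logical blocks at those new locations. The function names and the `change_rate` parameter are illustrative.

```python
import random


def count_extents(mapping):
    """Count physically contiguous runs encountered when reading in logical order."""
    extents = 1
    for prev, cur in zip(mapping, mapping[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate_merges(blocks=10_000, days=365, change_rate=0.02, seed=0):
    """Return the extent count of the full backup after each daily merge."""
    rng = random.Random(seed)
    mapping = list(range(blocks))  # fresh full: logical block i sits at physical block i
    next_free = blocks             # increments are written after the full on disk
    history = [count_extents(mapping)]
    for _ in range(days):
        changed = sorted(rng.sample(range(blocks), int(blocks * change_rate)))
        for logical in changed:
            mapping[logical] = next_free  # merged block now points into the increment
            next_free += 1
        history.append(count_extents(mapping))
    return history


history = simulate_merges()
print("extents in the fresh full:", history[0])
print("extents after 1 merge:    ", history[1])
print("extents after 365 merges: ", history[-1])
```

A freshly written full backup is a single extent in this model; after a year of daily merges, reading it in logical order touches a large number of separate physical runs, which is why reading in on-disk order matters.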

Post by **Novox** » Dec 14, 2023 1:08 pm

Thank you @DonZoomik!
