REFS issues (server lockups, high CPU, high RAM)

Post by **adruet** » Apr 05, 2017 8:10 am

First, thank you very much @mkretzer for your detailed answers.

mkretzer wrote: We currently use a Fujitsu DX60 S3 Fibre Storage, what do you mean with cache ratio? There is no fixed cache percentage with this as with HP controllers.

In our HP RAID Controller card we have a cache ratio setting, default is 25% for reads and 75% for writes.

We have tried different settings, and as transforms (merges) are our main issue, we set it to 50/50.

Here is a test I ran to decide which file system to go with next (the purpose was not to compare ReFS 4K with ReFS 64K, but ReFS with NTFS in terms of backup window):
- Run the same backup job, on different file systems: ReFS 4K, ReFS 64K, NTFS 32K
- Everything was run on the very same hardware (repository configured as CIFS to use the fast cloning feature):
- HP DL380 Gen9 with dual CPU Intel E5-2660 v4 2 GHz, 64 GB of RAM, RAID 1 SSD for the OS
- Dual 10 Gbit network cards (HP 560FLR) supporting the offloading of SMB v3 (RDMA capable)
- 2 x DAS HP D3700 with 25 x 1.8TB 12G 10K SAS disks configured as Raid 6 with HP p441 controller

And here are the results (be careful not to misread them: the ReFS 4K run did not take place under exactly the same conditions; because it took more than 30 minutes, other jobs started at the same time):
- The job is a single VM, VBK is 420 GB, average VIB size is 20 GB, Forward Incremental with 30 recovery points
- ReFS 4K: 6.5 GB transferred, processing rate 63 MB/s, duration 4h40:20, processing 8min41, Full backup file merge 4h31:01 (as many as 10 jobs running at the same time)
- ReFS 64K: 19.4 GB transferred, processing rate 69 MB/s, duration 28:32, processing 15min08, Full backup file merge 12min51 (only one job at a time, which explains the difference in speed more than the 64K block size does)
- NTFS 32K: 40.7 GB transferred, processing rate 62 MB/s, duration 27:32, processing 15min08, Full backup file merge 1min52 (as many as 10 jobs running at the same time) (NTFS must have mega-fast cloning)

So ReFS can be quick, but it really should not be used with too many jobs running in parallel on the same repository. With my hardware, even with ReFS 64K, with 6 jobs running a merge at the same time, the file system starts to have a hard time responding even to a simple browse action in Windows Explorer.

I had put a lot of hope in ReFS, and even worked in TP4 and TP5 on building a Nano cluster for Storage Spaces Direct (then I found out about the licensing -> Datacenter edition only).
But I will be migrating my storage back to NTFS.
The "benefits" of ReFS do not outweigh our basic need to be able to back up everything we need to back up every night.

Post by **mkretzer** » Apr 05, 2017 8:33 am

About the Data Integrity Scan: We just ran it manually and it finished in a few seconds. Does this only do something after a crash?

Post by **kubimike** » Apr 05, 2017 11:02 am

@mkretzer Graham8 thinks it causes an issue when block cloning is occurring.

Post by **ivordillen** » Apr 05, 2017 2:08 pm

Hello All,

Maybe I'm overlooking something but Option 2
RefsNumberOfChunksToTrim * 128MB (for volume of size > 10TB)
RefsNumberOfChunksToTrim * 64MB (for volume of size < 10TB)

Note Set the trim value to an appropriate number: 8, 16, 32, and so on.

What should I pick?

I have 2 ReFS disks (64K), one of 15K and one of 50K. I have 96GB of memory.

thx

Ivor

Post by **graham8** » Apr 05, 2017 2:12 pm

I've got a 32TB/64 raw volume, and I chose 128 ... so I entered "128" as a decimal value. I think the hex came out to be 80.

Post by **kubimike** » Apr 05, 2017 3:44 pm

So if folks here are setting it to 128, that means it's unmapping 1.6TBs of RAM?
Seems like that would cause an issue deallocating memory that's not even available.
4 * 128 = 1 GIG

Post by **tsightler** » Apr 05, 2017 6:18 pm

kubimike wrote: So if folks here are setting it to 128, that means it's unmapping 1.6TBs of RAM?
Seems like that would cause an issue deallocating memory that's not even available.
4 * 128 = 1 GIG

I'm not so sure about the math there!

4 * 128MB = 512MB (not 1GIG)

128 * 128MB = 16,384MB = 16GB
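
For anyone who wants to check their own value, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 128MB and 64MB chunk sizes and the 10TB threshold are taken from the article text quoted earlier in the thread, the function name `trim_size_mb` is made up for this example, and the article does not say which chunk size applies to a volume of exactly 10TB (the sketch treats it as the smaller case).

```python
# Chunk sizes and threshold as quoted from the article earlier in this thread.
MB_PER_CHUNK_LARGE = 128  # volume larger than 10 TB
MB_PER_CHUNK_SMALL = 64   # volume smaller than 10 TB


def trim_size_mb(chunks_to_trim: int, volume_tb: float) -> int:
    """Return RefsNumberOfChunksToTrim * chunk size, in MB (hypothetical helper)."""
    # Assumption: a volume of exactly 10 TB is treated like the smaller case.
    per_chunk = MB_PER_CHUNK_LARGE if volume_tb > 10 else MB_PER_CHUNK_SMALL
    return chunks_to_trim * per_chunk


if __name__ == "__main__":
    # 4 is the default mentioned in the thread; 32 and 128 are values people tried.
    for value in (4, 32, 128):
        mb = trim_size_mb(value, volume_tb=32)
        print(f"{value:>3} (hex {value:#x}) -> {mb} MB = {mb / 1024:g} GB")
```

On a volume over 10TB, 128 works out to 16GB per trim, which is the figure that matters when comparing against the RAM actually installed in the server.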

Post by **kubimike** » Apr 05, 2017 6:28 pm

whoops, not sure what I was thinking.

Post by **graham8** » Apr 05, 2017 8:04 pm

Ahhh...likewise...I clearly shouldn't have set it to 128.

Since I only have 16GB of memory on this copy repository, I'm changing that from 128 to 32...which should mean that it uses 4GB of memory on my volume, since it's over 10TB... (32 * 128 = 4096).

The Microsoft person that was watching me do that affirmed that 128 was a good value for that server.... *sigh*.

Post by **kubimike** » Apr 05, 2017 8:38 pm

The article is poorly written, and I had to read it 5 times to get what they were talking about and how the calculations work. He probably doesn't get it either. From what I understand, option 1 really is the frequency of how often it dumps compared to the default/stock setting.

Moreover, these options seem low-level, so if you're not careful you could dump more memory than what's available, which I imagine could itself cause crashing.

Post by **tsightler** » Apr 05, 2017 9:14 pm

I agree that the article is poorly written. When I read the article it's hard for me to see how option 1 could do much of anything or why option 2 should be increased over the default when it reads as if it should be decreased. The only option that seems like it would really do something for the Veeam case is option 3. Unfortunately I haven't had access to the hardware I really need to test all these things out.

Post by **graham8** » Apr 06, 2017 12:18 pm

tsightler wrote:or why option 2 should be increased over the default when it reads as if it should be decreased

Err...I thought that it should be set to a lower value to decrease the amount of memory consumed? A higher value would mean it's trying to load more metadata into active memory simultaneously, wouldn't it?

Post by **kubimike** » Apr 06, 2017 1:42 pm

Now I'm confused again. I read it the other way around: Option 1 = how frequently, Option 2 = how much to release.

"Option 2: Works if the VA range that's being unmapped does not have any active references"

Post by **tsightler** » Apr 06, 2017 2:35 pm

graham8 wrote:Err...I thought that it should be set to a lower value to decrease the amount of memory consumed? A higher value would mean it's trying to load more metadata into active memory simultaneously, wouldn't it?

The article says the default is 4, and all of the examples in the article are larger numbers than that, and most people in this thread seem to be trying larger numbers (like 32). I haven't seen anybody report trying, for example, 1.

The reason I say it reads like it should be set smaller is that it specifically says it controls the granularity, i.e. how big of a chunk it will try to free at one time; however, it then specifically says the chunk has to have no active references to be freed. Increasing the size would try to free more memory at once, but would also make it more likely that any given chunk couldn't be freed, since a bigger chunk is more likely to have at least one active reference. In other words, if I'm freeing 1GB chunks vs 128MB chunks, I'd have to find an entire 1GB chunk that has no references. Perhaps in practice this is OK because of the allocate-on-write semantics, but it just seems counterintuitive, and I wish the article was more clear on exactly what changing these keys is expected to do in practice.
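
To put a rough number on that intuition, here is a toy model in Python. It is purely an assumption for illustration and says nothing about how the ReFS driver actually tracks references: it pretends each 128MB unit independently has an active reference with some probability `p_active`, so a chunk made of several units is only free when every unit is. The function name and the 5% figure are invented for the example.

```python
def chance_chunk_is_free(units: int, p_active: float) -> float:
    """Probability that a chunk of `units` 128MB units has no active reference.

    Toy model only: assumes each unit is referenced independently with
    probability p_active, which real workloads will not follow exactly.
    """
    return (1.0 - p_active) ** units


if __name__ == "__main__":
    p = 0.05  # hypothetical: 5% of 128MB units have an active reference
    for units in (1, 4, 8, 32):
        size_mb = units * 128
        print(f"{size_mb:>5} MB chunk: {chance_chunk_is_free(units, p):.3f}")
```

Under this model the chance of finding a fully unreferenced chunk drops quickly as the chunk grows, which is the concern described above. Whether the driver behaves anything like this is exactly what the article leaves unclear.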

Post by **tsightler** » Apr 06, 2017 2:42 pm

One question I'd like to ask everyone on this thread: do you all have concurrency limits on the repository? In several of the cases that I have looked at, customers were not limiting concurrency on the repository and had hundreds of VMs in the job. During backups the tasks were naturally limited by the number of proxies; however, once the merge started, Veeam would happily overwhelm the repository with dozens (maybe hundreds) of concurrent merge requests. Simply limiting concurrency on the repository to a reasonable number was all that was required to resolve the problem.

Post by **graham8** » Apr 06, 2017 3:12 pm

tsightler wrote:Perhaps in practice this is OK because of the allocate-on-write semantics, but it just seems counter intuitive and I wish the article was more clear on exactly what changing these keys is expected to do in practice.

Agreed... I followed up with Microsoft on my case ID to ask them this specific question... whether higher or lower numbers results in less memory being consumed. I'll update everyone when I get an answer.

tsightler wrote:One question I'd like to ask everyone on this thread, do you all have concurrency limits on the repository?

I noticed that this was an option (concurrency on repositories) just yesterday, actually... previously I thought the only concurrency option was the GeneralOptions->I/OControl->EnableParallelProcessing checkbox. My copy repo is set to 4, which I guess is the default. I've got two copy jobs for different retention settings, and a total of 5 VMs within those 2 copy jobs.

With a setting of only 4 for concurrency, and only 5 VMs within 2 jobs, I wouldn't think too much parallelism would be the cause of a total system lock. I thought about dialing the number down to 1, but I don't want to swap too many variables around at this point...at least, not while I'm waiting on Microsoft to get back to me.

Post by **kubimike** » Apr 06, 2017 3:24 pm

I'm set to 4 max concurrent tasks.

Post by **tsightler** » Apr 06, 2017 4:01 pm

Yeah, I wouldn't think concurrency numbers that low would make any difference unless these are really small repos. Admittedly, most of the repos I work with contain 100's-1000's of VMs and are 100's of TBs and have 64GB or more of RAM, so the behavior may be different in those cases.

Post by **kubimike** » Apr 06, 2017 4:17 pm

I have 45 VMs. Took no chances and ordered 192 gigs of memory for my machine.

Post by **graham8** » Apr 06, 2017 5:58 pm

Interesting...

I have two servers with identical disk layouts. One has 16GB ram and one has 32GB ram (though, plenty of free memory on both).

On the 16GB server, I have Option1 and Option3 enabled, and Option2 (RefsNumberOfChunksToTrim) set to 32. On the 32GB server, I have none of those options set.

I'm running a disk defrag on both, watching RAMMap and tracking disk IO. On the server with all those options set, my Metafile (i.e., ReFS) memory usage is between 500-1000MB, with only 20MB active. Disk IO is very slow, at ~10-15MB/s average.

On the other server, without those options set, Metafile usage is ~20GB, with ~4-5GB active, and disk IO is averaging ~300-400MB/s.

Sooo...yeah, having all three of those options in place is definitely aggressively limiting both ReFS driver memory usage and disk performance. Just FYI, in case anyone was curious the degree of effect it could have on throughput.

That said, this isn't a block clone operation, so I'm not saying this fixes whatever happens at those times - just thought I'd share my observations about general disk performance from having all these settings enabled.

Post by **tsightler** » Apr 06, 2017 6:35 pm

Yeah, with block clone it's pretty much expected that the disk will be fragmented, as that's simply the nature of cloning blocks from one file to another. I'm not even sure what a defrag could do, as defragging one file will likely just fragment another. I wouldn't expect Option 3 to be so bad for the normal sequential write and/or block clone operations, but I'll try to get access to some good lab hardware and run it through its paces in the next 4 weeks or so.

Post by **j.forsythe** » Apr 07, 2017 6:30 am

Hello guys.

My server is running on a single CPU with 32 GB of RAM. During backup I have seen 30% CPU usage and about 75% of RAM usage.
I have my repos limited to 4 concurrent tasks.
I have reconfigured my jobs again for tonight, so they will write the backups to an NTFS volume. I want to see if a full backup of about 6-7 TB will take 50+ hours again.

Sorry if I ask my question again, but could it be that using Storage Spaces is causing an issue as well?

I have set my HP RAID controller to HBA mode, so it passes all disks to Windows, and used Storage Spaces on the 2016 Std server to create a storage pool.
Do you think I would get a benefit if I used the HP controller to create a new RAID and created a ReFS volume on top of that?

Thanks,
John

Post by **graham8** » Apr 07, 2017 5:16 pm

graham8 wrote:having all three of those options in place is definitely aggressively limiting both ReFS driver memory usage and disk performance

Sorry, I think I spoke too soon. Defrag is apparently highly variable... I'm seeing higher speeds on the slow server now...100-150MB/s right now. I guess it depends on what chunks of data it's trying to move around...makes sense. Some swaths will be more random, fragmented IO than others. Still lower speeds than the one without those 3 options set, but significantly faster than I was initially seeing. Active ReFS memory usage is still very low on it.

Post by **adruet** » Apr 10, 2017 7:47 am

j.forsythe wrote:Hello guys.

My server is running on a single CPU with 32 GB of RAM. During backup I have seen 30% CPU usage and about 75% of RAM usage.
I have my Repo's limited to 4 concurrent tasks.
I have reconfigured my jobs again for tonight, so they will write the backups to a NTFS volume. I want to see if a Full-Backup with about 6-7 TB will take 50+ hours again.

Sorry if I ask my question again, but could it be that using Storage Spaces is causing an issue as well
I have set my HP RAID controller to HBA mode, so it passes all disk to Windows and used Storage Spaces on the 2016 Std server to create a storage pool.
Do you think I would get a benefit, if I would use the HP controller to create a new RAID and create a ReFS volume on top of that?

Thanks,
John

Based on my HP Hardware, 4 servers like this:
- HP DL380 Gen9 with dual CPU Intel E5-2660 v4 2 GHz, 64 GB of RAM, RAID 1 SSD for the OS, and 2 NVMe 800GB disks
- Dual 10 Gbit network cards (HP 560FLR) supporting the offloading of SMB v3 (RDMA capable)
- 2 x DAS HP D3700 with 25 x 1.8TB 12G 10K SAS disks configured as Raid 6 with HP p441 controller

I have done some Storage Spaces (and Storage Spaces Direct) testing, and the results were not very promising in terms of performance.
When the licensing was announced as Windows Server Datacenter only, we dropped the idea of ever using Storage Spaces Direct.
So we tried to use Storage Spaces locally, using the NVMe disks as journal disks to improve performance.
But we compared the results of a Veeam backup profile in diskspd between two setups: our p441 controller in HBA mode with Storage Spaces, the NVMe disks as journal disks (write cache for the volume) and parity across the rest of the D3700 disks, versus standard RAID 6 on the p441 controller. We decided to stick with the p441 and RAID 6, as it was faster and used less CPU.
Regarding RAM usage, this is probably due to ReFS, and you can check that with Sysinternals RAMMap.

Post by **kubimike** » Apr 10, 2017 5:14 pm

@adruet, take a look at the P841. I have the same setup as you; however, I use the p440 for my OS/internal disks. I have the P841 attached to the D3700s. It's quite fast!

Post by **graham8** » Apr 10, 2017 6:39 pm

Latest update. Confirmed no backups/copies/etc were taking place, and deleted two 6.5TB VBK files from the server. As usual with ReFS, available disk space only slowly began to make itself available. With all three of the MS workaround options in place, memory usage for this operation climbed over 100%. Then the usual occurred...numlock stopped responding, mouse stopped moving, disk activity lights stopped. I did multiple rounds of initiating manual memory dumps. This time, unlike all the other times this has occurred, the problem isn't working itself out by disabling all disk-activity-related services/tasks/etc (Veeam, server shares, scheduled data integrity scans, etc). Within 2-3 minutes, the server becomes unresponsive now with each boot cycle.

Updated Microsoft, but unless they come back to us with some way to set the volume read-only so that it temporarily stops whatever bug is occurring (even if it means it doesn't free the disk space) so we can recover the data from the volume, then it looks like we have permanently lost backup history and will need to nuke this and put in some completely different solution. And again, the volume itself is fine - the data is all accessible...just only for 2-3 minutes until the ReFS driver nukes the server.

I'll submit the memory dumps to Microsoft, so hopefully that at least helps them towards a long-term resolution to the underlying bug.

Post by **alesovodvojce** » Apr 10, 2017 7:34 pm

My thoughts are with you, @graham8. Would it help if you detach the ReFS hard disk from the VM, let Windows boot, and then hot-plug it later? It may have been what helped in our deadlock scenario (but among the dozens of things we tried, I'm unsure which one did).

Thanks for submitting the memdumps to MS - that means also hope for us.

Post by **graham8** » Apr 10, 2017 7:35 pm

alesovodvojce wrote: Would it help if you detach the ReFS hard disk from the VM, let Windows boot, and then hot-plug it later?

Thanks for the thoughts and condolences (lol). I did try this, though. Unfortunately, it didn't help this time.

Post by **alesovodvojce** » Apr 10, 2017 7:51 pm

A few more thoughts: 1) add more RAM to the VM; 2) to get the data out, mount the ReFS VHD on a Windows 2012 VM, expecting fewer features with fewer bugs.

Post by **graham8** » Apr 10, 2017 7:59 pm

alesovodvojce wrote: A few more thoughts: 1) add more RAM to the VM; 2) to get the data out, mount the ReFS VHD on a Windows 2012 VM, expecting fewer features with fewer bugs.

Thanks. It's not a VM, though. This is a Veeam copy repository server, so we're just talking about physical VBK/VIB files and all. Adding more RAM might band-aid it, but I'd still feel like the whole setup was incredibly fragile and prone to crash and burn at any moment. And if I switch away from ReFS and keep using Veeam (with NTFS), I'd have to propose spending enough to buy a new car on the many, many additional drives needed to accommodate all the full VBKs that don't share any data blocks for the point-in-time GFS retention points and all...maybe a totally viable option for some with broader budgets, but... Or I could go with dedupe, but I'd be scared *that* would end up corrupting something (I heard about some 2016-specific dedupe issue on here, so...).

Oh well

If I hear anything from MS I'll continue to update.
