tsightler wrote:Of course, I can't guarantee that the problem will be resolved, but I can say that I have worked with literally dozens, perhaps hundreds, of customers using ReFS, most at scales of hundreds of TBs per server, and RAM was always an important factor in resolving lockups. I would never recommend less than 64GB of RAM for your setup and use case, so I'm very hopeful that will improve your situation. Please keep us posted and thanks for participating in the Veeam community!
Reporting back on the results of the RAM upgrade. I bumped the RAM in our repo server from 16GB up to 96GB. I then performed the same test that would consistently render the server unresponsive: I deleted (via Windows Explorer) a ~5TB .vbk file. Result: no server lockup. I then performed the “real world” test by changing the retention policy of our large file server backup job so that Veeam would attempt to delete old backup sets. Result: no server lockup.
Tom, it appears that increasing server memory (per your suggestion) resolved the problem that has been a thorn in my side for three months now. Thank you!
However, I am left wondering why it took so long to get here. I have an open ticket with Microsoft Pro Support. I have an open ticket with Veeam Support (I have worked this issue with Tier 1 and Tier 2 support people at Veeam). At no point did anyone at MS support or Veeam support suggest that increasing RAM would resolve my server lockup issue. They have had me collect and upload diagnostic files and event logs, tweak registry entries, adjust page file settings, and verify driver versions. Never a suggestion to increase memory.
I have a few suggestions, if I may:
1. Share this information with Veeam Support personnel. I would have tried increasing server memory months ago had someone in Veeam Support suggested it.
2. Create and maintain a sticky post for this forum thread. There are 70+ pages here, far too much for a busy network admin responsible for dozens of systems to parse through. If there are things that are known about these issues, and things that can help customers resolve problems, it would be of great benefit to Veeam customers to summarize them in a sticky post on the first page of this thread. Had I seen a suggestion in a sticky post that increasing repo server memory can resolve lockup issues, I would have tried that. It would have saved me time as well as Veeam Support personnel time.
3. Veeam should be working directly with Microsoft on these ReFS issues. IMO, Veeam should have the lab environment facilities to replicate these issues and then share data with Microsoft toward understanding and resolving these problems. Asking customers to open a $500 support case with Microsoft Pro Support is bad form, IMO. I realize that Microsoft ultimately needs to provide the fix, but throwing the $500 at Microsoft has not gotten us a thing. They have been analyzing our server's memory.dmp file for almost two months now. No suggested fixes came out of that, and no mention of any relationship between server memory and system stability.
I’m VERY happy that our problem is now resolved; I just wish it hadn’t taken three months to get here (yes, I think I first opened the ticket with Veeam Support on this issue around April 11).