by mkretzer » Wed Feb 01, 2017 7:01 am
I have posted several threads already in the last few days, but I have to post again about what happened to us last night.
First of all, we are currently in the middle of migrating to ReFS repos. We made the mistake of using 4k blocks on our temporary 120 TB repo. We thought it was no big deal, as at first it seemed to impact only the performance of file operations. We monitored memory and CPU usage and did not see the memory pressure others saw, because the system is fortunately oversized. So we continued to migrate to the new repos, successfully.
All went well for a few days. We have to wait 28 days before we can format our "production" backup storage, and we were optimistic that we would "survive" that time because of the ReFS space savings.
Then I got a message from our monitoring system last night. Our Veeam server was completely unreachable. I went on-site and found that I could move the mouse but not much more. I had to do a hard reset. After the system came up I saw that it was trying to create 3 synthetic fulls at the same time, plus a tape backup and some copy jobs. All in all nothing unusual - this worked well the nights before. So I disabled the tape job, set a limit of 12 concurrent tasks on the repos (before there was no limit) to regulate the load a little, and drove back home.
10 minutes later the next alert came in - we had another crash. So I drove back to the company, did a hard reboot and then limited the ReFS repos to 1 concurrent task so that at least our BCJs can finish at some point in the future, and started to roll back to our old NTFS repository - with active fulls, which I have to do for 1600 machines/140 TB.
Opening an Explorer window on the ReFS volume now takes half a minute even without any load, so it is definitely the ReFS volume that has issues...
BTW I opened a sev1 case with MS - no response yet....
We can also confirm that, though memory usage seemed better at first, the patch does not solve the problem. Even our 64 KB formatted 64 TB LUNs are seeing these symptoms. Performance is very poor as well.
by mkretzer » Thu Feb 02, 2017 1:23 pm
Robvil wrote: I am also migrating to ReFS. So reading the forum, I assume it is absolutely best to use 64k volumes and stay away from 4k?
And from all I have read in the past 36 hours, you should test it really thoroughly before you throw all your backups on it... In our case everything looked great until there was a larger number of files on the disk...
by mkretzer » Thu Feb 02, 2017 2:11 pm
OK, I just got a very long email from Microsoft with a lot of links, where the general recommendation is "use NTFS because ReFS has many limitations". Only one thing was directly targeted at our situation:
"You should avoid volumes bigger than 64 TB". I find this pretty bad, because SOBR is not an option for us at the moment, as we also had some issues with per-VM. And right now we have quite big backup files... For us, bigger volumes are a must-have right now. If we split our 200 TB backup repo into 4 ReFS repos we might lose a lot of the ReFS space saving benefits...
by Gostev » Thu Feb 02, 2017 8:38 pm
Markus, can you share the Microsoft support case ID where this was stated? I wonder if the development team behind ReFS agrees with this statement, or perhaps this is the opinion of the specific support engineer who is simply trying to close the case, as often happens. The best way to find out is to ask the dev team behind ReFS directly - which I can easily do. Thanks!
by Skyview » Fri Feb 03, 2017 3:21 am
Perhaps he meant avoid >64TB partitions *while using 4k cluster size*? Because that, while not directly, sort of lines up with the inertia that Veeam and Microsoft have about using 64k for really large volumes.
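For a rough sense of scale on the 4k vs 64k point, here is a quick back-of-the-envelope sketch in Python. It only counts how many clusters a volume of a given size contains. It assumes binary units (TiB/KiB), the function name is made up for illustration, and it says nothing about how ReFS actually tracks its metadata internally, so it is not an explanation of the crashes.

```python
# Rough cluster-count arithmetic for the volume sizes mentioned in this thread.
# Assumption: binary units (1 TiB = 2**40 bytes, 1 KiB = 2**10 bytes).
TIB = 2**40
KIB = 2**10


def cluster_count(volume_bytes, cluster_bytes):
    """Hypothetical helper: number of whole clusters that fit in a volume."""
    return volume_bytes // cluster_bytes


for volume_tib in (64, 120, 200):
    small = cluster_count(volume_tib * TIB, 4 * KIB)
    large = cluster_count(volume_tib * TIB, 64 * KIB)
    print(f"{volume_tib} TiB: 4k clusters = {small:,}, "
          f"64k clusters = {large:,}, ratio = {small // large}")
```

Whatever the volume size, a 4k format has 16 times as many clusters to address as a 64k format, which is at least consistent with the general advice to use 64k on very large volumes.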