-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: Windows 2019, large REFS and deletes
Yes, we upgraded our SAC (as we do with LTSC) in place - no issues whatsoever!
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
@spiritie
If you raise a support case, you can reference our case 04428636.
We still have exactly the same problems with merge processes taking forever, and have for a while now. We're trying to break it down with Veeam support and the MS ReFS devs, but no success so far.
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: Windows 2019, large REFS and deletes
Out of curiosity, is anyone else using Tiering?
We've found this to be extremely valuable.
The three systems I've deployed recently include:
- SS Mirror Accelerated Parity - Veeam primary backup drive
  - SSD Tier - Mirror: (2) 1.92TB Read Intensive SATA (1.64TiB usable)
  - HDD Tier - Parity 3-column: (10) 8TB NL-SAS (48.4TiB usable - 72.69TiB raw @ 66.67% efficiency)
  - SSD Simple Space: (3) 1.92TB RI SATA (5.23TiB usable) - *Note* this is used for replicas; there's a small risk in using a Simple (RAID-0) space, but as this by design holds only the 'most recent' data for a handful of servers, it's an acceptable risk vs. cost. Worst case, during a DR situation the volume fails and we fall back to our backups via Instant Recovery.
- SS Mirror Accelerated Parity - Veeam primary backup drive & replication drive
  - SSD Tier - Mirror: (2) 1.92TB MU SAS & (4) 960GB MU SAS (3.38TiB usable)
  - HDD Tier - Parity 5-column: (15) 2.4TB 10k-SAS & (20) 1.8TB 10k-SAS (52.32TiB usable - 65.4TiB raw @ 80% efficiency)
- SS Mirror Accelerated Parity - Veeam primary backup drive & replication drive
  - SSD Tier - Mirror: (4) 480GB MU SAS (1.2TiB usable)
  - HDD Tier - Parity 5-column: (10) 2.4TB 10k-SAS (14.51TiB usable - 18.13TiB raw @ 80% efficiency)
- Doing daily incremental backups means even Read Intensive (the most affordable) SSDs stay well within the ~0.6-0.8 DWPD most RI drives are rated for - even replicating every couple of hours doesn't push past this
- We've found synthetic fulls complete quicker than on pure-HDD ReFS - this is likely because ALL writes hit the SSD tier, and we try to size the SSD tier so it 'mostly' holds a full week of incremental data
- Even if the SSD tier is filled, we've not seen performance less than 200MB/s, and normally closer to 300-400MB/s sequential
- Gives performance of SSD systems while allowing capacity drives for the older data
- We set the ReFS tiering destage threshold to 70%, and at times lower - so once the SSD tier hits 70% full it destages data to the HDD tier. Leaving 30% free is important to handle an influx of abnormal backup sizes, and because we have found that block cloning (i.e. fastclone) on a nearly full SSD tier (>90%) can crash the entire Storage Space... Microsoft wouldn't own up to why this was, so in production we only do Mirror/Mirror vs Mirror/Parity (destaging to parity is... crap, Microsoft fails at parity calculations). A rough PowerShell sketch of this kind of setup is shown after this list.
- MEMORY STREAM BANDWIDTH IS IMPORTANT - I can't stress this enough on ANY system, but even more so when you're asking Microsoft to do parity calculations. With the new Xeon Scalables, this means populating 12 or 24 DIMMs in the system to get as close to optimal CPU-to-RAM bandwidth as you can. Older processors have the same consideration with different numbers; e.g. many E5 Xeons were optimal at 16 DIMMs (quad channel)
- For systems larger than these, IMHO QLC SSD is becoming a more realistic option - when replacing our largest repository we found it was cheaper to buy that than to build a MAP system, because backup performance scales non-linearly. This is an odd fact, but as your datasets get larger, slower backend drives are going to hurt your overall performance big time (which is why two of these use 10k and not NL-SAS; and because we are repurposing system hardware)
- NOTE: Microsoft recommends a larger SSD tier than we're using; I believe they recommend 10%-15% of the HDD tier - so for a 60TiB HDD tier you SHOULD run 6TiB of SSD - with SSD prices falling this should not be difficult to accomplish on a modest budget
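For anyone wanting to build something similar, here is a rough PowerShell sketch of a single-node mirror-accelerated parity volume plus the destage threshold tweak mentioned above. Pool/volume names, drive letter and tier sizes are placeholders, and the DataDestageSsdFillRatioThreshold registry value is the one referenced in Microsoft's mirror-accelerated parity tuning guidance - verify it against current documentation before relying on it:

# Sketch only: names and sizes are placeholders, adjust to your hardware.
# Assumes a standalone storage pool "BackupPool" already exists and contains
# both SSD and HDD physical disks.

# Define the two tiers: mirror on SSD, parity on HDD
New-StorageTier -StoragePoolFriendlyName "BackupPool" -FriendlyName "Performance" `
    -MediaType SSD -ResiliencySettingName Mirror
New-StorageTier -StoragePoolFriendlyName "BackupPool" -FriendlyName "Capacity" `
    -MediaType HDD -ResiliencySettingName Parity

# Create the mirror-accelerated parity volume, formatted ReFS with 64K clusters
New-Volume -StoragePoolFriendlyName "BackupPool" -FriendlyName "VeeamRepo" `
    -DriveLetter R -FileSystem ReFS -AllocationUnitSize 65536 `
    -StorageTierFriendlyNames "Performance", "Capacity" `
    -StorageTierSizes 2TB, 45TB

# Lower the fill ratio at which ReFS destages from the mirror tier to the parity
# tier (value name per Microsoft's MAP tuning guidance - verify; reboot required)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Policies" `
    -Name "DataDestageSsdFillRatioThreshold" -PropertyType DWORD -Value 70 -Force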
-
- Service Provider
- Posts: 2
- Liked: never
- Joined: Nov 23, 2020 7:26 pm
- Contact:
Re: Windows 2019, large REFS and deletes
@Dasfliege, any luck with your issues? I have pretty much the same issues going on.
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
@doubleaapter
Not yet. MS doesn't see any misbehavior on the ReFS side, while Veeam says that they do everything correctly. It's quite hard to get this solved with only basic knowledge of what is going on in the background, and with two vendors involved who both can't locate the problem on their side.
In your case, did the problem also appear from one day to the next? Is it also only affecting merges?
-
- Service Provider
- Posts: 2
- Liked: never
- Joined: Nov 23, 2020 7:26 pm
- Contact:
Re: Windows 2019, large REFS and deletes
So background,
We run C7000 blade enclosures, Nimble HF60 dedupe SANs, VMware, multiple 10Gbps interfaces everywhere, etc. Veeam 10.0.1.4854.
Our Veeam Cloud Connect infra is all single-function VMs sliced off dual-socket 12-core Xeon blades (4). We attach disks as VMDKs to repo VMs (usually 2x 60TB ReFS disks per VM; each VM has 32GB RAM and 4 vCPUs). The merge performance since we switched to 2019 1809 has been bad. We've done all the registry stuff, no AV, exclusions for Defender, etc. When we moved from 2016 to 2019 we attached the old disks to the new VM and ReFS automatically upgraded. I don't know if this is part of the problem or not. I'm just downloading the merge test tool and will run the speed tests on a fresh repo vs. an existing one.
Our merges broke sort of randomly throughout our environment over the last 2 months; it seems it all coincided with upgrading to 2019. Ours aren't just slow - for some of our jobs we resorted to fresh starts on new repo VMs entirely, because the merges simply don't work and fail.
They don't seem to actually use resources (RAM isn't crazy, CPU isn't), they just run insanely slowly. I've watched the performance on the SAN, and it's bored: low latency, not doing anything. We don't have any problems with write performance, only with merges, and because they fail our backup size is slowly growing.
We have some old jobs that weren't per-VM, so we said screw it and started them fresh on 2019; I'll see if that's any better, otherwise I'm going to have to revert to 2016.
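For reference, by the "registry stuff" I mean the ReFS working-set trim values from the Server 2016-era Microsoft/Veeam guidance - roughly the sketch below. The value names and numbers are the commonly cited ones from memory, so double-check them against current guidance for your build before applying anything:

# Commonly cited ReFS tuning values (sketch; example numbers only, reboot required)
$fs = "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem"
New-ItemProperty -Path $fs -Name "RefsEnableLargeWorkingSetTrim" -PropertyType DWORD -Value 1  -Force
New-ItemProperty -Path $fs -Name "RefsNumberOfChunksToTrim"      -PropertyType DWORD -Value 32 -Force
New-ItemProperty -Path $fs -Name "RefsEnableInlineTrim"          -PropertyType DWORD -Value 1  -Force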
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
Okay, that sounds familiar. We also had our ReFS repos (automatically) upgraded when we switched to 2019, and all the other symptoms you describe correspond exactly to our observations.
Please let me know the output of the block-clone-spd test with a 50GB file size. If the values are bad and you have a chance to run it from another VM (maybe 2016) against the same SAN storage, that would be a nice isolation test to locate the bottleneck.
Unfortunately I can't do that test, as we use JBOD shelves directly attached to our 2019 backup server.
-
- Service Provider
- Posts: 10
- Liked: 5 times
- Joined: May 19, 2016 3:45 pm
- Full Name: Bryan Buchan
- Contact:
Re: Windows 2019, large REFS and deletes
Hi Gostev, I have seen you mention before that we should be using synthetic fulls when using ReFS fast cloning, but for the life of me I cannot figure out why. Can you please explain why you suggest this?
Background info: We utilize ReFS both on our client-side local storage (iSCSI NASes) and on our service provider side. The service provider side storage is made up of multiple physical servers dedicated as repositories running Server 2019 (patched, AV exclusions, and registry optimizations - basically every suggestion we have found here), hosting 3x 50TB volumes, with 256GB of RAM and 10Gbps backbone networking. Straight from our hosting environment these servers can ingest data at a full 10Gbps, but when it comes to ReFS fast clones the speed is all over the place. Small merges can take minutes or hours, regardless of per-VM or per-job files and regardless of incremental size. During merges we see no resource contention such as RAM or CPU; they are just slow, and in some cases slower than regular merges. Our goal is to maintain as small a footprint per Cloud Connect tenant as possible, so enabling synthetic fulls would significantly increase that footprint.
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
Most customers do synthetic fulls because they are required for a GFS backup schedule. And synthetic fulls are "free" with ReFS, so what do you mean by "enabling synthetic fulls would significantly increase that footprint"?
-
- Service Provider
- Posts: 10
- Liked: 5 times
- Joined: May 19, 2016 3:45 pm
- Full Name: Bryan Buchan
- Contact:
Re: Windows 2019, large REFS and deletes
Ok, I guess I didn't realize that standard (non-GFS) synthetic fulls would also result in spaceless fulls. I understand that synthetic fulls are the basis for GFS (my GFS points do not appear to be spaceless, however...), but I believe in the past I have seen you recommend them just in general. We currently utilize forever-forward incremental chains for all our customers. In regard to ReFS fast cloning, what advantage would I gain by enabling periodic synthetic fulls on chains that do not have GFS enabled?
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
My thought was simply that periodic synthetic fulls are the default backup job setting, so this is what most of our customers are using - and we can be sure this pattern is not problematic for ReFS. You have a different configuration and are having some issues with ReFS, so I was wondering if this could be connected.
-
- Enthusiast
- Posts: 46
- Liked: 12 times
- Joined: Apr 10, 2018 2:24 pm
- Full Name: Peter Camps
- Contact:
Re: Windows 2019, large REFS and deletes
We were able to do some testing on another HPE Apollo system at HPE. We noticed the following behaviour.
When we perform the following command on our Apollo 4200 (26 data drives RAID 60, 256 GB RAM, 218 TB ReFS-64k volume):
block-clone-spd.exe S:\Temp 50
We see the following symptoms in the Task Manager / Resource Monitor / Memory tab.
While the files 01.data and 02.data are being written to disk, the Modified memory rises from 0 to around 76,000 MB. The Standby memory also fills up until there is only 3 MB of free memory left.
When both files have been written, the Modified memory is painfully slow going back down; during that time it is almost impossible to do anything on that volume. Creating or deleting a directory will cause Explorer to stop responding. This takes several minutes.
The moment the Standby memory starts to decrease, Explorer starts reacting again. When the Modified memory is down to around 0, the cloned.data file is created.
The average speed is around 750 MiB/s.
On the test system we have at HPE, also an HPE Apollo 4200 (only with 18 drives, RAID 60, 128 GB RAM, 130 TB ReFS-64k volume), if we do the same test the Modified memory never gets above 14,000 MB. When writing 01.data and 02.data has finished, the Modified memory drops to about 0 in just a few seconds. The Standby memory also goes down fast and cloned.data is created. Explorer keeps responding fine.
In the end it scores around 6,000 MiB/s.
We did not notice any difference between Windows 2016 1607 and Windows 2019 1809 (17763.1613)! We were able to test with both versions, but there was no difference in performance.
The only major difference between the two Apollos is the memory (128 vs 256 GB) and the ReFS volume size (130 TB vs 218 TB). We are trying to see if it is possible to expand the test Apollo to the same number of disks and test with a bigger volume.
Does anybody see the same symptoms with the Modified Memory amount and unresponsiveness of the explorer?
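If anyone wants to log the same thing outside of Task Manager, a simple counter query like the sketch below (using the standard Windows memory counters; adjust the names and interval as needed) shows the Modified and Standby page lists while block-clone-spd is running:

# Poll the Modified and Standby page lists every 5 seconds while the clone test runs
$counters = '\Memory\Modified Page List Bytes',
            '\Memory\Standby Cache Normal Priority Bytes'
Get-Counter -Counter $counters -SampleInterval 5 -Continuous | ForEach-Object {
    foreach ($s in $_.CounterSamples) {
        # Print timestamp, counter path and value in MB
        '{0:HH:mm:ss}  {1,-50}  {2,12:N0} MB' -f $_.Timestamp, $s.Path, ($s.CookedValue / 1MB)
    }
}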
-
- Expert
- Posts: 130
- Liked: 14 times
- Joined: Mar 20, 2018 12:47 pm
- Contact:
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
Importantly, this is not Microsoft talking, but rather one particular support engineer who perhaps does not know the subject too well, if he/she thinks the issue of ReFS volumes turning raw is caused by Veeam or DPM. This is a well-known ReFS issue caused by data loss due to RAID controller bugs (or due to using RAID controllers which are not on the Windows HCL).
Also, the suggestion to go from 2016 to 2019 is questionable: it's not like 2016 has major known issues at this stage which are resolved by 2019. I would understand advice to go to the latest SAC, which has a materially different ReFS code base coming from Server 2021, and shows great performance gains thanks to it (easily a few times faster).
-
- Expert
- Posts: 130
- Liked: 14 times
- Joined: Mar 20, 2018 12:47 pm
- Contact:
Re: Windows 2019, large REFS and deletes
Anton,
Then please explain to me what situation your KB describes and when it should be applied: https://www.veeam.com/kb3136 ?
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
Yes, I have already sent some inquiries regarding this article to our support KB folks and am waiting for their comments. Because as it stands right now, this Veeam KB article recommends migrating fairly solid and stable Server 2016 ReFS installs (even if not the fastest) to a totally unworkable setup! The whole reason this forum thread exists is that Server 2019 had severe issues with ReFS up until KB4541331, which was released a few months after the KB4534321 that this article recommends installing.
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
We will remove this KB completely. It was based on a support case on Server 2016 from back in early January, where a Microsoft support engineer swiftly recommended upgrading to Server 2019. Which clearly means no actual research of the issue or OS performance comparison was done, because ReFS in Server 2019 was a complete disaster at the time: it had to receive two major patches in the months following the recommendation. And until the second patch, most of our customers were actually moving BACK from 2019 to 2016 to fix their ReFS issues, as you can see from the first pages of this thread.
For those who are currently not happy with the ReFS performance whether on 2016 or 2019, the only way to achieve a meaningful improvement is to upgrade to SAC version 1909 or later, where the ReFS metadata handling engine received massive changes. However, if the performance is more or less acceptable (meaning you fit your backup window) and/or you don't want to mess with SAC, then I'd recommend just putting this whole issue aside until Windows Server 2021 is out. This one will make the real difference.
-
- Service Provider
- Posts: 45
- Liked: 5 times
- Joined: Nov 08, 2013 2:53 pm
- Full Name: Bert D'hont
- Contact:
[MERGED] KB3136 - Windows Server 2016 stops responding during synthetic operations on a ReFS repository
Hello,
There used to be a KB (KB3136) with the title "Windows Server 2016 stops responding during synthetic operations on a ReFS repository".
It was published in March 2020 and modified in August 2020.
The KB said that this behavior was a limitation of ReFS and that it was confirmed by Microsoft.
The solution was to use Windows Server 2019.
I see now that the KB isn't available anymore.
What is the reason this KB isn't there anymore? Is there a solution for ReFS on Windows Server 2016?
The problem is that we use HPE StoreEasy as our backup repository. This device comes preinstalled with Windows Storage Server 2016.
I asked HPE support, and currently it is not supported to upgrade these devices to Windows Storage Server 2019...
-
- Veteran
- Posts: 1143
- Liked: 302 times
- Joined: Apr 27, 2020 12:46 pm
- Full Name: Natalia Lupacheva
- Contact:
Re: Windows 2019, large REFS and deletes
Hi Bert,
Moved your post to the existing thread.
Please take a look at the post above, it describes why this KB was removed.
Thanks!
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
@PeterC
I can observe the same behavior for the Modified and Standby memory. Did you have any chance to do further testing?
Microsoft is asking me for dumps and logs from two different systems: one which is performing well and one with poor performance. As you already have this setup in place, I would be very thankful if you could assist us by collecting and providing the requested material to MS for further analysis.
Also, it seems like Veeam isn't really aware that there definitely is an issue again. Has anyone encountering the problem opened a case with Veeam so far? If not, please do, and reference my case 04428636. We have to make sure that all occurrences of the problem are logged, so we can consolidate our resources to find a solution. Any help appreciated.
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
One more thing:
We're a full service provider (outsourcer) for a bunch of customers and therefore I have quite a lot (30-40) of different backup server instances where I can run tests. I have now performed block-clone-spd on 4-5 different installations and none of them reached more than 900MB/s. Even on one installation with Server 2016 and 36x NL-SAS disks directly attached, I wasn't able to get above that value. That seems quite suspicious to me. One thing all these installations have in common is that they are running on HPE hardware, either physically or as a VM.
I've found one single installation, running on a Supermicro server with a 12x SATA Synology repository, which was able to reach >6000MB/s. However, that Syno does have a 2TB M.2 read/write cache. I don't know if it just performs that much better because of the cache, or if it really is related to the fact that it isn't HPE hardware.
What brand of hardware are others who encounter performance drops using? I know from PeterC that you're using HPE too, even though in your case it's an Apollo and not the ProLiants we mostly use.
-
- Expert
- Posts: 246
- Liked: 58 times
- Joined: Apr 28, 2009 8:33 am
- Location: Strasbourg, FRANCE
- Contact:
Re: Windows 2019, large REFS and deletes
On my side, at one of my customers:
Apollo 4200 Gen9 system with 24x 6TB RAID60 (10+2 and 10+2), 110TB volume, 40TB free, Windows 2016.
block-clone-spd utility, v0.3.1. Vsevolod Zubarev 2018-19.
Volume file system is ReFS.
Block cloning is available.
Cluster size is 65536 bytes.
Free space available: 53883.613 GiB.
Will create three files 50 GiB each, for a total of 150 GiB.
Writing random file "d:\test\01.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing random file "d:\test\02.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing new file "d:\test\cloned.data" via block cloning...
All block cloning took 6.640s.
Average speed: 7710.541 MiB/s
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Windows 2019, large REFS and deletes
I have a few large (100-300TB) repositories (HPE and SuperMicro hardware) that only slow down when the allowed tasks per repository is set too high or unlimited. 2-3 times the number of (data) spindles seems to be the limit. For example, I have an 18-disk RAID6 (for reasons...) that I'd limit to 36 (2x spindles) to 48 tasks (3x spindles minus parity). Another 24-disk RAID60 would be 48 to 60. Both have SSD caches.
If an unlimited number (possibly hundreds) of parallel block clones start, IO can just stop for minutes. I don't know if there is a universal upper limit, but my task limits have mitigated the problem.
I think I have block-clone-spd somewhere from some historical tickets, but performance is satisfactory for the hardware as-is. All WS2019.
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
We have 60 disks in RAID1 (Storage Spaces mirror). The task limit is set to 40, so I guess that shouldn't be a problem. I already tried higher and lower values, but it doesn't really make a difference.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Windows 2019, large REFS and deletes
dasfliege wrote: ↑Dec 15, 2020 11:19 am
I've found one single installation, running on a Supermicro server with a 12x SATA Synology repository, which was able to reach >6000MB/s. However, that Syno does have a 2TB M.2 read/write cache.
Just a question for you: since you specifically mention that the SuperMicro server has a write cache, are you saying that your Apollo servers are using RAID controllers that don't have write-back cache, or just that it's much smaller?
-
- Service Provider
- Posts: 277
- Liked: 61 times
- Joined: Nov 17, 2014 1:48 pm
- Full Name: Florin
- Location: Switzerland
- Contact:
Re: Windows 2019, large REFS and deletes
@tsightler
The Supermicro server uses a repository which is located on a Synology NAS and connected via iSCSI. The Synology has a 2TB read/write cache on M.2 SSDs. So if I do tests with 50GB files, they may well be processed entirely in the cache and never written down to the spindle disks, which would of course explain why it's performing much better. Our other servers have the normal write cache located on their Smart Array RAID controllers, which is usually much smaller.
-
- Enthusiast
- Posts: 46
- Liked: 12 times
- Joined: Apr 10, 2018 2:24 pm
- Full Name: Peter Camps
- Contact:
Re: Windows 2019, large REFS and deletes
@dasfliege
We have tested at HPE with a similar Apollo and Server 2019, and the performance was a lot better. We have now opened a case with HPE - HPE Support Case 5352160683. They have collected several logs from our Apollo and the test Apollo. At this moment they have not found anything peculiar. After the holidays it will be escalated to the next level, to see if they can find anything.
We have had a case with Veeam (Case # 04331174) and have a case with Microsoft (REG:120081425000237) that is currently on hold, because they suspect it is hardware related.
You could refer to this MS case; they have received several files which they have analyzed. Just let me know if you need anything else.
I will be away for a bit during the holidays, but will try to help if possible.
We have set the task limit on the Apollo to 30, so that definitely should not be the problem. I agree that there definitely is a problem, but I really can't tell what the cause is.
We also noticed that, without any changes being made, the performance suddenly gets worse and backups (merges) take longer than they used to.
-
- Veeam Vanguard
- Posts: 39
- Liked: 11 times
- Joined: Feb 14, 2014 1:27 pm
- Full Name: Didier Van Hoye
- Contact:
Re: Windows 2019, large REFS and deletes
@JRRW Yes, I use standalone Storage Spaces & S2D to great effect with MAP. In some cases we set the destaging threshold to 50% and make the performance tier bigger than 20%. I have also done all-flash, where the performance tier gets the better write-intensive or mixed-use drives and the capacity tier the read-intensive ones.
-
- Enthusiast
- Posts: 78
- Liked: 46 times
- Joined: Dec 10, 2019 3:59 pm
- Full Name: Ryan Walker
- Contact:
Re: Windows 2019, large REFS and deletes
@WorkingHardInIT I'm pretty sure I follow your twitter, incidentally.
I ran a test of all-flash Storage Spaces and found it lacking for backup purposes vs. a straight RAID of QLC drives. That having been said, there could very well be a design with something like an NVMe (M.2 or U.2) performance tier with QLC under it where it'd perform even better; in our use case, however, we do such low DWPD that straight RAID to QLC is fine - we won't hit the QLC write limits despite the drives being "Read Intensive" (7.68TB x24). My bottleneck is Hyper-V and the underlying 8Gbps FC - which is saying something considering it's running on an NVMe (or rather, 'DirectFlash') Pure X50R2 - I love being able to say that the bottleneck isn't storage on either end. (Well, technically speaking the FC is the HBA and thus storage, but... semantics.)
For companies with heavier writes, or running backups more than once a day, it's possible that isn't a good choice. Or if they have smaller SSD drives where they might overwrite an entire drive's capacity.
All the same - to the topic at hand in this thread - I've found 2019 ReFS to be great everywhere it's deployed in our org. Despite Microsoft supposedly dropping development for 'standard' one-node Storage Spaces, they continue to tune ReFS and its drivers across all deployments. More importantly, this allows far greater flexibility to deploy your own storage vs. buying an overpriced OEM solution.
The only thing missing vs. a 'better' OEM solution is global deduplication. But unless you're running a FlashArray//C or a really high-end Data Domain, you're going to sacrifice restore and transform performance; and both of those solutions = $$$$