REFS performance issue "workaround"


by mkretzer » Mon Jun 26, 2017 10:01 am

Hello,

As stated in other threads, we still have serious ReFS performance issues in the following situations:

- When data is deleted from the volume
- Beyond a certain point, once a lot of block-cloned data is on the disk

Because of this we are in the middle of moving back to NTFS. To do this we took temporary storage with half the capacity of our primary backup storage and created a new ReFS volume on it. As before, ReFS is extremely fast again. It will get slow after about 4 weeks (we have done this before on that storage as well).

I was wondering whether there is a way to live with the ReFS limitations/bugs: would it be an option to use a scale-out repository with two backend storages, in which one extent is always in maintenance mode (without evacuation), and to switch them every 3-4 weeks? From what I understand, active fulls will be necessary then, but if we use one SOBR per backup job (in different directories on the same two drives), the switch and the active fulls can be scripted.

This way fast clone is really fast for most of the month, and the necessary backup storage is the same as or less than with NTFS.
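To make the rotation idea concrete, here is a minimal Python sketch of the scheduling logic only. It assumes a fixed 28-day period and two extents with the made-up names "extent-A" and "extent-B"; it does not call Veeam at all. The actual maintenance-mode switch and the active full would have to be triggered by whatever scripting interface you use.

```python
from datetime import date

# Hypothetical extent names and rotation period, for illustration only.
EXTENTS = ("extent-A", "extent-B")
ROTATION_DAYS = 28


def rotation_state(today, start, period_days=ROTATION_DAYS):
    """Return (active_extent, maintenance_extent, is_switch_day).

    `start` is the day the first extent went into service. On a switch day
    the script would flip maintenance mode and start the active fulls.
    """
    elapsed = (today - start).days
    if elapsed < 0:
        raise ValueError("today is before the rotation start date")
    cycle, day_in_cycle = divmod(elapsed, period_days)
    active = EXTENTS[cycle % 2]
    maintenance = EXTENTS[(cycle + 1) % 2]
    return active, maintenance, day_in_cycle == 0


if __name__ == "__main__":
    print(rotation_state(date.today(), date(2017, 6, 26)))
```

A scheduled task could call rotation_state once a day and only act when the third value is True.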

Markus
mkretzer
Expert
 
Posts: 328
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS performance issue "workaround"

by bjervis » Mon Jun 26, 2017 4:31 pm

You're having these issues with 64k block size? I have quite a few large ReFS volumes and haven't seen any of these issues yet *knock on wood*
bjervis
Service Provider
 
Posts: 105
Liked: 18 times
Joined: Fri Dec 18, 2015 1:30 pm
Full Name: Brad Jervis

Re: REFS performance issue "workaround"

by mkretzer » Mon Jun 26, 2017 5:09 pm

Yes, 64k. 192 TB volume size. The volume is about 70% full when the issue starts happening.
mkretzer
Expert
 
Posts: 328
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS performance issue "workaround"

by Gostev » Mon Jun 26, 2017 9:42 pm

Yes, your idea will work fine. Just remember to enable the "Perform full backup when required extent is offline" option. Thanks!
Gostev
Veeam Software
 
Posts: 21441
Liked: 2361 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS performance issue "workaround"

by dellock6 » Tue Jun 27, 2017 8:25 pm

As we are trying to collect some field numbers, would you mind sharing yours, specifically:
- size of the whole volume (192 TB)
- used size
- RAM size
- is the backup forever forward? What's the retention?
We have seen in the field, and can confirm, that the performance penalty is most visible during deletion operations, but we have also seen that in many situations a good amount of memory fixes many issues. If possible, we are trying to define what this amount could be. We don't have algorithms or calculators, but based on the information people share, we may be able to compile a list of "safe" configurations.
If you feel more comfortable sharing this information privately, send me a private message. We may also be interested in some server configuration details, like brand, disks, RAID, etc.
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 5061
Liked: 1342 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: REFS performance issue "workaround"

by mkretzer » Wed Jun 28, 2017 4:23 am

- size of the whole volume: 192 TB
- used size: 140 TB
- RAM size: First 128 GB (with crashes), now 384 GB (since then extremely stable, but slow after 3 weeks)
- is the backup forever forward?: No, Forward with weekly synthetics
- What's the retention?: 28 Days

One more thing: in another thread I posted a screenshot where you can see how much memory is used while a synthetic is running. I believe it fluctuated by ~130 GB within a few minutes. So the RAM really seems to be what saved us from crashes.
mkretzer
Expert
 
Posts: 328
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS performance issue "workaround"

by Delo123 » Wed Jun 28, 2017 8:47 am

For us:
Volume size: 189TB (backed by Adaptec Raid 60 30x8TB HGST SAS)
Used size: 149TB (277TB used when calculating size on disk, so synthetic fulls are around 50%)
RAM size: 384GB
Weekly synthetic fulls, Monthly active fulls
No retention policy (we keep all backups till volume is full and replace entire JBOD)

Not a single issue until now, backup speeds are also still around 500-700MB/s
Delo123
Expert
 
Posts: 351
Liked: 101 times
Joined: Fri Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: REFS performance issue "workaround"

by mkretzer » Wed Jun 28, 2017 9:18 pm

Our environments sound VERY similar, with the difference that we have three times the disks. Do I understand correctly that you never delete anything? That is always when the problems start for us, and they never really go away until we start from scratch...
mkretzer
Expert
 
Posts: 328
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS performance issue "workaround"

by Gostev » Wed Jun 28, 2017 10:15 pm

Yes, the issue is most certainly connected with deleting data from the ReFS volume.
Gostev
Veeam Software
 
Posts: 21441
Liked: 2361 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS performance issue "workaround"

by kubimike » Thu Jun 29, 2017 5:40 am

My problem is deleting large VBKs after a job runs. Everything else works. After deleting a 5TB VBK, the system sloooowwllllly freezes and then won't do anything; it doesn't BSOD.
- size of the whole volume 52TB
- used size 38TB
- RAM size 192GB
- is the backup forever forward? No 2 Backups a day one being a synthetic full daily.
- What's the retention? 200
server configuration detail, like brand, disks, raid, etc... HP DL380 , two D3700 connected SAS via P841(Dual domain mode), 50 1.2TB SFF 10k disks in RAID 6+0
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS performance issue "workaround"

by tsightler » Thu Jun 29, 2017 5:52 pm

kubimike wrote:- is the backup forever forward? No 2 Backups a day one being a synthetic full daily.

Synthetic full daily? Can you explain the reasons for this choice? My lab experience says that, the more cloned blocks you have across files, the slower and more likely to hang the filesystem becomes, so I would think this would be a scenario more prone to problems.
tsightler
Veeam Software
 
Posts: 4801
Liked: 1759 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: REFS performance issue "workaround"

by kubimike » Thu Jun 29, 2017 6:54 pm

@tsightler, I am doing the same thing with my Exchange backups as well, except the files it's creating are vastly smaller (1TB). I have no issues with Veeam deleting the older restore points. Wouldn't every new VBK that a synthetic full creates be an independent file from the previous VBK? The reason I'm doing it is that running active fulls is time consuming. I do that once every quarter. I also don't like having large backup chains open. Every time a synthetic runs I'm closing that chain of backups, so less chance of corruption. Now, being a new Veeam user, this is what I've gathered from reading. I could be totally wrong. Also, nowhere in the documentation does it say you can't do a synthetic full every day either.
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS performance issue "workaround"

by tsightler » Thu Jun 29, 2017 7:40 pm

@kubimike, nothing you say is technically incorrect, however, most of what you are reading is before ReFS and block clone.

With a traditional filesystem (NTFS for example), a synthetic full was an entirely new file, so each new chain was totally independent of the other. This meant that starting a new full required enough space for the new full, but it also meant that the new full was completely separate from the old one, using completely different disk blocks. If something were to happen to the old chain, the new chain would not be impacted. But it's unlikely that you would choose to run a synthetic full every day, because if I was keeping 100 restore points, I'd need 100x more space, not to mention all of the time. So typically you would look to strike a balance between used space and chain length, which led most people to run synthetics weekly at most.

ReFS drastically changes the equation, since we don't have to move blocks, so the time is highly reduced, but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain as it's sharing all of those blocks. This provides effectively no benefit from an integrity perspective, because the VBK you create today is still dependent on blocks from VBK files you created weeks/months ago.

But even worse, if you create a synthetic full every day, by day 100 you have 100 VBKs all sharing various blocks among each other, which is a massive amount of block references that ReFS itself has to keep track of in its metadata. Compared to having weekly synthetics, you have 14x more cloned blocks, with no integrity benefit at all, and all evidence to this point says that ReFS problems are more likely to trigger the more reference blocks you have. For example, in my lab when I'm configuring to reproduce the ReFS hangs/BSODs, I configure my jobs to create a synthetic full every day exactly to create all of these block references so that I hit the issue more quickly.

So yes, while there's nothing in the user guide that says you can't do a synthetic full every day, the user guide is about what the capabilities of Veeam are. Veeam certainly has the ability to create a synthetic full every day, and there may be cases where that is useful or a good idea, but with our current knowledge of ReFS, and what things trigger problems with it, I'd say it's less than ideal for that use case.
tsightler
Veeam Software
 
Posts: 4801
Liked: 1759 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: REFS performance issue "workaround"

by kubimike » Thu Jun 29, 2017 9:09 pm

@tsightler
Thanks for the explanation. One thing that isn't clear is below.
but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain as it's sharing all of those blocks.


That being said, if all the VBKs are related somehow, when veeam deletes the oldest restore point (due to retention) that would make your statement false ?

**edit would you like to do a remote session on my server tomorrow? I can give a tour..
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS performance issue "workaround"

by tsightler » Fri Jun 30, 2017 1:53 am

kubimike wrote:[tsightler]but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain as it's sharing all of those blocks.

That being said, if all the VBKs are related somehow, when veeam deletes the oldest restore point (due to retention) that would make your statement false ?

But that's entirely what block clone is: each VBK is sharing any unchanged blocks from the prior VBK, and when you delete the older VBK, that just means the blocks aren't shared anymore. Let's try a super simple example: VBKs that each contain only 4 blocks, labeled A-D.

On Monday the backup runs and creates the first VBK with the first version of each block:
Mon VBK = A1 B1 C1 D1

On Tuesday the backup runs and creates a VIB, but only block A has changed, so the VIB consists only of the new block:

Tue VIB = A2

But the job was configured to create a synthetic full, so what the system does is create a VBK and "block clone" the most recent version of each block into the VBK, so you end up with a VBK like this:

Tue VBK = A2 B1 C1 D1

Note that this new VBK is simply referencing the same blocks as in the Monday VBK and the Tuesday VIB. Once the synthetic full is built, the system deletes the Tuesday VIB file, but that doesn't delete block A2 because the synthetically built VBK is still referencing block A2. Note that both the VIB and the VBK reference the very same block, so deleting the file freed no space on the actual disk; the exact same content remains on disk.

This continues for Wed-Friday, with different blocks changing each day:

Wed VBK = A3 B1 C1 D2
Thu VBK = A4 B1 C2 D2
Fri VBK = A5 B1 C2 D3

So now let's imagine it's time to delete the VBK from Monday because I've hit my retention. The system deletes the Monday VBK, which contained blocks A1, B1, C1, D1, so what can actually be deleted? The Tuesday VBK still references blocks B1, C1, and D1, so those blocks cannot be freed, regardless of the fact that the file has been deleted. Only block A1 can actually be freed, because the other backup files still depend on the rest of those blocks.

In just this super simple example there are 14 references to shared blocks which have to be tracked and processed by ReFS during delete. On the other hand, imagine if this was just regular incremental:

Mon VBK = A1 B1 C1 D1
Tue VIB = A2
Wed VIB = A3 D2
Thu VIB = A4 C2
Fri VIB = A5 D3

Note that this method has exactly the same blocks on disk, but exactly zero cloned blocks for the same amount of space. When retention hits on Friday, the only thing that happens is that block A2 from the Tue VIB is cloned into the VBK, and then the VIB is deleted, which is much lighter and faster than having a bunch of blocks cloned to create a synthetic full. This is why I believe that we don't see the problem with customers that are running forever forward or reverse incremental modes; at least so far I haven't found one.
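For anyone who wants to check the counting, the two layouts above can be modelled in a few lines of Python. This is only a toy model of the example (file name mapped to a list of block labels) and says nothing about how ReFS tracks references internally; the function names are made up for illustration.

```python
from collections import Counter

# Synthetic-full chain from the example: each VBK lists the blocks it references.
synthetic = {
    "Mon": ["A1", "B1", "C1", "D1"],
    "Tue": ["A2", "B1", "C1", "D1"],
    "Wed": ["A3", "B1", "C1", "D2"],
    "Thu": ["A4", "B1", "C2", "D2"],
    "Fri": ["A5", "B1", "C2", "D3"],
}

# Plain incremental chain: one VBK plus VIBs holding only the changed blocks.
incremental = {
    "Mon": ["A1", "B1", "C1", "D1"],
    "Tue": ["A2"],
    "Wed": ["A3", "D2"],
    "Thu": ["A4", "C2"],
    "Fri": ["A5", "D3"],
}


def refcounts(files):
    """How many files reference each block."""
    return Counter(block for blocks in files.values() for block in blocks)


def shared_references(files):
    """Total references that point at blocks used by more than one file."""
    return sum(n for n in refcounts(files).values() if n > 1)


def freed_by_delete(files, name):
    """Blocks that become free when `name` is deleted (no other file uses them)."""
    counts = refcounts(files)
    return sorted(b for b in files[name] if counts[b] == 1)


print(shared_references(synthetic))       # 14
print(shared_references(incremental))     # 0
print(freed_by_delete(synthetic, "Mon"))  # ['A1']
print(len(refcounts(synthetic)), len(refcounts(incremental)))  # 11 11
```

Both layouts hold the same 11 unique blocks on disk; only the synthetic chain carries the shared references.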

So assuming all of that made any sense at all, let's apply some scale to it! Let's assume you have a 5TB backup file. Assuming you are using normal Veeam settings for storage optimization (local block size and optimal compression), the average Veeam block is about 512KB (assuming 2:1 compression; it could be slightly smaller if you get better compression). That means a 5TB backup file contains about 10,000,000 blocks! If you have 100 VBK files all made with block clone, and assuming your change rates are similar to observed averages to this point, probably 60% of your blocks are cloned across all 100 VBK files. Imagine the accounting that ReFS has to do to update its block reference counts for all of those blocks.

On the other hand, if you ran weekly synthetic fulls, you'd use exactly the same amount of space but, at most, you'd have 15 VBKs on disk with cloned blocks, drastically reducing the reference block count. Also, since it's only weekly, the number of blocks cloned between files would be lower as well, because a lot more blocks change during a week than during a day.
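As a rough back-of-the-envelope check of those numbers, here is the arithmetic in Python. The inputs are the figures from the paragraph above (5TB file, ~512KB average block, 100 VBKs, 60% shared); treating "60% cloned across all 100 files" as each of those blocks being referenced by every file is my simplification, so take the last figure as an order of magnitude only.

```python
TIB = 1024 ** 4
KIB = 1024

backup_size = 5 * TIB   # one 5TB VBK
avg_block = 512 * KIB   # assumed average Veeam block after 2:1 compression

# Number of blocks in a single VBK.
blocks_per_vbk = backup_size // avg_block
print(blocks_per_vbk)   # 10485760, i.e. roughly 10 million

vbk_count = 100         # daily synthetic fulls kept on disk
shared_percent = 60     # assumed share of blocks cloned across all files

# Integer math to avoid floating-point rounding.
shared_blocks = blocks_per_vbk * shared_percent // 100
references_to_shared = shared_blocks * vbk_count
print(shared_blocks)         # 6291456
print(references_to_shared)  # 629145600
```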

Note that I'm not saying 100% that this would eliminate the issue in your case, obviously we have people even with weekly synthetic fulls that have the problem, but it's certainly providing additional stress to ReFS having to keep up with the extra accounting for so many block cloned files.
tsightler
Veeam Software
 
Posts: 4801
Liked: 1759 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
