mephisto
Expert
Posts: 121
Liked: 7 times
Joined: Nov 07, 2012 6:49 pm
Full Name: Mephisto poa
Contact:

Backup jobs from VMs stored in ZFS and ARC

Post by mephisto »

Hi there,

I'm wondering how backups could interfere with ARC on ZFS. Consider the following setup:
  • ESXi Hypervisor
  • FreeBSD, Debian or FreeNAS as the storage
  • iSCSI from Hypervisor to Storage
How would ARC behave when it has cached bits from the running VMs and a backup job runs? Would the backup job push data out of ARC because "cold" data has to be read from disk for the backup (active full backup)? Does it differ much during an incremental backup using CBT?

I'm wondering if there is some sort of API that would allow Veeam to back up data stored on a node with ZFS without pushing hot data out of ARC.

Also, I wonder whether ARC and L2ARC would behave the same way in the situation described above.

Thanks!
Andreas Neufert
VP, Product Management
Posts: 6747
Liked: 1408 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Backup jobs from VMs stored in ZFS and ARC

Post by Andreas Neufert »

ARC is used for read optimization (RAM).
L2ARC is usually an MLC SSD or similar that works as a second-level read cache behind ARC (the ZFS write log lives on a separate SLOG device).

So for regular backups you will not benefit from ARC. There may be a small performance boost for Synthetic Full or Merge operations, as ARC is used as a read-ahead cache.

As you transport a lot of data during a backup, usually much more than the L2ARC (and ARC) can hold, you will likely fill 100% of the L2ARC, and in the end the hard disk subsystem is the bottleneck. Potentially these caches help to speed up some of the operations. It would make sense to do a performance test with real backup data.

It is also important to test with large data sets when you run benchmarks. Benchmarks usually transport just a few GB of data, which would be served completely from ARC/L2ARC, so you cannot use them as a comparison for real backup workloads.
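
To make the benchmark sizing point concrete, here is a minimal sketch in Python. The function names, the example sizes and the safety factor of 4 are assumptions for illustration only; they are not Veeam or ZFS recommendations. It only computes how much of a test data set could at most sit in ARC plus L2ARC at once.

```python
GIB = 1024 ** 3


def max_cacheable_fraction(dataset_bytes, arc_bytes, l2arc_bytes=0):
    """Upper bound on the share of the data set that fits in ARC + L2ARC."""
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    return min(1.0, (arc_bytes + l2arc_bytes) / dataset_bytes)


def benchmark_is_representative(test_bytes, arc_bytes, l2arc_bytes=0, factor=4):
    """True if the test data is at least `factor` times the combined cache size.

    `factor` is a hypothetical rule of thumb, not a documented threshold.
    """
    return test_bytes >= factor * (arc_bytes + l2arc_bytes)


# Hypothetical box: 64 GiB ARC and 400 GiB L2ARC.
arc, l2arc = 64 * GIB, 400 * GIB

small_test = 10 * GIB    # typical quick benchmark
large_test = 4096 * GIB  # closer to a real backup run

print(max_cacheable_fraction(small_test, arc, l2arc))       # 1.0
print(benchmark_is_representative(small_test, arc, l2arc))  # False
print(benchmark_is_representative(large_test, arc, l2arc))  # True
```

A fraction of 1.0 means the whole test could be answered from cache, which is exactly the case where the result says little about a real backup workload.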

Re: Backup jobs from VMs stored in ZFS and ARC

Post by mephisto »

Hi Andreas,

I think I may not have made my question clear: will a backup using Veeam push hot data out of ARC? What I'm trying to establish is whether hot data gets pushed out as new data is loaded into ARC from storage while the VM bits are read. I'm worried this can lead to bad performance, as the backup will disrupt the ARC and my production VMs will then need to hit the underlying storage instead of being fed from ARC.

Re: Backup jobs from VMs stored in ZFS and ARC

Post by Andreas Neufert »

Ah, you are using ARC on the primary storage. Sorry, I thought you wanted to use it at the target.

For the initial full, I guess the contents of the ARC will be turned over, as you read 100% of the existing data. What counts as hot data changes on access there, I guess.

For incremental processing it depends. If your cache is large enough to hold the daily changes from the VMs, then (as we read only changes with forever-incremental) we will read the changes from the ARC. If the ARC is not large enough, we will read non-"hot" data, which then becomes "hot" because we read it. Anyway, this is not that bad, as it is still current data that someone has written since the last backup.
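
The full versus incremental difference can be illustrated with a toy model. The sketch below is a plain LRU cache in Python, which is deliberately simpler than what ARC really does, and all block numbers and sizes are made up. It counts how many "hot" VM blocks are still cached after a backup has read its blocks.

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used block cache (a simplification, not ARC)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block):
        """Read one block; return True on a cache hit."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        self.blocks[block] = True            # load from disk into the cache
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return False


def hot_blocks_surviving(capacity, hot, backup_reads):
    """Warm the cache with `hot`, run the backup reads, count hot survivors."""
    cache = LRUCache(capacity)
    for block in hot:
        cache.read(block)
    for block in backup_reads:
        cache.read(block)
    return sum(1 for block in hot if block in cache.blocks)


hot = range(80)  # hypothetical working set of the running VMs

# Active full: every block of a 1000-block datastore is read once.
print(hot_blocks_surviving(100, hot, range(1000)))        # 0

# Small incremental: 10 changed hot blocks plus 10 changed cold blocks.
small = list(range(10)) + list(range(500, 510))
print(hot_blocks_surviving(100, hot, small))              # 80

# Larger incremental: 50 changed cold blocks do not all fit next to the hot set.
print(hot_blocks_surviving(100, hot, range(500, 550)))    # 50
```

In this model the full read flushes the whole working set, while an incremental only displaces hot blocks once the changed data no longer fits beside them.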

Re: Backup jobs from VMs stored in ZFS and ARC

Post by mephisto »

Yeah, sorry, I think that was indeed not clear. I'm using physical storage boxes running Windows Server 2016 and ReFS with a hardware RAID controller, as that seems a cheap and reliable storage target. The VMs are running from shared storage, a mix of FreeBSD and Debian with ZFS on top.

Thanks for clarifying that. Actually, researching a bit more, it seems ZFS now keeps a list of the most frequently accessed files as well, so things that are read often have a higher priority to stay in ARC than things that are read just once, even if those were read more recently.

It seems ARC has become a lot more competitive now; it even has compression.
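
The recency versus frequency idea can be sketched with a toy comparison in Python. This is not the real ARC algorithm: ARC adapts the split between its two lists at runtime, whereas the hypothetical `TwoListCache` below uses a fixed 50/50 split, and all sizes are invented. It only shows why data read once by a backup need not displace data that was read repeatedly.

```python
from collections import OrderedDict


def lru_survivors(capacity, hot, scan):
    """Plain LRU: read the hot blocks twice, then scan; count hot survivors."""
    cache = OrderedDict()
    for block in list(hot) * 2 + list(scan):
        if block in cache:
            cache.move_to_end(block)
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return sum(1 for block in hot if block in cache)


class TwoListCache:
    """Blocks seen once live in `recent`; blocks seen again move to `frequent`."""

    def __init__(self, capacity, frequent_share=0.5):
        self.freq_cap = int(capacity * frequent_share)  # fixed split (assumption)
        self.recent_cap = capacity - self.freq_cap
        self.recent = OrderedDict()
        self.frequent = OrderedDict()

    def read(self, block):
        if block in self.frequent:
            self.frequent.move_to_end(block)
            return True
        if block in self.recent:
            del self.recent[block]              # second access: promote
            self.frequent[block] = True
            if len(self.frequent) > self.freq_cap:
                self.frequent.popitem(last=False)
            return True
        self.recent[block] = True               # first access: recent list only
        if len(self.recent) > self.recent_cap:
            self.recent.popitem(last=False)
        return False

    def __contains__(self, block):
        return block in self.recent or block in self.frequent


def two_list_survivors(capacity, hot, scan):
    """Same workload as lru_survivors, run against the two-list cache."""
    cache = TwoListCache(capacity)
    for block in list(hot) * 2 + list(scan):
        cache.read(block)
    return sum(1 for block in hot if block in cache)


hot = range(40)      # blocks the VMs read repeatedly
scan = range(2000)   # a full backup reads every block once

print(lru_survivors(100, hot, scan))       # 0
print(two_list_survivors(100, hot, scan))  # 40
```

In the two-list model the once-read backup blocks only churn through the recent list, so the repeatedly read blocks stay cached, as long as they were promoted before the backup started.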

Re: Backup jobs from VMs stored in ZFS and ARC

Post by Andreas Neufert »

Regarding file access tracking: the "files" seen by ZFS are the VMDK containers. And as they are accessed whenever any VM does I/O, that logic is not really helpful, I guess.