Veeam has put me in a difficult situation...

Post by **CelticDubstep** » Aug 30, 2022 6:54 pm

This isn't directly related to Hyper-V, but more of a question on how to handle the situation Veeam has put me in. I set up Veeam B&R CE back in 2019; it has worked flawlessly and allowed us to restore some files that we otherwise couldn't have recovered. Since 2019, our file server has been a Hyper-V VM on Windows Server 2019. The virtual server has two virtual disks (as does the Hyper-V Host). The issue is with the data VHDX where all the files are stored (the other VHDX is used for the OS). The Host has 4x 8TB Drives in RAID10 for 16TB usable storage. I created an expanding VHDX with a limit of 12 TB so it would give us room to grow. We're now at 5.2 TB of actual data.

However, this is where the issue is. Veeam had a backup fail in June 2020, so now we're in a parent/child situation. We have the main VHDX, which is over 7 TB and hasn't been modified since June 2020. We also have an AVHDX that has a current modified time/date and is actively growing, approaching 3 TB. So, the used space on the host data drive is 9.87 TB with 4.67 TB free (of 14.5 TB, because Windows reports capacity in 1024-based units instead of 1000-based ones).
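The gap between the 16 TB of RAID10 capacity and the 14.5 TB that Windows shows is purely a units difference. A minimal Python sketch of the conversion, assuming the drives are rated in decimal terabytes (10^12 bytes) and Windows displays 2^40-byte units; the function name is illustrative only:

```python
def decimal_tb_to_windows_tb(decimal_tb: float) -> float:
    """Convert vendor terabytes (10**12 bytes) to the 2**40-byte units Windows labels as TB."""
    return decimal_tb * 10**12 / 2**40


raid10_usable_tb = 4 * 8 / 2  # four 8 TB drives in mirrored pairs: half the raw capacity
reported = decimal_tb_to_windows_tb(raid10_usable_tb)
print(f"{raid10_usable_tb:.0f} TB usable shows up as about {reported:.2f} TB in Windows")
```

That result is in line with the 9.87 TB used plus 4.67 TB free reported on the host data drive.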

I cannot merge these two files as there is not enough free space, and the VHDX/AVHDX files are using double what we're actually using. I need to decommission this host and move this VM to another host, but because of this issue, I'm starting to question whether my file server should even be in Hyper-V. It's 5.2 TB with nearly 10 million files. I have a lot of server hardware that is underutilized right now and am thinking that it might be best to simply make a physical server the file server instead of virtualizing it, so that I won't run into issues such as this with VHDX/AVHDX and Veeam backups failing. It isn't a big deal for any of my other VMs as they are small and can be merged with no issues, plus they are on much faster 10K & 15K RPM SAS drives.

Am I better off simply going the physical server route? I have other servers of the same generation & RAID Controllers so if something should fail, I can simply move the drives over to another server and import the array.

Post by **Mildur** » Aug 30, 2022 8:04 pm

Hi

At my last workplace before Veeam, I saw customers with 15TB file servers and they didn't have checkpoint issues.
There must be a reason why the checkpoint wasn't deleted back in June 2020, and it should be analyzed (by Veeam or Microsoft support) if that happens often to all VMs.

Let's see what others think about file servers: physical or virtual.

Thanks
Fabian

Post by **CelticDubstep** » Aug 30, 2022 8:35 pm

Well, I just looked at the time/date of the snapshot, which was 6/10/2020 @ 8:10 PM, which happens to be a Wednesday. Every Wednesday evening (8-10 I believe) used to be my routine IT Maintenance Window. I had Veeam set up to do nightly incremental backups at 8 PM each night. I'm a one man shop so I don't have tickets, notes, etc. to verify I was actually performing work that evening, but my guess is that I was, and wasn't aware or forgot that the backups ran during that window. My guess is that I rebooted the host or the File Server VM, or rebooted network equipment (firmware updates, etc.), and it caused the backup to get stuck like this. I set up a lab to mirror our office setup and no matter what I tried... powering off (not shutting down) in the middle of a backup, etc... I could not reproduce this issue, and I spent at least 2 hours trying. It had to be a "perfect storm".

I'm still concerned about the VHDX/AVHDX sizes however, because it takes over 26 hours to do an active full backup at a processing rate of 46 MB/s on a 10 Gigabit Network. It shows VM Size at 12.7 TB with 8.6 TB used, which isn't true. Veeam says it processed 8.6 TB, but only read 4.2 TB, so I don't know. The real kicker? Our remote office has ancient servers (13+ years old), and the Veeam Backup Server/Proxy there only has 4 GB RAM, 6 CPU Cores (on a 13+ year old CPU mind you) and a 1 Gigabit Network... Processing Rate? 117 MB/s. Backup completed in just over 3 hours (much less data, but still several TB). *shrugs* No clue why the processing rate is so awful at the HQ, unless it's due to the number of files or something else.
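To put the two processing rates in context, they can be compared with the raw line rate of each network. A rough Python sketch, assuming MB/s means 10^6 bytes per second and ignoring protocol overhead (both are assumptions, not something the backup report states):

```python
def link_utilization(rate_mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of the raw link bandwidth that a given transfer rate occupies."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # 1 Gbit/s = 125 MB/s in decimal units
    return rate_mb_per_s / link_mb_per_s


hq = link_utilization(46, 10)      # HQ: 46 MB/s on a 10 Gigabit network
branch = link_utilization(117, 1)  # branch office: 117 MB/s on a 1 Gigabit network
print(f"HQ: {hq:.1%} of line rate, branch: {branch:.1%} of line rate")
```

On these assumptions the branch office job runs close to what its 1 Gigabit link can carry, while the HQ job uses only a few percent of its link, which suggests raw network bandwidth is not what limits the HQ job.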

Post by **Mildur** » Aug 30, 2022 10:19 pm

CelticDubstep wrote: ↑Aug 30, 2022 8:35 pm I'm still concerned about the VHD/AVHDX sizes however because it takes over 26 hours to do a active full backup at a processing rate of 46 MB/s on a 10 Gigabit Network. It shows VM Size at 12.7 TB with 8.6 TB used, which isn't true. Veeam says it processed 8.6 TB, but only read 4.2 TB, so I don't know.

What does the job log say about the bottleneck? 46MB/s doesn't sound good. Before deciding virtual or physical, better to get to know the bottleneck in your environment.

Processed is the used disk size of the VM. Do not forget to count deleted files as well. Their blocks still count as used when you look at the VM from outside; you won't see the space taken by deleted blocks in Windows Explorer inside the VM.

Read is the amount of VM data which the Veeam proxy had to read and analyze. Normally, when doing an incremental backup with changed block tracking, the read data is really small (only changed blocks).
https://helpcenter.veeam.com/docs/backu ... ml?ver=110

CelticDubstep wrote: ↑Aug 30, 2022 8:35 pm *shrugs* No clue why the processing rate is so awful at the HQ, unless it's due to the amount of files or something else.

If you do File Level Backup Jobs with the Veeam Agent, then yes, the number of files will be the issue.
If you do VM Backup Jobs, then the number of files doesn't matter. Have you enabled guest file system indexing?

Post by **CelticDubstep** » Aug 31, 2022 1:21 pm

This is what the log says from one back in May:

Total Size: 12.7 TB
Data Read: 4.2 TB
Transferred: 3.5 TB
Backup Size: 3.5 TB
Dedupe: 3.1x
Compression: 1.2x
Duration: 26:23:10
Processing Rate: 46 MB/s
Bottleneck: Proxy
Load: Source 82% > Proxy 91% > Network 61% > Target 88%
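As a sanity check, the figures in this log are consistent with each other if the processing rate is taken as data read divided by duration. A small Python sketch, assuming the report uses binary units (1 TB = 2^20 MB); that unit convention is an assumption made to match the numbers, not something stated in the log:

```python
def duration_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


data_read_tb = 4.2
transferred_tb = 3.5
elapsed = duration_seconds("26:23:10")

rate_mb_per_s = data_read_tb * 2**20 / elapsed  # TB -> MB using binary units
compression = data_read_tb / transferred_tb     # data read vs. data transferred

print(f"{elapsed} s, about {rate_mb_per_s:.0f} MB/s, compression about {compression:.1f}x")
```

Both derived values round to the 46 MB/s and 1.2x shown in the log.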

Post by **BackupSlacker** » Aug 31, 2022 3:29 pm

Why not just treat the virtual server as a physical server as far as backups go (use an agent instead of Hyper-V)? I wouldn't give up the benefits of virtualization because of this one issue. Without looking into it, I'm sure you could also create a simple PowerShell script that looks at all your VMs' checkpoints and emails you when one is older than 1 month.

Post by **vmtech123** » Aug 31, 2022 7:28 pm

I have 30TB and 40TB servers I am backing up without issue.. Disk is disk... Active full takes a bit, but then just use forever forward. You said yourself it isn't very active.

Your speeds are quite slow, so to me, that seems like an infrastructure or config issue.

Post by **CelticDubstep** » Aug 31, 2022 8:00 pm

I don't think it's an infrastructure issue as all servers connect directly to our core 10 Gigabit Switch and copying raw files from server to server is limited by the disk IO, normally around 300+ MB/s if it's a RAID10 Storage Array. All network switches spread across 3 racks all have 10 Gigabit Uplinks to the core 10 Gigabit Switch.

The only issue I can see is a configuration issue, such as some type of NIC Queue setting/QoS or whatever that I'm overlooking. This office is very basic. There are no VLANs or anything of that nature. The server hardware is by no means fast. The best we have is 15K RPM SAS drives for the OSes, with storage drives being 7200 RPM SAS. However, the HQ only has around 22 employees and the branch office has like 6, so nothing is under heavy load.

Veeam in our branch office is running as a VM (4GB RAM, 6 vCPUs) on a 13+ year old server with X5650 Xeons and gets better processing rates, and that server has several other VMs... NVR, Primary Virtual Router, & VPN Server.

Something is amiss somewhere for sure.

Post by **Moopere** » Sep 05, 2022 2:25 am

The AVHDX is a differencing disk and will make your production server IO unbelievably horrible. I can't say how it would affect Veeam backups but it certainly won't make them faster.

I'm aware of the problem the OP describes, with not enough host disk space left to merge the AVHDX; I've hit this myself before. The way I've gotten around that is to set up another physical Hyper-V host with sufficient disk to house the VM in question. If the alternate host has enough disk to also do the merge, so much the better.

Then one of the following:

- Move the VM, using Hyper-V's move functionality, to the new host. Merge the AVHDX on the new host if there is sufficient space to do so, or,
- Hyper-V replicate the problematic VM over to the new host, reverse the replication direction and merge the VM (again, if there is space). In some complex environments 'Move' will fail whereas replica will succeed, or,
- Back up the VM and restore it on the new host. I'm pretty sure that a full VM backup doesn't retain the subtlety of the differencing disk. I think it just backs up the running VM, pulling in the data from the differencing disk as it goes. Restoring this backup should give you a standard VM without differencing.

Post by **gosterm** » Sep 05, 2022 4:20 am

It's more of a Hyper-V issue. This happens on all Hyper-V versions since 2012. I have to shut down the VM, and that's when the merge happens automatically.

Post by **dwj7738** » Sep 05, 2022 5:26 am

Why you let this snapshot exist for years is something to ask yourself in the mirror.
One can keep the VMs up and running if, on the host, you run
Get-VM | Remove-VMSnapshot

Post by **GabesVirtualWorld** » Sep 05, 2022 5:36 am

We've had this happen with Hyper-V as well: a snapshot was still running, and SCVMM and Failover Cluster Manager didn't show a snapshot, but the Hyper-V manager did. We discovered this one day when a VM grew so big that the CSV filled up. We then noticed about 100 VMs with this same issue. They were all VMs from a new customer that we had just imported into our DC. We now just scan CSV volumes for the presence of AVHDX files every day by script.

But luckily management has finally decided to move away from Hyper-V 2019, back to VMware. We've had so many issues that even our MS Premier Support couldn't explain other than "update to latest updates and maybe it is now fixed". Hyper-V 2012R2 was a pain; we moved to 2016, which was quite good and gave me the feeling we had a smooth running environment. Now customers want Windows 2022 Guest VMs and therefore Hyper-V 2019 is needed, but it turns out 2019 is a real pain. Performance is worse, with more unexplained issues and CBT problems (see other thread). We're going back to VMware.

Post by **JeWe** » Sep 05, 2022 6:26 am

Forget about the snapshots. Export the VM to the target server, import it again and you're good to go.

Post by **c.haydock** » Sep 05, 2022 6:47 am

I realize this doesn't help your present situation... But, something for your consideration going forward to prevent this from happening again... Install+Use Veeam One to monitor your infrastructure. One of the alarms that it would have given you is having an old checkpoint. At my previous job we would get these all the time because one of the admins would take a snapshot of a VM prior to doing application updates. Which is not a bad practice... except when you forget to delete the checkpoint when you are done and everything looks good!

But when that happened... Veeam One for the win... because within a week we would get a notification that we had a VM with an old checkpoint and we could delete it and merge without any issues.

Post by **RGijsen** » Sep 05, 2022 7:26 am

CelticDubstep wrote: ↑Aug 30, 2022 6:54 pm The Host has 4x 8TB Drives in RAID10 for 16TB usable storage. I created an expanding VHDX with a limit of 12 TB so it would give us room to grow. We're now at 5.2 TB of actual data.

I'm still concerned about the VHD/AVHDX sizes however because it takes over 26 hours to do a active full backup at a processing rate of 46 MB/s on a 10 Gigabit Network. It shows VM Size at 12.7 TB with 8.6 TB used, which isn't true. Veeam says it processed 8.6 TB, but only read 4.2 TB, so I don't know.

And 1+1=2. Sorry to be direct, but these 8TB drives are most likely your performance issue. I wonder why you have 'invested' in a 10Gbps network but run on such measly storage. In a RAID10 setup, you could maybe get 250-300MB/sec from those drives when they do sequential access. But since you have been running from a snapshot for that long, sequential access from within the guest OS results in EXTREME random IO on your physical disks, and then 46MB/sec isn't even that bad considering you only have 2 data disks. The key thing here is to get that snapshot merged. There are multiple ways to do so, but that's most probably the biggest issue here. But first make copies (or another backup) of it. You can manually merge the snapshots using PowerShell (or even the Hyper-V management console).
I'd not waste time with physical hardware anymore unless you have really good reasons to. A file server isn't one in my book. Even if you use a (guest) cluster on your file server (as opposed to DFS), which makes VM backup impossible, I'd still rather use VMs and the Veeam Agent than fiddle around with physical hosts. The ability to move your VMs around your hosts without interrupting users whenever a hardware issue is on the horizon is a big, big pro for us. Also, no fiddling around with hardware-specific drivers (unless you go the SR-IOV route) makes life so much simpler.

Anyway, what are your options to move that VM to a host with more disk space, or to add temporary disk space to the current host? What error do you get (event log?) when you try to merge the snapshot?

Post by **Markus M.** » Sep 05, 2022 8:03 am

I have been running Hyper-V 2019 hosts for more than 3 years now, clustered and standalone. I cannot confirm GabesVirtualWorld's problems, but I agree there are other reasons to think about changing to VMware (not related to the content of this post).
There were some issues with merging snapshots in Hyper-V in the past, but as far as I know they have been fixed when Hyper-V AND Veeam are running at the current patch level.
I remember I had a VM with more than 350 snapshots that went undiscovered for a long time. But with help from Veeam support we were able to fix that (Case: 04513524); the solution was to merge the snapshots into the parent VHDX via script.
Regarding long-lasting (and undiscovered) snapshots / AVHDX files:
I run a simple PS script as a scheduled task every 8 hours to avoid such issues, something like this:
Get-ClusterGroup -Cluster "<Name>" | ? {($_.GroupType -eq 'VirtualMachine')} | Get-VM | Get-VMSnapshot | select -Property VMName | ft -AutoSize -HideTableHeaders
and furthermore, I look for AVHDX files: Get-ChildItem -Recurse -Filter *.avhdx
and send the results via mail to the people involved. Not very sophisticated, but it fits my needs.
And lastly: I agree with RGijsen's opinion and wouldn't run a file server on physical hardware; as a VM you gain plenty of advantages in handling the workload.
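The same idea, scanning for leftover AVHDX files, can also be sketched outside PowerShell. The following Python example is only an illustration: the function names and the `D:\Hyper-V` path in the usage comment are hypothetical, and it does not query Hyper-V itself, it just walks the file system.

```python
from pathlib import Path


def find_avhdx(roots):
    """Return (path, size_in_bytes) for every .avhdx file under the given folders, largest first."""
    hits = []
    for root in roots:
        for path in Path(root).rglob("*"):
            # Compare the suffix case-insensitively so .AVHDX is caught too.
            if path.is_file() and path.suffix.lower() == ".avhdx":
                hits.append((path, path.stat().st_size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def format_report(hits):
    """Build a plain-text summary suitable for an email body or a log file."""
    if not hits:
        return "No AVHDX files found."
    lines = [f"{size / 2**30:10.1f} GiB  {path}" for path, size in hits]
    return "\n".join(["AVHDX files found:"] + lines)


# Example usage (hypothetical path):
# print(format_report(find_avhdx([r"D:\Hyper-V"])))
```

Scheduling the script and mailing the report are left out here; an AVHDX that shows up outside a backup window is the signal worth investigating.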

Post by **kevin.boddy** » Sep 05, 2022 8:08 am

c.haydock wrote: ↑Sep 05, 2022 6:47 am I realize this doesn't help your present situation... But, something for your consideration going forward to prevent this from happening again... Install+Use Veeam One to monitor your infrastructure. One of the alarms that it would have given you is having an old checkpoint. At my previous job we would get these all the time because one of the admins would take a snapshot of a VM prior to doing application updates. Which is not a bad practice... except when you forget to delete the checkpoint when you are done and everything looks good! But when that happend... Veeam One for the win... because within a week we would get a notification that we had a VM with an old checkpoint and we could delete it and merge without any issues.

I second that. Veeam One is your best tool for monitoring open snapshots.
We have seen snapshot/checkpoint issues on Hyper-V and VMware. Veeam has Snapshot Hunter for VMware but nothing similar for Hyper-V.

Normally we'll get an orphaned production checkpoint left behind if there is a Hyper-V host crash during a backup window.
Other times I've seen it where there has been a brief CSV volume outage on a failover cluster.

The other thing is you may not need the full AVHDX file size as free space, because during the merge you'll find the parent VHDX will not grow as much as you think, as a lot of the original data will be replaced. The problem is you'll only know when you run the merge.

Post by **c.haydock** » Sep 06, 2022 2:23 am

RGijsen wrote: ↑Sep 05, 2022 7:26 am Anyway, what are your options to move that VM to a host with more diskpace, or adding temp diskspace to the current host? What error do you get (eventlog?) when you try to merge the snapshot?

Along those lines... I only suggest this because it's an option for people who are really desperate and can't afford to add another hardware RAID controller or lack the extra storage bays for additional internal drives... But, something I've done in desperate times to temporarily add storage to a server to handle tasks like this is to attach external drives via USB and build a Storage Spaces storage pool out of said disks. I know... I know... to say it's "sketchy" is an understatement. But, as stated... if you are desperate, you do what needs to be done, not what best practices would say you should do. Anyway, it is about as far from fast as you can imagine... but it does the trick. The only thing that I've found to be a bit wishy-washy in the past is that not all USB to SATA/SAS adapters will present the drive to the OS in a manner that will be allowed to add it to a storage pool. I've honestly never looked too close as to what the limitation is that prevented me from doing so... chipset features, drivers, something else??? I just know I have some USB adapters that will work with Storage Spaces... and some that don't.

All that said, as others have noted, make sure your backup is tip-top before trying that stunt. It's not for the faint of heart. Also... speaking of backups... if you are confident in your backups and can afford the down time... Just completely blow away your current VM and do a restore from backups!

The Community Edition of VBR can do an Instant VM Restore... then migrate to your production server.

Post by **JamesNT** » Sep 06, 2022 2:06 pm

Everyone,

The snapshot/checkpoint wars rage on here as well. I've had issues with discovering a Hyper-V host running out of disk space because of a snapshot/checkpoint on a single VM just hogging all the space. This seems to be caused by the following:

* Power went out or updates caused server reboot at the time backups were running. Veeam sent HyperV the signal to create the checkpoint but because the backup never finished Veeam never got the chance to send the all clear to merge/delete the checkpoint.

* A tech created a checkpoint to install some software as a safety precaution but then forgot to delete it.

The only way around the scourge of checkpoints I've seen is to either employ VeeamOne or have a PowerShell script that runs on every machine that merges any checkpoints it finds once a week/day/whatever.

For those of you who are allowing the VMWare vs HyperV wars to enter the conversation, you aren't doing anyone any favors. If the power goes out or backups fail for whatever reason you'll have the same errant snapshot problem. If a tech creates a snapshot then forgets about it, same problem. But for what it's worth and to just even the scales a bit.....

You'll never get me on VMWare. I cannot justify the cost to my customers, HyperV does a good-to-great job, VMWare has its own fair share of problems, and its performance sucks with Remote Desktop Servers. So there.

JamesNT
