Host-based backup of Microsoft Hyper-V VMs.
jotge
Influencer
Posts: 24
Liked: 2 times
Joined: May 20, 2019 11:44 am
Full Name: Jan Groschopp
Location: Deutschland

Our known v12.1 problems in a large Hyper-V environment

Post by jotge »

Hello,

Unfortunately, we have run into some problems after upgrading to v12.1 and want to share our experience here. Perhaps other environments are seeing similar problems.

First, our environment:

to be backed up:
- 16 Hyper-V clusters, 105 Hyper-V hosts in total, 1400 VMs in total
- 7x SQL Failover Cluster (Windows Agents)
- 29x servers (Windows Agents)

used for the backup:
- 6x physical repository servers (4x of which are also tape servers)
- 1x virtual Veeam backup server
- 1x virtual Veeam SQL database server (MS SQL Server 2019 Standard)
- 1x virtual Veeam Enterprise Manager Server
- 1x virtual Veeam ONE Server
- 1x Tape Library with 4 FC-connected LTO-8 drives (M8-labeled tapes)
- 122 backup jobs (95x HyperV, 27x Windows)
- 95 transaction log backup jobs
- 4 backup copy jobs
- 4 backup to tape jobs (source: backup copy)

After upgrading our environment to Veeam v12.1, we have the following problems that did not occur before:

- A single backup job sporadically starts without any processing taking place, yet its process consumes a lot of CPU resources on the backup server (case # 07248027)
The high utilization is also noticeable when working on the backup server. The backup job has to be ended manually in Windows Task Manager; it cannot be stopped in the VBR Console. It is not always the same job that is affected.

- SQL Restore in a Failover Cluster Environment fails (case # 07243819)
When attempting to restore a database in the failover cluster environment, whether to the original location or redirected, the following error message appears: "The specified drive letter is incorrect." The process cannot be continued.

- Veeam ONE Missing data for the performance counters (case # 07137635)
In addition to a "normal" display of the performance counters for many objects, sometimes no data is displayed at all or data is only displayed as a "flat line". This results in different displays over time. The phenomenon can occur permanently from a certain point in time or be limited in time. Some counters are not displayed at all for certain objects, but are displayed for other objects in the same category.

- We also notice that our backup to tape jobs get stuck and just don't do anything anymore. (no Veeam case open yet, but very likely the next one)
The backup to tape jobs use the backup copy jobs as a source. There is a 1:1 relationship.
If the backup to tape jobs "hang", this triggers a chain reaction in which the backup copy jobs and some backup jobs also hang. Presumably a resource problem. The problem is usually solved by simply terminating the backup to tape jobs.

Has anyone had similar experiences, perhaps in a similar infrastructure environment?

We are already working with Veeam support and hope to have a solution soon, which I would also share here.


Have a nice day

Jan
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Hi Jan,

Thank you for the detailed write up, and sorry to hear that these unexpected behaviors are causing headaches.

For the first two cases, I can see Support needs a bit more time to review the logs and narrow down the plan of action, so your patience is much appreciated; neither behavior immediately comes to mind as recognized or widespread, so let's allow the Support Engineers some time to review the logs.

For the 3rd case (07137635), I can see it is with our Advanced Support Team and they have escalated it internally, so it looks like the appropriate resources are assigned; hopefully we'll see an update soon.

The tape job issue is also not expected, and I recommend opening another case for it as well. Can you confirm, though: does the tape job start (show as running) but not appear to progress within the UI, or does it start processing and hang at some point? Does it by chance seem to align with the high CPU usage you see during other jobs?
David Domask | Product Management: Principal Analyst
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

Just logged in to tell you that the locking up is affecting a large portion of our customers as well. We have had cases open since February with no resolution. People on Reddit are having similar experiences to yours.
jotge
Influencer
Posts: 24
Liked: 2 times
Joined: May 20, 2019 11:44 am
Full Name: Jan Groschopp
Location: Deutschland

Re: Our known v12.1 problems in a large Hyper-V environment

Post by jotge »

Hello David,

thanks for your quick response.

I hope we don't end up in the same situation as SnakeSK. At least case 07137635 is developing in that direction, because it has been open for quite a long time without a solution. We use this tool really intensively, not only the backup administrators but also the application owners, so this is not a pleasant situation.

To your question: the tape job starts and at some point it stalls. So it's not that no data is copied at all; it seems to stop sporadically. I can't say whether there is a connection with increased CPU utilization, as the job runs for a very long time and I don't observe it at night. We could of course use Veeam ONE and its performance counters for analysis, but unfortunately the tool is currently only of limited use (see case # 07137635).

Regards

Jan
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Hi Jan,

I think there's no need to monitor the CPU usage just yet like that for the tape job, and Support will be able to start the check with just the job logs; I know you have a few cases open right now, but it's best that the Support team take a look on the Tape job behavior also.

As for 07137635, it looks like the issue was escalated to Veeam R&D, so let's wait to see the results of the investigation there; your patience is much appreciated, and the right resources are aligned on the case to understand the behavior more clearly.

@SnakeSK, can you DM me the case numbers or post them here? I'd like to review the previous cases a bit.
David Domask | Product Management: Principal Analyst
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

07222212 and 07156879 and 07078243
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Hi SnakeSK,

Thanks for sharing the cases -- I can see the first two were reported to have been solved after some changes, but the issue returned which brought you to the current case 07222212.

The case is currently with our Advanced Technical Support team, and as I understand it, the plan is to take a debugging dump of the Veeam.Backup.Manager process to understand the hangs a bit better. I know you've worked on this across several cases and have had a few "false victories", and I appreciate your continued patience and cooperation. The engineer will need a bit of time to review the newest information provided, so please continue working with the Support Team on this one and let's see the results of the log/dump review.
David Domask | Product Management: Principal Analyst
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK » 1 person likes this post

No, the current case is 07222212; dumps have been provided, so we will see. The problems started for numerous customers with v12.1, so it's definitely version-specific, not environment-specific.
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Ah, sorry about that, just a copy/paste error. I'll edit my post to reflect it correctly, and still let's allow the engineer time to review the provided dumps.
David Domask | Product Management: Principal Analyst
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

jotge wrote: May 03, 2024 12:29 pm
- Veeam ONE Missing data for the performance counters (case # 07137635)
In addition to a "normal" display of the performance counters for many objects, sometimes no data is displayed at all or data is only displayed as a "flat line". This results in different displays over time. The phenomenon can occur permanently from a certain point in time or be limited in time. Some counters are not displayed at all for certain objects, but are displayed for other objects in the same category.
Can you try this? This resolved our VOne performance monitoring problems
For the test purpose, please change the performance data collection method in Veeam ONE from "perfmon" to WMI:

- On the Veeam ONE server machine, please open "regedit" and go to HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam ONE Monitor\Service
- Create or modify the following entry:
  Name: HyperVCollectionType
  Type: REG_DWORD
  Value Data: 1
- Restart the "VeeamDCS" service.

Please wait for 20–30 minutes and check if the performance data is collected for VMs.
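The same change can also be scripted. This is only a minimal PowerShell sketch, assuming an elevated session on the Veeam ONE server and assuming the registry path reads HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam ONE Monitor\Service; printing the previous value for rollback is an addition, not part of the instructions from support:

```powershell
# Sketch only: run in an elevated PowerShell session on the Veeam ONE server.
# Key path, value name and service name are taken from the steps above.
$key  = 'HKLM:\SOFTWARE\Veeam\Veeam ONE Monitor\Service'
$name = 'HyperVCollectionType'

# Note the previous value (if any) so the change can be reverted later.
$old = Get-ItemProperty -Path $key -Name $name -ErrorAction SilentlyContinue
if ($old) {
    "Previous value of ${name}: $($old.$name)"
} else {
    "$name did not exist before this change."
}

# Create or overwrite the REG_DWORD value with 1.
New-ItemProperty -Path $key -Name $name -PropertyType DWord -Value 1 -Force | Out-Null

# Restart the service so the new setting is picked up.
Restart-Service -Name 'VeeamDCS'
```

To revert, set the value back to whatever was printed, or remove it with Remove-ItemProperty if it did not exist before, and restart the service again.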
We have had a Veeam ONE case open since December (nearly half a year): with perfmon, some data was simply missing or not collected at all, even after a reboot, the service could not be stopped, etc. Case number 07051424.

The reg key above helped us big time.
jotge
Influencer
Posts: 24
Liked: 2 times
Joined: May 20, 2019 11:44 am
Full Name: Jan Groschopp
Location: Deutschland

Re: Our known v12.1 problems in a large Hyper-V environment

Post by jotge »

:D Hi SnakeSK,

thanks for this tip. :) I have implemented it and, unlike all previous "attempts", Veeam ONE has been collecting data again since then. :D

However, we still have to observe this, because up to now we have always had temporary phases in which data was collected. What makes me a little optimistic, however, is the fact that disk data is being collected again, because it was no longer collected at all from the time of the upgrade.

I will also ask Veeam Support about this setting and whether there are any side effects.

Thank you and best regards

Jan
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

We have been running this for 2 weeks with no problems, but do test it properly. I really don't know what went wrong with the 12.1 release; today I sent another batch of dumps and logs on behalf of another customer whose backup job was locked for 12 hours.

//edit: I also had phases like you did. Sometimes Veeam ONE monitored for several days, then only disk and network metrics were reported, and sometimes nothing at all for several hours; even VBR job monitoring did not work. After implementing this it has been a lot better.
tomtom94
Influencer
Posts: 11
Liked: never
Joined: Dec 02, 2022 4:53 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by tomtom94 »

Hello!

We are also seeing a problem similar to case # 07248027 or the "backup to tape problem" of the original poster at one of our customers, after upgrading to 12.1 a few months ago, but in a much smaller environment.
There are only 2 Hyper-V servers, a replication job and a few backup / backup copy jobs.

The replication job sometimes starts and then does just plain nothing.
This blocks the normal backup jobs, and due to the lack of a timeout option it runs endlessly.
As a result, no error messages are sent via mail either, lulling you into a false sense of security.

However, unlike the original poster, I can stop the replication job normally without killing the task.
Stopping the job throws the following error:
Resource not ready: backup proxy
Processing finished with errors at xx.xx.2024 xx:xx:xx

I helped myself by adding a backup window that stops the replication job before the regular backup jobs.

This happens totally at random, and I was not able to confirm whether the job takes up a lot of CPU time.

No case open yet.

Best regards

Tom
Binje07
Novice
Posts: 4
Liked: 2 times
Joined: May 13, 2024 8:58 am
Full Name: User Since Veeam 7.5 / VBR Cloud Connect

Re: Our known v12.1 problems in a large Hyper-V environment

Post by Binje07 »

Hi everyone.

Since we installed version 12.1 in January, we have had a huge number of errors with our Hyper-V backup and replication, on Hyper-V clusters and on standalone hosts.
The main error is jobs getting stuck for no reason.

The replication job starts and gets stuck while waiting for the proxy, even on an on-host proxy where nothing else is running.
If the replication is stuck, the backup job gets stuck too. We launch replication every hour, so we ended up creating a script which stops the replication if it's still running after 45 minutes.
Task Scheduler launches PowerShell (C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe) with the argument pointing to the PS1 file (-File "C:\Script\yourscript.ps1"), which contains (Get-VBRJob -Name "YOURJOBNAME" | Stop-VBRJob).
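For anyone who wants to reproduce this workaround, here is a rough sketch of how the pieces could fit together. It assumes the replication job starts on the full hour, that the script runs on the VBR server with the Veeam PowerShell module available, and that the task runs under an account with rights in VBR; the task name is made up, and the script path and job name are the placeholders from above.

```powershell
# --- C:\Script\yourscript.ps1 (placeholder path from the post) ---
# Stops the replication job unconditionally, so it must only be triggered
# at a time when the job should long since have finished.
Import-Module Veeam.Backup.PowerShell
Get-VBRJob -Name "YOURJOBNAME" | Stop-VBRJob

# --- One-time registration of the scheduled task (run elevated) ---
$action = New-ScheduledTaskAction `
    -Execute 'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe' `
    -Argument '-NoProfile -File "C:\Script\yourscript.ps1"'

# First run at 00:45 today, then repeated every hour (xx:45).
# Assumption: the replication job is scheduled at xx:00.
$firstRun = (Get-Date).Date.AddMinutes(45)
$trigger  = New-ScheduledTaskTrigger -Once -At $firstRun `
    -RepetitionInterval (New-TimeSpan -Hours 1)

# 'Stop stuck replication' is a hypothetical task name.
# Registered for the current user here; switch to a service account as needed.
Register-ScheduledTask -TaskName 'Stop stuck replication' -Action $action -Trigger $trigger
```

Because the stop is purely time-based, a replication run that is merely slow gets stopped as well, so the 45-minute mark needs to fit the normal runtime of the job.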

We even have some backup jobs getting stuck (waiting for infrastructure / not obtaining a proxy), which sometimes forces us to reboot the whole VBR server.

After working with the support team for more than a month, they gave me a private patch to correct this behaviour (along with the creation of a registry key).
It seems to be working, but I only applied it last week on my critical VBR, so I prefer to be careful for the moment. I patched the other ones this morning because I got another stuck job and I'm fed up with having to connect on weekends to check that everything is all right.

I should add that I had to insist on getting a level 2 engineer, because the level 1 engineer was a beginner and unable to provide a pertinent analysis. The quality of support has dropped over the last few months. I know it was a complex problem from the beginning, and I lost a lot of time sending logs.

My whole infrastructure was impacted: 3 different VBR servers and 20 Hyper-V hosts; no problem on VMware at all. We tried a lot of things and extracted a lot of logs before getting the right answer.

Here is the link to the patch:

[Moderator: removed]


Hope this helps.
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Hi Binje07,

Thank you for sharing your experience and the solution -- I've removed the hotfix information from your post as hotfixes must only be utilized after the situation is reviewed by Veeam Support.

I'm glad to hear that you were able to get a resolution, though sorry to hear that the case was not to your satisfaction -- can you share the case number for review?
David Domask | Product Management: Principal Analyst
Binje07
Novice
Posts: 4
Liked: 2 times
Joined: May 13, 2024 8:58 am
Full Name: User Since Veeam 7.5 / VBR Cloud Connect

Re: Our known v12.1 problems in a large Hyper-V environment

Post by Binje07 »

I perfectly understand your point of view, but tomtom94, for example, is affected by the same anomaly and may not want to wait three months for a patch. That's why I recommended making a backup first. If the patch works, it should be shared.
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Aha, understood, but there is a logic to having the environment reviewed before applying hotfixes :) So if possible, please share your case number as requested; it can be an additional point of information during research, to confirm the behavior from the logs and then act as the situation dictates.
David Domask | Product Management: Principal Analyst
Binje07
Novice
Posts: 4
Liked: 2 times
Joined: May 13, 2024 8:58 am
Full Name: User Since Veeam 7.5 / VBR Cloud Connect

Re: Our known v12.1 problems in a large Hyper-V environment

Post by Binje07 » 1 person likes this post

Case no. 07164826, but I just had a replication get stuck, so it seems it's not solved. :evil:
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

We had another deadlock over the weekend. Today another one at a different customer. Both replicas.

These cases date back to the 12.1 December release, when we opened the first support case. It has been half a year and there is still no resolution?

Wouldn't this be a good time to do some code regression to bring reliability to acceptable levels?

Thank you
jotge
Influencer
Posts: 24
Liked: 2 times
Joined: May 20, 2019 11:44 am
Full Name: Jan Groschopp
Location: Deutschland

Re: Our known v12.1 problems in a large Hyper-V environment

Post by jotge » 1 person likes this post

Short update from me "Veeam ONE Missing data for the performance counters (case # 07137635)"

So far, all performance data has been collected. As we have had phases in the past where data was collected over 23 days, I still need to monitor it to be sure. But as I mentioned, now that the disk data is being collected - which was not the case before - I'm still optimistic.

The other cases are still open, so far without a solution. Today I have a WebEx session with support about the SQL failover cluster restore.

Have a nice Day!
david.domask
Veeam Software
Posts: 1354
Liked: 352 times
Joined: Jun 28, 2016 12:12 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by david.domask »

Hi Jan,

Glad to hear there was some progress here; indeed, monitoring is a good idea. Checking the case, I can see that there does seem to be some delay on the most recent update; I will ask the Support Team to respond on the most recent update from the internal investigation and on the new information you've shared.

(Hint: If you ever do encounter issues or unexpected delays on a case, use the Talk to a Manager button to reach out to Veeam Support Management and explain the concerns regarding the case.)
David Domask | Product Management: Principal Analyst
Binje07
Novice
Posts: 4
Liked: 2 times
Joined: May 13, 2024 8:58 am
Full Name: User Since Veeam 7.5 / VBR Cloud Connect

Re: Our known v12.1 problems in a large Hyper-V environment

Post by Binje07 » 1 person likes this post

This morning, another patch with another registry key to try to solve the randomly stuck replicas or backup jobs.
jotge
Influencer
Posts: 24
Liked: 2 times
Joined: May 20, 2019 11:44 am
Full Name: Jan Groschopp
Location: Deutschland

Re: Our known v12.1 problems in a large Hyper-V environment

Post by jotge » 1 person likes this post

Update for SQL Restore in a Failover Cluster Environment fails (case # 07243819)

Yesterday we found out from Veeam Support that determining the path of the SQL instance on the cluster node using the SQL statement

Code: Select all

SELECT path_name FROM sys.dm_io_cluster_valid_path_names;
fails. The query also returns no result when run in SQL Server Management Studio.

So it does not seem to be a Veeam problem but a local SQL failover cluster problem.
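If someone wants to check their own cluster for the same condition, the query can also be run from PowerShell. This is just a sketch, assuming the SqlServer module (Invoke-Sqlcmd) is installed and the account has permission to view server state; the instance name is a placeholder, and the comparison with sys.dm_io_cluster_shared_drives is my own addition, not something support asked for.

```powershell
# Sketch only. 'SQLCLUSTER01\INST1' is a placeholder for the clustered instance name.
Import-Module SqlServer
$instance = 'SQLCLUSTER01\INST1'

# The query from the case: paths the failover cluster instance considers valid.
$paths = Invoke-Sqlcmd -ServerInstance $instance `
    -Query 'SELECT path_name FROM sys.dm_io_cluster_valid_path_names;'

if (-not $paths) {
    Write-Warning "No valid cluster paths returned by $instance."
} else {
    $paths | Format-Table -AutoSize
}

# For comparison: the shared drives SQL Server sees for this instance.
Invoke-Sqlcmd -ServerInstance $instance `
    -Query 'SELECT DriveName FROM sys.dm_io_cluster_shared_drives;' |
    Format-Table -AutoSize
```

An empty first result on a clustered instance would match what we saw in SQL Server Management Studio.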
SnakeSK
Service Provider
Posts: 57
Liked: 9 times
Joined: Feb 09, 2019 5:06 pm

Re: Our known v12.1 problems in a large Hyper-V environment

Post by SnakeSK »

Binje07 wrote: May 15, 2024 10:20 am This morning, another patch with another registry key to try to solve the randomly stuck replicas or backup jobs.
Nice. I only got a generic reply that it is being investigated. No registry keys, no patch.