jgrinwis
Novice
Posts: 9
Liked: never
Joined: Feb 02, 2011 12:53 pm
Full Name: John Grinwis
Contact:

hotfix, performance degradation on vsphere 4

Post by jgrinwis »

One of our customers has been a happy Veeam Backup & Replication customer for some time.
Prior to upgrading the cluster to vSphere 5, we already installed the vSphere 5 hotfix for Veeam.

But since installing the hotfix, backup performance has dropped:

pre-hotfix:
===
16 of 16 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 3,24 TB
Processed size: 3,24 TB
Processing rate: 189 MB/s
Start time: 19-10-2011 22:00:04
End time: 20-10-2011 2:59:20
Duration: 4:59:15
===

hotfix:
===
16 of 16 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 3,24 TB
Processed size: 3,24 TB
Processing rate: 78 MB/s
Start time: 30-10-2011 22:00:17
End time: 31-10-2011 10:05:51
Duration: 12:05:34
===
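
For reference, the longer duration is exactly what the lower processing rate predicts. A quick sanity check of the numbers above (a rough sketch only, assuming the binary units these reports appear to use, i.e. 1 TB = 1024 x 1024 MB):

Code:

# Rough sanity check: duration ~= processed size / processing rate (binary units assumed).
def estimated_duration_hours(size_tb, rate_mb_s):
    size_mb = size_tb * 1024 * 1024
    return size_mb / rate_mb_s / 3600

print(f"pre-hotfix: {estimated_duration_hours(3.24, 189):.1f} h")  # ~5.0 h, matches 4:59:15
print(f"hotfix:     {estimated_duration_hours(3.24, 78):.1f} h")   # ~12.1 h, matches 12:05:34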

We're using hot backup mode with CBT enabled, which has been working really well for some time now.
Is this an issue specific to the combination of the hotfix with vSphere 4, or will we get back to normal backup speeds once we are running the hotfix against vSphere 5?

Regards,
John Grinwis
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Hi John, theoretically it's possible that the hotfix affects performance: we pull data through VMware VDDK (vStorage API), and the hotfix brings a new major VDDK version (required for vSphere 5 support) which may have code changes impacting performance on vSphere 4. To confirm that the hotfix is indeed the reason, please run it for a few days to collect a sufficient amount of logs, then re-install 5.0.2 while pointing it to the existing database, and run the same jobs for a few more days. Then send all logs to support so they can compare.

Generally speaking, with every release we do there are always a few customers reporting a "performance drop after installing the new release". We have almost gotten used to it... upon investigation, most of the time the issue turns out to be unrelated to the hotfix and caused by some other change in the environment. This is why we always want to see the logs instead of judging by observations.

Since the hotfix introduces significant changes to the data moving engine (a new major release of the VMware API library), there is always a chance that it did indeed introduce the issue, although for some reason we have not observed this in our labs.
jgrinwis
Novice
Posts: 9
Liked: never
Joined: Feb 02, 2011 12:53 pm
Full Name: John Grinwis
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jgrinwis »

The customer noticed it directly after the upgrade. The night after the hotfix update, the backup took a lot longer than normal, and the hotfix was the only thing that changed.

Regards,
John
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Trust me, they all notice it "directly after upgrade" and the Veeam code is always "the only thing that changed" :) Again, this is something that repeats with every release, yet so far the new code has never been found to be the reason behind the reported performance issues. Which is why we always prefer to compare the logs instead of just taking their word for it.

It could be something as simple as an unusually large amount of changed data to process in some VMs, network or target storage congestion, and so on - in other words, things your customer may not even know have happened. While there is always a chance of an issue with the code (especially in this case, being a major change), it is still more likely that the issue is unrelated to the patch.

I've deleted the logs from your post - please avoid posting logs on the forums, as requested when you click New Topic. Instead, please submit them to support directly (they will require full logs, not just snippets). Again, they would require full logs from a few days before and a few days after the upgrade. Thanks!
jgrinwis
Novice
Posts: 9
Liked: never
Joined: Feb 02, 2011 12:53 pm
Full Name: John Grinwis
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jgrinwis »

I don't have any Veeam logs from before the upgrade anymore. I will discuss with the customer whether they want to "unpatch" the system to get the backup up to speed again, and then patch Veeam after the vSphere upgrade next week.

The only thing I see is that a 500 GB VM that used to take about an hour suddenly takes more than 8 hours.
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

It's very easy to "unpatch": just reinstall 5.0.2 pointing to the existing database (takes less than 5 minutes). Also, did you actually delete the "before upgrade" logs? They should hopefully still be sitting there (in the log archives) if your customer upgraded just a few days ago.
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

Hi,
I'm not here to blame Veeam, but I'll just let you know that I have upgraded my lab environment and have also seen a big increase in backup time.
Since I have already upgraded my lab host to vSphere 5 and my machines to HW level 8, I'm not able to go back, so I'll try to reproduce the setup in another lab.
sstinner
Lurker
Posts: 1
Liked: never
Joined: Oct 14, 2010 1:01 pm
Full Name: Sean Stinner
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by sstinner »

I also had performance issues with Veeam once the hotfix was applied. What I found was that if I was backing up a single VM guest located on a single datastore, the backup would run at the expected speed. As soon as I backed up a VM guest that was located on more than one datastore, or two or more VM guests that were on different datastores, the backup would take a lot longer and max out the CPU on the Veeam backup server.

Once I uninstalled Veeam and reinstalled using the existing DB, the backups completed in the expected time and CPU usage was back to normal.
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

Actually, I also see a big increase in CPU consumption: 100% on 2 vCPUs.
I've just had another look at my job reports. Jobs are going from around 3-5 minutes (reverse incremental) to about an hour. During that hour the CPU is maxed out (100%).
My Veeam server is virtual, so in the VMware Performance tab I'm able to see older performance data (from before the hotfix) - I can see that before the hotfix the Veeam server barely touched the CPU!
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

I will ask our QC to compare the performance on weaker backup servers, and see if there are any differences.

Based on the above information, I may take a guess that the new version of the VMware vStorage API (VDDK 5.0) possibly has increased CPU usage due to changed processing logic. If the backup server lacks CPU resources (with the CPU sitting at 100%), then CPU becomes your bottleneck, which obviously reduces backup performance. As an immediate workaround, especially for those who clearly have a CPU overload issue, I recommend a CPU upgrade. With sufficient CPU resources, your backup performance should get back to "normal".

The other option would be to reduce the compression level in the job settings to "Low", which will free up significant CPU resources on your backup server.
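
To illustrate the general trade-off (this is just generic zlib in Python, not Veeam's actual compression engine, so only the relative difference matters): lower compression levels cost noticeably less CPU time on the same data.

Code:

# Generic illustration of the compression-level vs. CPU trade-off (not Veeam's codec).
import os
import time
import zlib

# 16 MB of incompressible data plus 16 MB of zeros, to get mixed compressibility.
data = os.urandom(16 * 1024 * 1024) + bytes(16 * 1024 * 1024)

for level in (1, 6, 9):  # roughly "low" / default / "high"
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {elapsed:.2f} s CPU, ratio {len(compressed) / len(data):.2f}")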

Finally, you can always go back to the B&R 5.0.2 code that uses VDDK 1.1.1, and wait until the next VDDK update before upgrading to vSphere 5. Of course, there is no guarantee that this will be "fixed" (perhaps increased CPU usage is simply expected and "normal" with VDDK going forward). We will, however, try to confirm the performance differences between VDDK versions using a weaker backup server, and if we can see them as well, we will submit the issue to VMware.

Unfortunately, the only way to back up vSphere 5 is with VDDK 5.0, so even if it does have CPU consumption issues, this is something everyone who needs vSphere 5 support will have to live with until an updated version of VDDK is available.

@Christian, please note that your backup server does not meet minimum system requirements in any case (as 4 vCPUs are required).
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

Gostev,
I'm well aware of that :-) - it's only my lab server! But based on what I see there, I don't feel ready for the hotfix in production. Then again, maybe I'll try it in production, because there I have the ability to go back. (In my lab I don't, because I have already upgraded my host and VMs.)
I have now added a third vCPU to my lab Veeam server - I will post back with the job outcome tomorrow.
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

I will also make clear that the CPU utilization on the Veeam server didn't just go up a bit - it went from 5-10% during a job with identical settings/VMs to 100%.
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

Maybe not a problem on physical Veeam servers, but it could be on virtual ones.
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Yep, definitely worth investigating. By the way, are you using the "Virtual Appliance" processing mode on your virtual backup server, or some other mode? Please let me know, as this might be important.

There is one other thing I saw on the internet (unconfirmed by me personally at this point, but let me throw it in here). Basically, I've seen people reporting that ESXi 5 is almost twice as fast working in NBD mode (network backup). Theoretically, if this is true, you should see a big jump in CPU usage on your Veeam server, because it can now pull the data twice as fast, and thus has twice as much data to process per second (which will in turn roughly double CPU usage). However, I understand that for the OP, performance actually went down even on vSphere 4.
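
As back-of-the-envelope math (assuming CPU cost per processed MB stays roughly constant, which is an assumption, not a measurement):

Code:

# If throughput doubles and CPU cost per MB stays constant, CPU usage roughly doubles.
cpu_seconds_per_mb = 0.005        # arbitrary figure, for illustration only
for rate_mb_s in (80, 160):       # e.g. old NBD rate vs. a reportedly ~2x faster ESXi 5 NBD
    cpu_utilization = rate_mb_s * cpu_seconds_per_mb
    print(f"{rate_mb_s} MB/s -> ~{cpu_utilization:.0%} of one core")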
dkvello
Service Provider
Posts: 109
Liked: 14 times
Joined: Jan 01, 2006 1:01 am
Full Name: Dag Kvello
Location: Oslo, Norway
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by dkvello »

The question I too often pose to the customer is:
"What did you do just before you didn't do anything?"
jgrinwis
Novice
Posts: 9
Liked: never
Joined: Feb 02, 2011 12:53 pm
Full Name: John Grinwis
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jgrinwis »

We're running it as a virtual appliance, and our quad-vCPU VM is going to 100% CPU load during the backup; it used to stay below 50%.
Dedupe is enabled and compression is set to the default of Optimal - nothing changed there.
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Thank you John.

All, if you are affected by this issue, please do open a support case so that we can collect the required information and match the reports to specific Windows OS versions, configurations (x86/x64 OS and code), processing modes, vSphere versions, etc.
arthurp
Influencer
Posts: 23
Liked: never
Joined: Jan 11, 2010 9:18 pm
Full Name: Arthur Pizyo
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by arthurp »

My understanding is that quite a few customers (probably more than those who reported it on this forum) experienced the "CPU 100%" situation. I think the reason is that we mostly expected this patch to be purely technical: it added the long-awaited support for vSphere 5, with no other changes expected. In our case, our pre-patch Veeam installation was modified to run against the virtual hosts directly, as it didn't support the new vCenter version, and we ran all jobs without problems and with no schedule changes required; maybe they ran just a bit slower, but nothing worth mentioning. Based on that experience, we didn't really expect any problems, as Veeam is usually extremely diligent with upgrades.

Then came the patch, and everything changed. A Veeam server (a 4-CPU, 4 GB RAM virtual machine) that satisfied all published requirements is no longer sufficient. Our near-CDP schedule fails miserably as jobs overlap and push the CPU to 100%. Previously, this might have happened with 4 simultaneous jobs (and we built our schedules to eliminate this possibility); now just two simultaneous jobs easily saturate the CPU at 100%. In fact, some jobs on one larger machine alone can push the CPU to 70% (happening now as I write this post). This is where all the ugly stuff starts to happen (in our experience): jobs fail to complete, leaving replicas and backups corrupted; we are unable to stop the job from the Veeam interface and are forced to reboot the Veeam backup server to stop it; and so on.

I would agree with Anton's comment that these issues may come from improved performance in NBD mode; we noticed that too. We did not discover this until a significant scheduling change (now set to a maximum of 2 concurrent jobs). We took a page from the Veeam v6 webinars: 2 CPUs per job.

Now, lessons learned (hopefully):
1. Users - any upgrade (even minor) should not be taken for granted and should be approached with caution.
2. Veeam - it seems that there was not enough time for stress testing on low-end configurations.

Thank you

Arthur
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

A few updates on the situation:

1. A couple of support cases reporting performance issues after the vSphere 5 patch have been resolved and closed. In both cases, the reason was that some environmental changes, which happened alongside patching Veeam B&R, messed up the CBT data for some VMs. Veeam B&R had properly detected the CBT data inconsistency for those VMs and failed over to the snap & scan method of determining incremental changes (which is noticeably slower, as it requires the full VM image to be read; see the rough illustration below). After resetting CBT on those VMs, incremental backup performance came back to normal.

2. Virtual Appliance processing mode testing of the pre- and post-patch code on a low-end backup server configuration did not show any performance differences in our lab. "Wall clock" full backup performance was exactly the same.

We will be testing other processing modes now.
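
A rough illustration of the CBT vs. snap & scan difference mentioned in point 1 (illustrative figures only; the disk size, daily change and read rate below are assumptions, not measurements):

Code:

# Why losing CBT hurts incrementals: without CBT the whole image must be scanned.
def incremental_minutes(disk_gb, changed_gb, read_mb_s, cbt_ok):
    read_gb = changed_gb if cbt_ok else disk_gb
    return read_gb * 1024 / read_mb_s / 60

disk_gb, changed_gb, rate = 500, 20, 150   # e.g. a 500 GB VM with ~20 GB of daily change
print(f"with CBT:    {incremental_minutes(disk_gb, changed_gb, rate, True):.0f} min")
print(f"without CBT: {incremental_minutes(disk_gb, changed_gb, rate, False):.0f} min")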
chrmol
Enthusiast
Posts: 37
Liked: 2 times
Joined: May 17, 2010 7:41 pm
Full Name: Christian Moeller
Location: Denmark
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by chrmol »

Some stats on my daily backup (on my Lab server) - using NBD.

Before patch: 6-8 minutes with a 2 vCPU Veeam server (with VSS).

After patch: >1 hour with a 2 vCPU Veeam server (no VSS - VSS failed, probably because no CPU resources were available).

After patch: 8 minutes with a 3 vCPU Veeam server (with VSS).

Seems like more CPU is needed for the same performance – not a problem in my lab, but I’m curious to see how my customer’s production servers are affected.
jgrinwis
Novice
Posts: 9
Liked: never
Joined: Feb 02, 2011 12:53 pm
Full Name: John Grinwis
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jgrinwis »

Looks like the CPU goes to 100% when more than one job is running.
I've rescheduled the jobs so there shouldn't be two running at the same time; let's wait and see.
jcmachadouga
Enthusiast
Posts: 29
Liked: 7 times
Joined: Aug 02, 2011 2:17 pm
Full Name: Juan Machado
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jcmachadouga »

How do we reset CBT on those VMs?

Thanks
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Usually, we recommend resetting CBT by completely disabling it first, and then having the Veeam job re-enable it automatically for you. This works more reliably overall, and reduces the chance of user error.

If you are sure you are having the same issue, then you should do the below. Otherwise, it would be best to let our support investigate the logs first to confirm your issue is actually related to borked CBT data. They will be able to see that from the log files.

Open VMware vSphere Client, right-click the VM, choose Edit Settings, Options tab, select General, click Configuration Parameters, and set all entries with ctkEnabled substring to false. Run the job and it will then automatically re-enable changed block tracking with the correct settings. This first job run will be slow, but the following job runs will start leveraging CBT data.
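
For anyone who prefers to script this instead of clicking through the vSphere Client, here is a minimal pyVmomi sketch of the same ctkEnabled reset (not an official Veeam procedure; the vCenter address, credentials and VM name are placeholders, and the VM is assumed to be powered off when you reconfigure it):

Code:

# Sketch: flip every ctkEnabled advanced setting to "false" on one VM via pyVmomi.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the VM by name (placeholder name).
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "MyVM")
view.DestroyView()

# Build overrides for every extraConfig entry whose key contains "ctkEnabled".
overrides = [vim.option.OptionValue(key=opt.key, value="false")
             for opt in vm.config.extraConfig if "ctkEnabled" in opt.key]
vm.ReconfigVM_Task(vim.vm.ConfigSpec(extraConfig=overrides))

Disconnect(si)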
jcmachadouga
Enthusiast
Posts: 29
Liked: 7 times
Joined: Aug 02, 2011 2:17 pm
Full Name: Juan Machado
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jcmachadouga »

Thanks.

I have been doing that but I am not sure what I am doing wrong.

- I turn off the VM
- I set all entries with the ctkEnabled substring to false (which are basically all the hard drives)
- I turn on the VM
- I run the Veeam job again, but look what happens every time (I tried this with 3 VMs):

Code:

----
4 of 5 files processed 

Total VM size: 8.00 GB
Processed size: 8.00 GB
Processing rate: 17 MB/s
Backup mode: SAN with changed block tracking
Start time: 11/2/2011 9:25:47 AM
End time: 11/2/2011 9:34:00 AM
Duration: 0:08:13


Verifying changed block tracking...
Disk "Hard disk 2" has incorrect changed block tracking configuration.

Retrieving VM disks information...
Disk '[V18] sde1/sde1_1.vmdk' has been skipped because it was excluded from processing by user (this is a swapfile disk)

Backing up object "[V18] sde1/sde1_1.vmdk"

One or more VM disks have incorrect changed block tracking configuration. To resolve this, open VMware vSphere Client, right-click the VM, choose Edit Settings, Options tab, select General, click Configuration Parameters, and set all entries with ‘ctkEnabled’ substring to false. Veeam Backup will then automatically re-enable changed block tracking with the correct settings during the next job run.
and Veeam tells me AGAIN that the disk has incorrect CBT, so I have to set ctkEnabled to false again...

Any ideas?

thanks
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Maybe you are doing something wrong. Please contact our support for assistance; they will guide you through the process over WebEx. Thanks.
averylarry
Veteran
Posts: 264
Liked: 30 times
Joined: Mar 22, 2011 7:43 pm
Full Name: Ted
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by averylarry »

The CBT error is displayed incorrectly because of the excluded disk. I've been told this will not be fixed in Veeam 5, but it should be fixed in Veeam 6.
jcmachadouga
Enthusiast
Posts: 29
Liked: 7 times
Joined: Aug 02, 2011 2:17 pm
Full Name: Juan Machado
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jcmachadouga »

Really? Thanks...
Gostev
Chief Product Officer
Posts: 31456
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by Gostev »

Final update on the issue

Eventually, we were able to reproduce the issue with VDDK 5.0 causing our data mover agent process to spike CPU usage under certain environmental conditions. We were able to work around this issue in our code, and now have an updated agent version available. If you believe you are affected by higher-than-normal CPU usage after applying the VDDK 5.0 patch to version 5.0.2, please contact our support to obtain the newer agent build.

Thanks a lot to everyone who took the time to submit their logs and let us do a WebEx to see the issue live. This helped tremendously, since the issue seems to depend on a number of factors, and thus was not easily reproducible in lab environments.
jcmachadouga
Enthusiast
Posts: 29
Liked: 7 times
Joined: Aug 02, 2011 2:17 pm
Full Name: Juan Machado
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by jcmachadouga »

Thanks... opening ticket right now
kocher
Influencer
Posts: 12
Liked: never
Joined: Jun 30, 2011 11:46 am
Full Name: Kristian Kocher
Contact:

Re: hotfix, performance degradation on vsphere 4

Post by kocher »

Hi all,

I have not yet applied the patch since I am still using VMware 4.1.
I have Veeam B&R installed on a physical server with two 4-core CPUs.
When I do a full backup, the CPU still goes to 100%.

Has anyone else been experiencing this?
What could explain this difference in behaviour?
What can I expect with VDDK 5.0?

Thanks.
Kristian