averylarry
Veteran
Posts: 264
Liked: 30 times
Joined: Mar 22, 2011 7:43 pm
Full Name: Ted
Contact:

Incremental to rollback transform: possible bottlenecks?

Post by averylarry »

Transforming my backups from incremental to rollback is taking far longer with v6. I'm just trying to find a bottleneck, but I'm not seeing anything.

With v5, the backup/transform combination took around 6 hours. Since v6, it's been taking at least 24 hours. The incremental file sizes have been similar.

CPU on the backup server (which is also the proxy) runs between 20% and 30%. It's not doing anything else and has 4 vCPUs. The server has 6 GB of RAM, which I'd think is enough, though Task Manager shows 0 free (no process has more than 75 MB). Maybe I'll hunt around and try to find that memory/file cache size issue I vaguely remember reading about.

The job doesn't seem to track the bottleneck during a transform . . ?

The storage is on an iSCSI SAN. Admittedly low-end. However, the transform job is only reading/writing at around 5 MB/s. I can copy files to and from the backup drive during the transform at 50 MB/s. Even allowing for RAID, iSCSI, and randomness that is a very big drop in storage performance.

Could there be something else going on or somewhere else I should look to identify a bottleneck? I don't want to automatically think it could be a v6 issue, but it's hard to ignore the definite difference from 5 to 6 when I don't believe anything else would have changed in my environment.
Gostev
Chief Product Officer
Posts: 32761
Liked: 7970 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

Please note that the transform operation is performed by the backup repository agent, not by a backup proxy. However, I assume that your backup repository is also local.
averylarry wrote:The storage is on an iSCSI SAN. Admittedly low-end. However, the transform job is only reading/writing at around 5 MB/s. I can copy files to and from the backup drive during the transform at 50 MB/s. Even allowing for RAID, iSCSI, and randomness that is a very big drop in storage performance.
Actually, a performance drop of an order of magnitude is quite expected with regular hard drives. See the comparison below of sequential vs. random I/O performance by drive type (regular vs. SSD); it speaks for itself.
[Image: sequential vs. random I/O performance, regular hard drives vs. SSD]
Nevertheless, I will ask QC to compare v5 and v6 on the same data.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Thanks -- I know that drastic drops from random access are common. This just seems like far more than can be ascribed to storage. Not that it can't be -- but I can max out at 200MB/s sequential (2 Gb aggregated iSCSI ports).

So before I just say that it's storage, I'd like to explore other possibilities.
DaveBartel
Novice
Posts: 3
Liked: never
Joined: Dec 22, 2011 8:10 pm
Full Name: Dave Bartel
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by DaveBartel »

Perhaps I'm misinterpreting things here, but as I understand it, your Veeam server is pulling data from this SAN, performing the transform and pushing the transformed data back to the same SAN?

If this is the case, what you're likely running into is performance issues when trying to read from and write to the disk at the same time. I have run into similar performance issues, especially in some of the cheaper SAN solutions using software RAID. Also, have you examined your CPU/memory utilization during the transformation process? Task manager should be able to show you quite quickly if you are being bottlenecked by one of those resources.

I'm guessing you've probably checked these items already. But if not, food for thought.
tsightler
VP, Product Management
Posts: 6040
Liked: 2867 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by tsightler »

Did the target change with the move from v5 to v6? What is the target? Do you have any load balancing in use when talking to this iSCSI target? (I ask because an earlier forum post indicated that disabling round-robin load balancing made a huge difference in that poster's case.)

Also, how many spindles do you have? Synthetic full is all about random IOPS. It only takes 3-4 drives to get to 200MB/s sequential throughput, but those same drives will deliver at best maybe 250-400 random IOPS.
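
To put rough numbers on the gap between sequential throughput and random IOPS, here is a minimal sketch. The 250-400 IOPS range is the one quoted above; the block sizes are assumptions picked only for illustration and are not claimed to be the block size Veeam actually uses.

```python
def random_throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Approximate throughput (MB/s) when every I/O is a random access."""
    return iops * block_size_kb / 1024.0


# 250-400 random IOPS is the range quoted above; block sizes are hypothetical.
for iops in (250, 400):
    for block_kb in (4, 64, 512):
        mb_s = random_throughput_mb_s(iops, block_kb)
        print(f"{iops} IOPS x {block_kb} KB blocks ~ {mb_s:.1f} MB/s")
```

With small blocks, a few hundred random IOPS works out to only a few MB/s, far below the 200MB/s sequential figure, which is why spindle count matters more than link speed for this kind of workload.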

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

1) CPU and memory -- as stated above, 20%-30% cpu. Memory is totally chewed up by OS cache, but that's how 2008 works. No processes are eating up ram. Good memory discussion here: Memory performance, but the hotfix said it doesn't apply to my server.

2) The target did not change. The server is a virtual machine and has the same virtual disk that it has always used.

3) I just tend to think storage is the wrong path to follow because I did not have this bad of performance when running the same thing under v5. I have 11 spindles running raid 6. I've tried fixed path and round robin with no difference. Additionally, I can easily read/write multiple separate large files at the same time as the transform (simulating pseudo-random simultaneous read/write iops) and get 50MB/s.

My suspicion is the disk cache/memory thing but I can't find a fix for my server (2008 R2 sp1).

Further clues leading towards the memory issue -- the job is still running, veeamagent is using 25% cpu, but the .vrb file hasn't been modified for about an hour and a half. The .vbk file hasn't been modified for almost 4 hours. No other files have been touched for 6 hours.
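
Checking "last modified" times by hand gets tedious over a multi-day job. A minimal sketch of automating that check is below; it assumes the backup files are visible as ordinary paths on the machine running the script, and the one-hour threshold is an arbitrary choice, not anything Veeam defines.

```python
import os
import sys
import time
from typing import Iterable, List, Optional


def seconds_since_modified(path: str, now: Optional[float] = None) -> float:
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def report_stalled(paths: Iterable[str], threshold_s: float = 3600) -> List[str]:
    """Return the paths whose last modification is older than threshold_s."""
    return [p for p in paths if seconds_since_modified(p) > threshold_s]


if __name__ == "__main__":
    # Pass the .vbk/.vrb/.vib paths to watch as command-line arguments.
    for stalled in report_stalled(sys.argv[1:]):
        age_min = seconds_since_modified(stalled) / 60
        print(f"{stalled}: not modified for {age_min:.0f} minutes")
```

Run on a schedule, this gives a timeline of when each file stopped changing, which is easier to hand to support than occasional manual observations.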

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

Hi Ted, yesterday I asked QC to compare v5 and v6 transform rates on the same data set and just heard back from them: v6 showed the same (even slightly faster) transform performance. Thanks.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

47 hours and still running. If it's the memory issue I described, could I pursue this with tech support? I don't have any type of Microsoft tech support . . .

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Also -- any other ideas out there? I've been around computers and storage enough to feel confident of 2 things --

1) I fully understand that storage is a huge probability and I won't convince anyone differently without some type of proof.
2) The read/writes are so ridiculously slow that I have to believe something else is at the very least exacerbating the problem.

I'll see if I can throw more ram at the server once the transform finishes. Also, I think I'll run a transform daily instead of weekly.

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

You should open a support case, at least to have them check the logs and see if everything works OK on the Veeam side. They are unlikely to be able to help you if the issue is on the Microsoft side. But I doubt there are issues on the Microsoft side here (at least, we are not seeing them).

My only idea: was the job in question upgraded from v5, or created in v6 from scratch?

Re: transform incremental to rollback possible bottlenecks

Post by DaveBartel »

Sorry I missed the performance information in the original post, feeling like a bit of an idiot now.

Another long-shot, but do you possibly have enough local storage space on the ESX host running your Veeam VM that you could mount and create a backup repository there to potentially rule out storage as the issue?

I've also seen some unusual performance issues lately, with jobs freezing with no apparent changes for hours (both with v5 and v6), but they've been tough to troubleshoot as we are having some hardware issues as well. If I find anything useful, I'll pass it on.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

DaveBartel wrote:Sorry I missed the performance information in the original post, feeling like a bit of an idiot now.

Another long-shot, but do you possibly have enough local storage space on the ESX host running your Veeam VM that you could mount and create a backup repository there to potentially rule out storage as the issue?

I've also seen some unusual performance issues lately, with jobs freezing with no apparent changes for hours (both with v5 and v6), but they've been tough to troubleshoot as we are having some hardware issues as well. If I find anything useful, I'll pass it on.
Ha ha ha ha! I wish I had enough storage to mess around with.


Is there a built-in 48-hour limit on transforms? I've had the transform fail twice now at exactly 48:01:08.

The .vbk file is ~990GB. One particular .vib file is ~445GB. There are 7 other .vib files from 1GB to 22GB. The .vrb file gets up to around 150GB before the transform job fails. The disk usage seems to degrade over time, but I haven't really kept track closely.

So right now -- the transform has been running for 4 hours 39 minutes. The .vbk file that's being processed is at 56GB. The disk usage is about 3.8MB/s. I'll try to keep track over time.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

The transform has now been running for 7 hours and 55 minutes. The .vrb (I meant .vrb in the last post) is up from 56GB to 72GB. The disk usage is down to about 2.8MB/s.

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

This is very slow - sounds like a storage issue indeed.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

17 hours, 27 minutes. Up to 110GB .vrb file. The disk usage continues to get worse at 1.8MB/s.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

28 hours 32 minutes. 136GB .vrb file. No apparent activity at all. The .vrb file's last modified time is 40 minutes ago. The .vbk file hasn't changed in over 2 hours. Pretty much 0 usage.
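
The observations posted so far can be turned into average .vrb growth rates between samples. This is only a sketch: it uses the elapsed times and .vrb sizes reported in this thread, treats 1 GB as 1000 MB, and measures growth of the .vrb file alone, so it will be lower than the total disk usage figures quoted alongside.

```python
# (elapsed hours, elapsed minutes, .vrb size in GB) as reported in the posts above
samples = [
    (4, 39, 56),
    (7, 55, 72),
    (17, 27, 110),
    (28, 32, 136),
]


def growth_rates_mb_s(samples):
    """Average .vrb growth in MB/s between consecutive observations (1 GB = 1000 MB)."""
    rates = []
    for (h0, m0, gb0), (h1, m1, gb1) in zip(samples, samples[1:]):
        seconds = ((h1 * 60 + m1) - (h0 * 60 + m0)) * 60
        rates.append((gb1 - gb0) * 1000 / seconds)
    return rates


for rate in growth_rates_mb_s(samples):
    print(f"{rate:.2f} MB/s")
```

Each interval comes out slower than the one before it, which matches the steadily falling disk usage described above.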

Re: transform incremental to rollback possible bottlenecks

Post by DaveBartel »

Depending on your SAN, this may not be relevant, but I had similar performance issues with a low-end SAN we were using for backup storage with Veeam. For some inexplicable reason, the solution was for me to change the NIC-teaming mode on the SAN from round-robin to failover. In theory this should reduce overall throughput, but it returned my backup performance to what it was months ago. Best of luck to you in your continued struggle with this.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Once you fill up all the various locations where data can cache, whether the OS, the network, the host, the SAN cache, or the physical drives, you'll be at your "worst" storage performance (in general -- I'm simplifying). After over 100GB of data, I believe that all the separate caches are full, and have been for a while. At this point, there's little left to explain the symptoms I've described from an exclusively storage viewpoint. This is my opinion. I don't have a good idea of how to prove it.

I'm getting very close to deleting this entire backup (for at least the 3rd time since upgrading to v6). I know tech support is busy and it'll be difficult to get past "this is a storage problem". I can't wait 2 days for it to fail and then try again. And again. And again.
My leading ideas:

1) This is the issue with Microsoft dynamic cache as I linked earlier -- http://support.microsoft.com/default.as ... -US;979149 Microsoft does not appear to have a workaround for 2008 R2 (the hotfix isn't applicable because it's included in SP1 but plenty of people still have the issue after SP1) and simply claims that the vendor should "configure the application to flush data more frequently".
2) Some bizarre corruption in the .vib or .vbk file is causing Veeam to slow down and eventually get caught in a "wait-forever" loop or something until a higher-level 48 hour timeout finally kills the job.
3) There is some scaling issue with my server and Veeam v6. Everything is fine with <100GB .vib files. The 445GB .vib file seems to die a slow death.

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

It would help tremendously if you could dedicate some other storage just for the sake of experiment (even a share on a desktop with a large 2TB hard drive). This would rule out (or prove) a possible issue with the specific storage device you are currently having issues with.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Indeed -- a decent idea. Don't think I have a 2TB drive, and if I did I can't imagine how long it would take to copy 2TB from the SAN to a single local SATA.

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

Even "green" 2TB guys can do 100MB/s sequential writes. Sure, they will be very bad with random I/O (transform), but still they surely should be able to do better than a few MB/s (to complete halt), as you are seeing...

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

I got a different veeamtransport.exe file from support. 116 hours and counting. The .vrb file is up to 219GB.

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

I am not quite sure what they gave you, or why. As noted in my response above, there are no known issues with transform performance in v6, so there are no patches or fixes for this functionality available at this time.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Case 5163515
I presumed that there is a hard coded 48 hour limit on a transform and the new file has no limit. It's actually veeamtransportsvc.exe. There hasn't been any reference to possible performance issues. I think this is just so it can (eeevveeentuallllllyyyy) finish.

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

No joy. There's a 7-day global timeout that I hit, and the job failed. Allowing for triple data I/O (read from .vib, write to .vbk, write to .vrb), that still means I was running below 2 MB/s.
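
That estimate can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the 445GB .vib mentioned earlier in the thread dominates the workload, that each block is handled three times (one read, two writes), and that 1 GB is 1000 MB; all of these are simplifications, not measured values.

```python
SECONDS_PER_DAY = 86_400


def average_mb_s(data_gb: float, io_passes: int, days: float) -> float:
    """Average throughput in MB/s if data_gb is moved io_passes times in `days`."""
    return data_gb * 1000 * io_passes / (days * SECONDS_PER_DAY)


# Assumption: only the 445 GB .vib is counted. The job never finished,
# so the real average was lower than this upper bound.
print(f"{average_mb_s(445, 3, 7):.2f} MB/s")
```

Because the transform did not complete within the 7 days, the figure printed is a ceiling rather than the actual rate.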

I'm going to set up Iometer and see what it shows.
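
As a very rough stand-in for a dedicated tool such as Iometer, a sequential vs. random read comparison can be sketched in a few lines. This is an illustration under loud assumptions: the 4 KB block size, the read count, and the small scratch file are all arbitrary, and on a file this small the OS cache will serve most reads, so a meaningful test needs a file much larger than RAM on the volume being tested.

```python
import os
import random
import tempfile
import time


def read_rates(path: str, block_size: int = 4096, reads: int = 2000):
    """Return (sequential MB/s, random MB/s) for block-sized reads of `path`."""
    blocks = os.path.getsize(path) // block_size
    reads = min(reads, blocks)
    sequential = list(range(reads))
    scattered = random.sample(range(blocks), reads)
    results = []
    with open(path, "rb", buffering=0) as f:
        for order in (sequential, scattered):
            start = time.perf_counter()
            for block in order:
                f.seek(block * block_size)
                f.read(block_size)
            elapsed = max(time.perf_counter() - start, 1e-9)
            results.append(reads * block_size / elapsed / 1e6)
    return tuple(results)


if __name__ == "__main__":
    # Hypothetical 64 MB scratch file in the default temp directory.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024 * 1024))
    try:
        seq, rnd = read_rates(tmp.name)
        print(f"sequential: {seq:.1f} MB/s, random: {rnd:.1f} MB/s")
    finally:
        os.remove(tmp.name)
```

Pointing `read_rates` at a large existing file on the backup volume would give a crude read-only comparison; it does not exercise the mixed read/write pattern of a transform.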

Re: transform incremental to rollback possible bottlenecks

Post by averylarry »

Grr. Iometer doesn't seem to like my backup disk (a dynamic NTFS spanned volume).

I'm considering wiping my entire 4 TB of backups so that I can recreate the datastore as native VMFS5.
ibis69
Lurker
Posts: 2
Liked: never
Joined: Oct 27, 2011 3:03 pm
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by ibis69 »

I upgraded from v5 to v6 patch 2 last week with no hardware changes, and my five transform jobs (on a Data Domain appliance) now take at least twice as long to complete. The backup server runs Windows 2003, and CPU/memory usage on it is very low (25%). I really think something in v6 is slowing down transform jobs.
Vitaliy S.
VP, Product Management
Posts: 27700
Liked: 2909 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by Vitaliy S. »

How many transform operations did you already have? Was this the first time you tried to do that after upgrade?

You may want to open a support case, at least to have our engineers check the logs and see if everything works fine on our side; maybe there is something else that affects transform performance for the backup job.
habibalby
Veteran
Posts: 392
Liked: 34 times
Joined: Jul 18, 2011 9:30 am
Full Name: Hussain Al Sayed
Location: Bahrain
Contact:

Re: transform incremental to rollback possible bottlenecks

Post by habibalby »

Hi,
I'm having the same issue, but with no job failure; it just takes longer to finish the rollback synthetic backup. Backup speed is around 700 MB/s and 1001 MB/s, the VMDK files are around 200GB or less, and the VBK file is around 55GB.

Is this normal, or is there a hotfix available for it?


Thanks,

Re: transform incremental to rollback possible bottlenecks

Post by Gostev »

Transform is a very I/O-intensive operation, so it is normal for it to take a long time if the backup target cannot deliver good IOPS.