StrangeWill
Influencer
Posts: 24
Liked: 1 time
Joined: Aug 15, 2013 4:12 pm
Full Name: William Roush
Contact:

SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

Case #00612376.

So right now I have two vSphere clusters: one runs Xeon processors, the other Opterons. I have a set of machines (Active Directory, SQL, Mail, etc.) in an application group. I want to deploy those to the other cluster, and I'm running into a handful of problems:
  • After the domain controller, everything else could boot in parallel, but that is not an option for anything except SureBackup jobs that consist of an entire backup. That's a bit rough; labs can take up to an hour to boot.
  • Any minor disruption to the job will kill the entire lab, even if I'm on the last VM to boot; there is no automated retry or anything. If the job is running and Veeam has a hiccup caused by whatever, it'll eat the entire lab (an incremental backup yesterday for some reason shut down my lab).
  • When VMs boot, they seem to reboot after being online for a few minutes (probably detecting the new CPU), at which point Veeam runs its scripts, and they fail -- I need to be able to modify the stabilization algorithm settings somehow...
I have 9 virtual machines configured in an application group, and they all get booted in one lab. Should I be breaking up the application group and running multiple labs linked on the same networks? Am I missing something here to make things more consistent?

Are there any "best practices" for running testing/training/development labs using SureBackup? I've had some major issues over the past week trying to keep my lab online so developers can use it.
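
To make the third problem concrete, here is a minimal sketch of the kind of stabilization check being asked for: only treat a VM as ready once it has stayed reachable for a continuous window, so that an early reboot resets the wait instead of failing the scripts. This is not Veeam's stabilization algorithm or a setting it exposes; `wait_until_stable`, the `is_up` probe, and all timings are hypothetical.

```python
import time
from typing import Callable


def wait_until_stable(
    is_up: Callable[[], bool],
    stable_for: float = 120.0,   # seconds the VM must stay up without interruption
    timeout: float = 1800.0,     # overall limit before giving up
    poll: float = 5.0,           # seconds between probes
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once is_up() has held continuously for stable_for seconds."""
    deadline = clock() + timeout
    up_since = None
    while clock() < deadline:
        if is_up():
            now = clock()
            if up_since is None:
                up_since = now           # start of a new uptime window
            elif now - up_since >= stable_for:
                return True              # stayed up long enough
        else:
            up_since = None              # a reboot resets the stability window
        sleep(poll)
    return False
```

The `is_up` callable stands in for whatever probe is available (a ping, a heartbeat query, a port check). With a check like this, test scripts would only start after the guest has survived its post-boot restart.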
Vitaliy S.
VP, Product Management
Posts: 27110
Liked: 2719 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: SureBackup For Lab Usage Sort Of Unstable

Post by Vitaliy S. »

Hi William,
StrangeWill wrote:After the domain controller, everything else could boot in parallel, but that is not an option for anything except SureBackup jobs that consist of an entire backup. That's a bit rough; labs can take up to an hour to boot.
What kind of storage do you use for your backups? Keep in mind that the performance of SureBackup jobs depends on the performance of your storage disks. The more IOPS they can provide, the quicker your jobs will be. How many VMs do you boot at a time?
StrangeWill wrote:an incremental backup yesterday for some reason shut down my lab
There is an existing topic about this; check it out: SureBackup jobs shut down.
StrangeWill wrote:When VMs boot, they seem to reboot after being online for a few minutes (probably detecting the new CPU), at which point Veeam runs its scripts, and they fail -- I need to be able to modify the stabilization algorithm settings somehow...
Does the same thing happen if you try to boot the VM on the production cluster?

Thanks!

Re: SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

Vitaliy S. wrote:Hi William,
What kind of storage do you use for your backups? Keep in mind that the performance of SureBackup jobs depends on the performance of your storage disks. The more IOPS they can provide, the quicker your jobs will be. How many VMs do you boot at a time?
It's a virtual machine; the VMDK resides on a ZFS-backed LUN (no SSDs right now for ZIL or L2ARC, though), which consists of a 32-drive RAID-Z2. The SureBackup job only allows me to boot one VM at a time, though in the past I was using the same storage for larger VMs and could boot those in parallel on VMware pretty quickly (though I do acknowledge vPowerNFS adds some additional overhead).

I may look into acquiring SSDs for this purpose. Additional disks will be added soon, adding two more stripes to the RAID-Z2 (for capacity purposes), which will improve performance too.

I'll investigate my logs if it happens again; that thread doesn't seem too helpful on the cause.
Vitaliy S. wrote: Does the same thing happen if you try to boot the VM on the production cluster?
I haven't tested it, but I'm 99% sure it's the OS detecting the new CPUs and wanting to restart. I could deploy a lab on the production cluster just to test this theory, but even if it's true it doesn't help: I can't use the production cluster for this purpose.
veremin
Product Manager
Posts: 20282
Liked: 2257 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: SureBackup For Lab Usage Sort Of Unstable

Post by veremin »

StrangeWill wrote:I'll investigate my logs if it happens again; that thread doesn't seem too helpful on the cause.
An incremental run of a reversed incremental job locks the backup files, so the SureBackup job gets stopped. Thanks.
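
Given that explanation, one way to reason about lab uptime is to check whether any scheduled backup run falls inside the window in which the lab needs to stay online. The sketch below is only an illustration of that overlap check; `Window`, `overlaps`, and `conflicting_runs` are hypothetical names, and the schedule data would have to be supplied by hand, since nothing here talks to Veeam.

```python
from datetime import datetime
from typing import List, NamedTuple


class Window(NamedTuple):
    name: str
    start: datetime
    end: datetime


def overlaps(a: Window, b: Window) -> bool:
    # Windows are treated as half-open intervals [start, end),
    # so a run that starts exactly when the lab ends does not conflict.
    return a.start < b.end and b.start < a.end


def conflicting_runs(lab: Window, backup_runs: List[Window]) -> List[str]:
    """Names of backup runs whose window overlaps the lab's uptime window."""
    return [run.name for run in backup_runs if overlaps(lab, run)]
```

Any run name returned by `conflicting_runs` would be a candidate for rescheduling, or a time when the lab should be expected to go down.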

Re: SureBackup For Lab Usage Sort Of Unstable

Post by Vitaliy S. »

StrangeWill wrote:though in the past I was using the same storage for larger VMs and could boot those in parallel on VMware pretty quickly (though I do acknowledge vPowerNFS adds some additional overhead).
It seems like upgrading the target storage hardware should give you better performance for the SureBackup job. On a side note, I wouldn't recommend storing your backup files inside a VMDK disk. Not only does it add complexity in a DR situation, but it also imposes some I/O penalty on the SureBackup job. Booting a VM directly from raw storage should be faster than doing it from a VMDK, because of the additional storage layer involved.

Re: SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

StrangeWill wrote: I haven't tested it, but I'm 99% sure it's the OS detecting the new CPUs and wanting to restart. I could deploy a lab on the production cluster just to test this theory, but even if it's true it doesn't help: I can't use the production cluster for this purpose.
Actually, found out what started to do this:

It's the domain controller, and it doesn't seem to be doing a non-authoritative restore; AD is completely broken after it boots. This thread isn't helpful, since the tools are no longer available on 2012. :(

The domain controller is booting, rebooting, and all AD services are offline like NTFRS is messed up (Going to "Active Directory Users And Computers" errors out, all servers are unable to resolve AD accounts). This was working fine over a week ago. Ughhhhh! Machine is checked for "Domain Controller" under the application group.

Re: SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

v.Eremin wrote:An incremental run of a reversed incremental job locks the backup files, so the SureBackup job gets stopped. Thanks.
We're running incrementals because we need to support tape.

Re: SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

StrangeWill wrote: Actually, found out what started to do this:

It's the domain controller, and it doesn't seem to be doing a non-authoritative restore; AD is completely broken after it boots. This thread isn't helpful, since the tools are no longer available on 2012. :(

The domain controller is booting, rebooting, and all AD services are offline like NTFRS is messed up (Going to "Active Directory Users And Computers" errors out, all servers are unable to resolve AD accounts). This was working fine over a week ago. Ughhhhh! Machine is checked for "Domain Controller" under the application group.
Booted both DCs in my application group to remove all doubt, both DCs show errors on BPA results and a long list of errors from dcdiag. :( Opening a new ticket with support on this one, it's killing my entire lab even when I get it online!
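
For anyone comparing a long dcdiag listing across two DCs, a small filter that keeps only the failed tests can help. The sketch below assumes the usual English-language summary lines of the form `......................... DC01 failed test Advertising`; that line format is an assumption, `failed_tests` is a hypothetical helper, and capturing the dcdiag output into a string is left out.

```python
import re
from typing import List, Tuple

# Matches summary lines such as:
#   ......................... DC01 failed test Advertising
_RESULT = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)", re.IGNORECASE)


def failed_tests(dcdiag_output: str) -> List[Tuple[str, str]]:
    """Return (target, test_name) pairs for every 'failed test' summary line."""
    return [
        (m.group(1), m.group(3))
        for m in _RESULT.finditer(dcdiag_output)
        if m.group(2).lower() == "failed"
    ]
```

Feeding it the saved output from each DC gives a short list of failing tests per server, which is easier to attach to a support case than the full listing.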

Re: SureBackup For Lab Usage Sort Of Unstable

Post by Vitaliy S. »

William, please post your case ID for future readers that might face the same issue. Thanks!

Re: SureBackup For Lab Usage Sort Of Unstable

Post by StrangeWill »

Update:

Case ID For the Domain Stuff: 00613327


I think things are much better now, though: after major overhauls on our network, including mandatory patches from Dell for our iSCSI switch, the lab is running well and I don't have to touch the DCs.

Configuring VMs for the lab is still pretty slow, but I'm going to make some final changes and come back to that later.