-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
How i got Veeam to work ... well
After using Veeam for almost 1.5 years, I think I have finally got it to where it's working very well for us. It's fast and stable. Usually, when people fix all their problems they vanish from the forums and no one knows how they fixed them, and I usually do the same ... This time, since I have gotten so much help from this forum and Veeam support, I thought I would write a little about how I fixed my problems, and maybe someone else will have use for it ...
I have only used Veeam for local replication, so these experiences might not be relevant for you.
This is what I have learned:
Have a plan for how much you are going to replicate and how often, and size things accordingly. This is the same problem you see with some replication naysayers who dig up an old server from the basement, install VMware on it and then conclude that virtualization is rubbish ... If you are going to replicate a lot of servers all the time, you need the hardware for it.
If you fix one bottleneck another one will always show up. Stop at some point within reason or you will go mad.
IMHO ... never use hotadd, it's broken beyond repair for the time being. The problems are endless, but some of them are disks not disconnecting from proxies, extremely slow replication speed, and extremely high IOPS on the target SAN crippling the whole environment.
Network mode works well, but in my opinion it should be used for remote replication / backup or if you can't use direct SAN.
Direct SAN is really the best solution if you can use it. Use NBD for writing because it's much faster and more stable than hotadd and, as tsightler pointed out in a very long thread, a 1 Gb/s link could be more than enough even if you replicate continuously, because not much data is changed in the minutes between replications. If you need more speed you should also have the budget to upgrade to 10 Gb/s. In other words, don't bother with bonding several NICs together, because you probably don't need it and you probably won't get it to work properly ... But if you want to read about it, here you go: NIC Teaming & Veeam v6
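To put rough numbers on the "1 Gb/s could be more than enough" point, here is a minimal back-of-the-envelope sketch in Python. The 70 % link efficiency and the 500 MB of changed data per cycle are made-up assumptions for illustration, not measurements from this setup.

```python
def transfer_seconds(changed_mb, link_gbps, efficiency=0.7):
    """Seconds needed to move `changed_mb` megabytes over a link.

    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of the nominal speed you actually get
    """
    usable_mb_per_s = link_gbps * 1000 / 8 * efficiency
    return changed_mb / usable_mb_per_s


# Hypothetical example: 500 MB of changed blocks since the last replication
for gbps in (1, 10):
    print(f"{gbps} Gb/s: {transfer_seconds(500, gbps):.1f} s")
```

If the changed data per cycle fits comfortably inside the replication interval at 1 Gb/s, the link is not your bottleneck and teaming will not buy you much.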
Don't spend too much time trying to optimize storage if you don't have big problems, because there is almost no gain, especially if your target SAN is Veeam exclusive.
Run the Veeam console and Enterprise Manager on a separate server from the proxy / backup server, as it uses quite a lot of CPU and RAM.
If you upgrade to a new version and everything is very slow, or if one VM starts replicating very slowly, delete the whole replica and start over.
The biggest problem I have been having is that the UI on the Veeam server was very slow in reacting to input, and the server spent a lot of time on what I call "paper work", meaning that it spent a lot of time preparing, fetching data etc., and I don't mean creating snapshots. It turned out that these two problems were connected and related to problems with the database. One of the problems was that some indexes were missing in the database. It turns out that this is a common problem, but it doesn't really become a problem before you have a lot of information in the database, like job and task session logs. Alexander at Veeam support helped me with this, and also with cleaning out a lot of the old data.
In the end I cleaned out everything in the database older than two weeks, because who needs to see what happened on a replication job several months back .... I also set the Veeam server to only retain job logs for 3 days.
Then I proceeded to clean out the VMware database as well, and in the end I reduced the size of the VMware database from 50 GB to only 5 GB, and the Veeam database from 15 GB to about 1 GB for the Veeam database and Enterprise Manager together.
Now everything is screaming fast and ... almost reacts before I press the buttons.
Remember to rebuild all tables and indexes after you shrink the database or it will be fragmented. Google it and you will find some SQL scripts for doing that.
On the subject of databases, remember to restrict how much RAM SQL Server can use so that it doesn't use everything and leave nothing for Veeam.
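For the last two points, this is a rough sketch of what such a script could look like, written as a small Python helper that only prints T-SQL for you to review. The RAM figures and the table name dbo.ExampleTable are placeholders I made up; sp_configure with 'max server memory (MB)' and ALTER INDEX ALL ... REBUILD are standard SQL Server statements, but check everything with support before running it against a production Veeam database.

```python
def sql_memory_cap_mb(total_ram_mb, reserved_for_veeam_mb, floor_mb=1024):
    """RAM left for SQL Server after reserving memory for Veeam and the OS."""
    return max(floor_mb, total_ram_mb - reserved_for_veeam_mb)


def memory_cap_statements(cap_mb):
    """T-SQL that limits SQL Server's memory to cap_mb megabytes."""
    return [
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {cap_mb};",
        "RECONFIGURE;",
    ]


def rebuild_statements(tables):
    """T-SQL that rebuilds every index on each listed table."""
    return [f"ALTER INDEX ALL ON {table} REBUILD;" for table in tables]


# Hypothetical server: 16 GB RAM, 8 GB kept free for Veeam and Windows
cap = sql_memory_cap_mb(total_ram_mb=16384, reserved_for_veeam_mb=8192)
for line in memory_cap_statements(cap) + rebuild_statements(["dbo.ExampleTable"]):
    print(line)
```

Run the index rebuild after the shrink, not before, since the shrink is what fragments the indexes.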
I think that's all I have for now. It took a long time to get there, but I think we are finally there. Thanks to Tsightler, Foggy, Vitaliy and Gostev for all their help. I hope I won't need it again.
-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
Re: How i got Veeam to work ... well
Oh, and don't make any mistakes when you post to the forums, because you can't edit your post .... Please fix this.
Well, I can edit the reply .... Hmm
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: How i got Veeam to work ... well
Nice post Lars!
As an addition, I can confirm from first-hand experience that 10G NBD mode is really FAST! The management interface in ESXi is throttled when it's running on 1G networks, but on 10G I can assure you it's a great alternative, especially for small VMs, since it lets you skip all the hotadd/hotremove activities.
Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
Re: How i got Veeam to work ... well
If I had the means to edit my original post, I would be able to thank you as well, in addition to correcting all those irritating spelling and grammar errors.
Is it really throttled? Do you have any documentation on this, and is it possible to remove the throttling?
Maybe if I dedicate one NIC to management on its own vSwitch? There is an option for Traffic Shaping, but it's not activated ....
And I would also like to add something about 10Gb that maybe not everyone is aware of. You don't need a 10Gb switch to use it!!
On my production side I have 2 ESXi servers running 76 VMs, and between them I have used two cheap cables connected to two 10Gb adapters on each server for vMotion, and it works great ... Now instead of having 4Gb between the servers I have 20Gb, and vMotion can use all of it ...
I'm going to buy two 10Gb adapters for my DR site as well, and put one in the DR site ESXi host and one in the physical Veeam server to use for NBD, at the cost of almost nothing ...
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: How i got Veeam to work ... well
I never found anything official, but in the end, the only explanations for getting no more than 6-7 mbits are "it's throttled" or "it sucks".
But since it flies when it's running on 10G, my suspicion was that it's throttled. I too would like to see an official statement somewhere. Forget traffic shaping, it's made for other purposes.
Luca.
-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
Re: How i got Veeam to work ... well
Ok, thanks. Will try to find something on this ....
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: How i got Veeam to work ... well
One trick you can use if you want to use NBD mode and only have 1GbE is to create a dedicated VMK interface and attach it to an isolated vSwitch, then deploy a dual-homed proxy with one "leg" attached to the same vSwitch. That way all network traffic stays within the virtual layer, which is even faster than 10Gb. Of course this does require deploying at least 1 proxy on each host, but it does work quite well. With a little subnet magic (using a small subnet per host) Veeam even automatically picks the correct proxy for every host since, when using NBD mode, Veeam always prefers proxies in the same subnet as the host.
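The "small subnet per host" part can be sketched with Python's standard ipaddress module. This is only an illustration of the addressing plan: the 192.168.2.0/24 block, the /29 size and the host names are assumptions, and the lookup function just mimics the "same subnet" preference described above, it is not Veeam's actual selection code.

```python
import ipaddress


def plan_isolated_networks(block, hosts, prefix=29):
    """Give each ESXi host its own small subnet carved out of `block`.

    The first usable address goes to the dedicated VMK interface and the
    second one to the extra vNIC of the proxy running on that host.
    """
    subnets = ipaddress.ip_network(block).subnets(new_prefix=prefix)
    plan = {}
    for host, subnet in zip(hosts, subnets):
        addresses = list(subnet.hosts())
        plan[host] = {
            "subnet": str(subnet),
            "vmk": str(addresses[0]),
            "proxy": str(addresses[1]),
        }
    return plan


def proxy_in_same_subnet(plan, vmk_ip):
    """Return the planned proxy address sharing a subnet with vmk_ip, or None."""
    address = ipaddress.ip_address(vmk_ip)
    for entry in plan.values():
        if address in ipaddress.ip_network(entry["subnet"]):
            return entry["proxy"]
    return None


plan = plan_isolated_networks("192.168.2.0/24", ["esx1", "esx2"])
for host, entry in plan.items():
    print(host, entry)
```

Because no two hosts share a subnet, each VMK address has exactly one proxy next to it, which is what makes the automatic selection unambiguous.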
-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
Re: How i got Veeam to work ... well
Yes, but this is useful only if you are also reading the source data via network or NBD. I'm using a physical server with direct SAN access ....
But one thing I have always been wondering: could one use DirectPath I/O to present a dedicated Fibre Channel card to a virtual machine, which could then use your trick or hotadd (if it's ever fixed) to write data and still use Direct SAN access to read data?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: How i got Veeam to work ... well
lars@norstat.no wrote: Yes, but this is useful only if you are also reading the source data via network or NBD. I'm using a physical server with direct SAN access ....
Oh I know, I was basing that on the discussion above around "throttling" of 1Gb NBD mode; in other words, it was spinning way off topic. That being said, this trick can still be useful on the "target" since, as you're quite aware, hotadd can sometimes be a performance issue on the target due to the increased I/O.
-
- Expert
- Posts: 110
- Liked: 14 times
- Joined: Nov 01, 2011 1:44 pm
- Full Name: Lars Skjønberg
- Contact:
Re: How i got Veeam to work ... well
He he, well, the topic was actually "how I got Veeam to work well", so I think it added very nicely to the topic. It was me who forgot that this topic was not about my specific case.
But you have given me an idea, because in a scenario where you have, for example, a 2 x 8Gb Fibre Channel link to a site but not that much normal network bandwidth, you could use this method and combine it with direct SAN without having a physical server there. Just one or two dedicated Fibre Channel cards .... I think I will test this just for fun.
-
- Influencer
- Posts: 12
- Liked: 6 times
- Joined: Oct 23, 2013 9:32 am
- Full Name: Aleksandr Alpatov
- Contact:
Re: How i got Veeam to work ... well
tsightler wrote: One trick you can use if you want to use NBD mode and only have 1GbE is to create a dedicated VMK interface and attach it to an isolated vSwitch, then deploy a dual-homed proxy with one "leg" attached to the same vSwitch. That way all network traffic stays within the virtual layer, which is even faster than 10Gb. Of course this does require deploying at least 1 proxy on each host, but it does work quite well. With a little subnet magic (using a small subnet per host) Veeam even automatically picks the correct proxy for every host since, when using NBD mode, Veeam always prefers proxies in the same subnet as the host.
Hello! It's a good idea, because right now I have an issue with the writing proxy in NBD mode: the performance is only 40 MB/s. I put the proxy in the VMkernel subnet and it doesn't use the pNIC (1 Gbps), but I think the replication speed is too low.
Please tell me more about your trick. The second VMK interface in the isolated switch must have a different IP address than the first one, right? And the proxy must have an IP from the same subnet as the VMK? And should I use 2 vNICs on the Veeam proxy? Thank you!
-
- Influencer
- Posts: 12
- Liked: 6 times
- Joined: Oct 23, 2013 9:32 am
- Full Name: Aleksandr Alpatov
- Contact:
Re: How i got Veeam to work ... well
tsightler
I tried to use your scheme but it doesn't work. Here is the setup:
windows-veeam2 - B&R Server include Proxy nbd - esx1 - vNIC1 ip 10.230.144.101 (network 10.230.144.0/24) [Production dVS]
windows-veeam3 - Veeam Proxy nbd --------------- esx1 - vNIC1 ip 10.230.144.200 (network 10.230.144.0/24) [Production dVS] - vNIC2 ip 192.168.2.2 (network 192.168.2.0/24) [vSwitch1]
esx1 - vmk0 (management ticked) - ip 10.230.245.60 (network 10.230.245.0/26) - 2 pNICs to [Production dVS], vmk1 (management ticked) - ip 192.168.2.1 (network 192.168.2.0/24) - 0 pNICs to [vSwitch1]
Pings from windows-veeam2 to vmk1 (IP 192.168.2.1) are OK.
I run a replication job with "windows-veeam3" as both source and target proxy, but the traffic I see in Resource Monitor goes through 10.230.144.200 <-> 10.230.245.60 and the performance is slow (~40 MB/s).
Please advise - what's wrong?
-
- Influencer
- Posts: 23
- Liked: 3 times
- Joined: Jul 01, 2011 12:50 pm
- Full Name: Loren Gordon
- Contact:
Re: How i got Veeam to work ... well
Hey Tom,
That's an interesting idea. I'm hoping you can provide some more detail on the network design that gets us there... One catch might be the need to pay particular attention to how the Veeam servers resolve the ESXi hosts' IP addresses, and vice versa, to make sure the servers' routing tables direct the connections to the desired interfaces. From my testing, the connections appear to be established from the proxy server to the ESXi host. The proxy server resolves the hostname of the ESXi host to determine which IP to connect to, and the proxy uses its own routing table to determine which interface initiates that connection. The ESXi host doesn't do any name lookup because it just needs to reply back to the IP the connection came from.
So, the optimization here is to get the Veeam servers and the ESXi hosts communicating on the same vSwitch. The first step to accomplishing this is getting them to communicate on the same subnet, and the trick there is getting the Veeam servers to connect to the desired ESXi IP in the first place. If using vCenter and if the hosts are connected to vCenter by hostname (which I believe is a common and generally recommended practice), then the Veeam servers will use that hostname to resolve the IP. I'm going to call that the ESXi "primary" management vmk interface. The connection is then established from the proxy server to the ESXi primary management interface. The problem here is that the additional vmk interface is not actually utilized*; let's call this the secondary vmk interface.
Instead, my rule of thumb recommendation for the moment has been to place a secondary interface on all Veeam servers in the same subnet as the ESXi hosts' primary vmk management interface (to which the ESXi hostname resolves). The proxy then uses its own routing table to establish the connection out its secondary interface to the ESXi host's primary interface, which is in the same subnet. So we're halfway there. However, with this particular design the ESXi host does not utilize a secondary vmk interface, so it's not possible to achieve the desired optimization of limiting network traffic to an isolated vSwitch. An isolated vSwitch cannot be used because the host must communicate with vCenter on the primary management vmk interface as well.
(*Note that you could get fancy with the proxy networking and create a persistent route to the ESXi primary management interface out the proxy's dedicated interface to avoid this problem, or you could use host records on the proxy server to achieve the same thing, but these static customizations get difficult to troubleshoot and scale so I've been trying to avoid them. And you can also introduce some asymmetric routing with these approaches, which may cause other odd problems depending on your overall networking design.)
If my understanding is incorrect about IP and name resolution and the establishment of connections, please feel free to correct me. But this has been my experience doing something similar to avoid asymmetric routing with Veeam Backup, ESXi, and 10GbE networks.
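To make the routing point concrete, here is a minimal Python sketch of how a dual-homed proxy would pick its outgoing interface, using the addresses from Aleksandr's post above. It assumes the proxy's default gateway sits on the 10.230.144.200 interface and that the ESXi hostname resolves to 10.230.245.60; neither is stated explicitly in the thread, and the function is a simplification of a real routing table.

```python
import ipaddress


def outgoing_interface(interfaces, destination, default_ip):
    """Pick the local address used to reach `destination`.

    interfaces  -- local addresses in CIDR form, e.g. "192.168.2.2/24"
    default_ip  -- local address of the interface holding the default gateway
    An on-link subnet wins (longest prefix first); anything else leaves
    through the default gateway interface.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [ipaddress.ip_interface(i) for i in interfaces]
    on_link = [i for i in candidates if dest in i.network]
    if on_link:
        return str(max(on_link, key=lambda i: i.network.prefixlen).ip)
    return default_ip


proxy_nics = ["10.230.144.200/24", "192.168.2.2/24"]

# Hostname resolves to vmk0, which is on neither local subnet
print(outgoing_interface(proxy_nics, "10.230.245.60", "10.230.144.200"))
# Connecting to vmk1 directly stays on the isolated vSwitch
print(outgoing_interface(proxy_nics, "192.168.2.1", "10.230.144.200"))
```

In other words, as long as the proxy connects to the address the hostname resolves to, the isolated vSwitch never sees the traffic, which matches the 10.230.144.200 <-> 10.230.245.60 flow reported above.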
-
- Influencer
- Posts: 23
- Liked: 3 times
- Joined: Jul 01, 2011 12:50 pm
- Full Name: Loren Gordon
- Contact:
Re: How i got Veeam to work ... well
Lars,
Would you be willing to provide more detail on how you improved the database implementation and configuration? i.e. How did you determine which indexes were missing and how did you create/re-create them? How did you identify old database records, and how did you clean them out?
I have very similar problems with the UI for large implementations and would love to improve its responsiveness. Thanks...
-Loren