- Service Provider
- Posts: 212
- Liked: 16 times
- Joined: Aug 02, 2011 9:30 pm
- Full Name: Matjaž Antloga
- Location: Celje, Slovenia
I'm helping a customer achieve the requested 15-minute RPO / 1-hour RTO for four critical servers. One of the roles is SQL (the others are RDP, IIS, some apps and shared files).
The customer hosts this app for end clients, so those servers must be up and running 24/7.
We are using a VMware vSphere 5.5 environment with Veeam v8.
We've managed to meet the 15-minute RPO by running jobs in parallel; however, we are losing network connections while the VMware snapshots are being taken. Unfortunately, that is expected behaviour.
Those timeouts affect RDP connections, SQL itself, etc., and end clients are complaining. We had to turn replication off during working hours soon after we started using it.
The customer also has another location where they use plain Hyper-V Replica to replicate some servers (including SQL) to another DR location. They experience no timeouts there and it works just fine. They will probably switch to that if we can't solve the timeouts with Veeam. (I'm well aware that this is not directly Veeam's fault.)
Now the question: can we expect the same timeouts if we change the infrastructure from VMware to Hyper-V? More questions may arise once I get some feedback.
Thanks in advance, Matjaz
- Posts: 438
- Liked: 91 times
- Joined: Dec 08, 2014 2:58 pm
- Full Name: Steve Krause
The solution we use to get a low RPO and a ridiculously low RTO is to run SQL AlwaysOn Availability Groups and avoid using Veeam to replicate SQL at all; we only replicate the application servers that use the SQL server. This requires SQL Server Enterprise (2012 or higher), so licensing is not cheap.
If you already have Enterprise SQL, you can set up the availability group with an asynchronous replica at your target location (which will not fail over automatically) at no additional licensing cost. In a failover situation you would then fail over manually to the secondary site.
If the after-hours snapshot stuns from your Veeam backup jobs on your SQL boxes are not a concern for the customer, that method should easily get you the RPO/RTO you need. Once your primary comes back online, you just have it sync back and then take over again.
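To make the replica roles concrete, here is a minimal Python sketch of the rule described above. It is only a model, not a SQL Server or Veeam API; the names `Replica` and `failover_kind` and the server and site names are hypothetical. It encodes one point: a replica in asynchronous-commit mode does not fail over automatically, so an operator has to trigger the failover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str          # hypothetical server name
    site: str          # hypothetical site label
    synchronous: bool  # True = synchronous commit, False = asynchronous commit


def failover_kind(replica: Replica) -> str:
    """Asynchronous replicas need a manual failover; synchronous ones can be automatic."""
    return "automatic possible" if replica.synchronous else "manual only"


# Assumed layout: one asynchronous replica at the DR site.
dr_replica = Replica(name="sql-dr", site="dr", synchronous=False)
print(dr_replica.name, "->", failover_kind(dr_replica))
```

Synchronous commit is a precondition for automatic failover, not a guarantee of it, which is why the sketch says "possible" rather than "automatic".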
In our situation, we have a portal site that can't handle nightly snapshot stuns, so we have actually paid for a second SQL license (yay for educational pricing) to have two servers in the primary location configured for synchronous replication with automatic failover (plus an asynchronous replica in the failover site). We back them up with separate Veeam jobs that are chained to each other so they cannot run simultaneously. We back up the secondary server first; the primary server's job runs afterwards, and when its snapshot stun happens it fails over to the secondary and automatically fails back once it is back online. This is not an inexpensive solution, as you need double the amount of disk space in the primary location (and it needs to be the same speed, since delayed writes on the secondary will delay write commits on the primary) and of course RAM/CPU, but if you are in a business that needs really short RTO/RPO with a database-driven application, it is just a cost of doing business.
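The ordering of the chained jobs can be sketched as a small Python model. This is only an illustration of the sequence described above, not Veeam configuration; `Step`, `chained_backup_steps`, `clients_always_served` and the server names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    snapshotted: str  # server whose backup job (and snapshot stun) runs in this step
    serving: str      # SQL node answering clients while that snapshot is taken


def chained_backup_steps(primary: str, secondary: str) -> List[Step]:
    """Model two chained jobs: secondary first, primary only after it finishes."""
    return [
        # Job 1: the secondary is stunned; the primary keeps serving clients.
        Step(snapshotted=secondary, serving=primary),
        # Job 2: the primary is stunned; automatic failover moves clients
        # to the secondary until the primary is back online.
        Step(snapshotted=primary, serving=secondary),
    ]


def clients_always_served(steps: List[Step]) -> bool:
    """True if no step has the serving node being snapshotted."""
    return all(step.snapshotted != step.serving for step in steps)


steps = chained_backup_steps(primary="sql-a", secondary="sql-b")
for step in steps:
    print(f"snapshot {step.snapshotted}: clients on {step.serving}")
print("clients always served:", clients_always_served(steps))
```

The point of chaining is visible in the model: if both jobs ran at the same time, both nodes could be stunned together and there would be nowhere to fail over to.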
In this setup, in a failure scenario I can fail over to the async SQL server and spin up my replica application servers (4-hour schedule) through a failover plan that has us back up in less than 10 minutes, with an RTO of seconds on the database data.
Veeam Certified Architect