- Novice
- Posts: 6
- Liked: never
- Joined: Nov 13, 2015 5:36 pm
Replication and Performance - I'm not getting it
As an introduction, I'm new to both Veeam and ESX. I was a Hyper-V guy from the beginning, but I'm still learning my way around vSphere and Veeam.
I've been working with replication of a large (40TB) vm and am having problems determining either a) where my performance problems are coming from or b) whether the performance I'm seeing is actually to be expected.
Layout
Primary
- 3 x vSphere 5.5.0 servers (Essentials Plus) - 1GbE connected to LAN
- 1 x Coraid SRX6300 - 10GbE connected to servers (we'll call this CoraidA)
DR - connected via 1Gb private fiber at around 25ms latency
- Same as above (we'll call this storage CoraidB)
- Veeam server running here
- Veeam proxy running here
- Veeam replication job pulling the data from Primary to DR
Starting setup
We have 2 physical sites Primary and DR. DR has had the equipment, but we are just starting to make it functional. We had an issue with CoraidA at Primary, so we brought CoraidB over with the intention of migrating to it and moving CoraidA back to DR. The reasoning I'll not go into as it is a bit of a rabbit trail.
With this vm being so large, I decided I'd simply use replication and cutover. It turns out this is about a 120 hour job run for an initial replication. The thing that surprised me, though, was that we were only achieving an average of around 280MB/s and it showed a bottleneck in the target (CoraidB), which had nothing else running on it. The replication was using all default settings (Optimal compression and local target storage optimization) so Veeam recommended we go to No compression level and Local Target (16TB+ backup files) since a) the data in this vm is highly non-dedupe friendly and b) the vm is so large. Still, having the target be the bottleneck seemed bizarre. The network or source would have made far more sense.
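For reference, my understanding of how that bottleneck figure is derived: Veeam tracks how busy each stage of the data pipeline (source, proxy, network, target) was during the job and flags the busiest stage as the bottleneck. A minimal sketch of the idea (illustrative only, not Veeam's actual code, and the percentages below are hypothetical):

```python
# Rough model of Veeam's bottleneck statistic (an assumption based on the
# documented behavior, not actual Veeam code): each stage of the pipeline
# reports the percentage of time it spent busy, and the busiest stage is
# flagged as the bottleneck.

def bottleneck(busy_percentages: dict[str, int]) -> str:
    """Return the pipeline stage with the highest busy percentage."""
    return max(busy_percentages, key=busy_percentages.get)

# Hypothetical numbers resembling the first job run described above:
stats = {"Source": 40, "Proxy": 25, "Network": 30, "Target": 95}
print(bottleneck(stats))  # -> "Target"
```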
Now we've got CoraidA at DR and I've set up replication again. Only now it shows the Source as the bottleneck (CoraidB) and it is moving at a whopping 5MB/s. Obviously, I have an open support ticket on the CoraidB storage since the issues look to be following it. However, it does make me wonder:
1) Do I have the replication setup as I should?
2) How do I go about pinpointing bottlenecks? What tools should I be using?
3) What do I not know here that I don't know I don't know?
Thanks
- Veteran
- Posts: 361
- Liked: 109 times
- Joined: Dec 28, 2012 5:20 pm
- Full Name: Guido Meijers
Re: Replication and Performance - I'm not getting it
Needless to say, a 40TB VM is crazy... I'm trying to think of a valid reason why a single VM would need to be so large...
You say you were getting around 280Mb/s (Mbit?), which actually isn't very bad, considering your single 40TB VM probably doesn't consist of that many extents, so parallel processing isn't going to help you...
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
Re: Replication and Performance - I'm not getting it
Do you have proxies on both ends? What transport modes are being utilized by each of them? What repository is specified in the job settings as metadata storage?
- Novice
- Posts: 6
- Liked: never
- Joined: Nov 13, 2015 5:36 pm
Re: Replication and Performance - I'm not getting it
Thanks for the replies.
First, a 40TB vm != "crazy", it = "medical imaging". There isn't a ton of processing overhead, just a ton of need for storage of data that is very unfriendly to dedupe and compression (a quick way to sanity-check that is sketched below). Quality = Size = Important.
I followed the instructions at http://helpcenter.veeam.com/backup/70/b ... ation.html, set up a Veeam server and proxy at both locations, and created the replication job at the DR site so it pulls the data. Should I actually have 4 proxies? 1 at each location for each Veeam server?
Transport mode = 10GbE servers to storage, 1Gb fiber between sites
The metadata repo is the local repo at the DR site. I have one at the Primary site for the source server and one at DR for the DR server.
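As for the compression-friendliness point above, compressing a sample file and looking at the ratio tells you most of the story. A minimal sketch using Python's standard zlib; the file name is a placeholder, not an actual file from our system:

```python
import zlib

# Quick compressibility check: compress a sample of the data and compare
# sizes. Already-compressed formats (e.g. DICOM with compressed pixel data)
# typically come out near 100%, i.e. almost no savings.
# The path below is a placeholder.
SAMPLE = "sample_image.dcm"

with open(SAMPLE, "rb") as f:
    data = f.read()

compressed = zlib.compress(data, level=6)
ratio = len(compressed) / len(data)
print(f"compressed to {ratio:.0%} of original size")
```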
- Expert
- Posts: 245
- Liked: 58 times
- Joined: Apr 28, 2009 8:33 am
- Location: Strasbourg, FRANCE
Re: Replication and Performance - I'm not getting it
120h seems correct for an initial replication of 40TB over a 1Gb/s line.
1Gb/s is about 100-120MB/s, i.e. roughly 400GB per hour, and 400GB/hour x 100-120 hours => ~40TB
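The same arithmetic as a worked example (a sketch assuming ~100-120MB/s of effective throughput on the 1Gb/s line):

```python
# Back-of-the-envelope transfer time for the initial 40TB replication
# over a 1Gb/s line, assuming ~100-120MB/s of effective throughput.

vm_size_tb = 40
vm_size_mb = vm_size_tb * 1024 * 1024          # 40TB in MB

for throughput_mb_s in (100, 110, 120):        # effective MB/s on a 1Gb/s link
    hours = vm_size_mb / throughput_mb_s / 3600
    print(f"{throughput_mb_s} MB/s -> {hours:.0f} hours")

# 100 MB/s -> ~117 h, 110 MB/s -> ~106 h, 120 MB/s -> ~97 h,
# consistent with the ~120 hour job run reported above.
```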
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
Re: Replication and Performance - I'm not getting it
By transport mode I mean how data is retrieved from the source storage; you can see it in the job session log if you select the VM on the left and look for the proxy server name selected for processing ([hotadd] or [nbd]).
The repository for storing replica metadata should be located close to the source storage; as far as I can tell, this is not so in your case (please check the replication job settings).
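If you'd rather not eyeball the session log, something like the following can pull out those transport-mode tags; the log path and line format here are assumptions, so adjust them to what your job session log actually looks like:

```python
import re

# Scan a saved job session log for the transport-mode tags ([hotadd] or
# [nbd]) next to the proxy name, as described above. The path and the
# exact line format are assumptions -- adjust to your environment.
LOG = r"C:\ProgramData\Veeam\Backup\Job.MyReplication.log"

pattern = re.compile(r"\[(hotadd|nbd)\]", re.IGNORECASE)
with open(LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        if pattern.search(line):
            print(line.rstrip())
```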
- Veteran
- Posts: 361
- Liked: 109 times
- Joined: Dec 28, 2012 5:20 pm
- Full Name: Guido Meijers
Re: Replication and Performance - I'm not getting it
Even if it's medical imaging, having 40TB on a single VM is still a major risk. What do you do if you need to restore? It will take days... Also, file corruption could be an issue. Is this a Windows VM? Ever thought about DFS? You can deploy the content on multiple servers and offer it as a single namespace (share)...
PS: we also have some servers with high-quality images in the 1TB-5TB range, used for catalogue production. Windows dedupe works quite well on that! (ca. 50% savings)
- Novice
- Posts: 6
- Liked: never
- Joined: Nov 13, 2015 5:36 pm
Re: Replication and Performance - I'm not getting it
Transport mode = [hotadd]
Repo = the repo for the DR Veeam server is at the DR site, not the Primary site; same for Primary. Do we need to have a repo and proxy for each Veeam server at each site?
- Novice
- Posts: 6
- Liked: never
- Joined: Nov 13, 2015 5:36 pm
Re: Replication and Performance - I'm not getting it
Trust me, I do understand the concerns about large vms, restores, etc. It is precisely why I am making sure we have replication working at our DR facility - no restores, we simply go hot at DR, which we own and operate as a warm standby. Our replication windows are such that we aren't too worried about replicating corruption to DR, and this all falls within our RPOs and RTOs.
Otherwise, if you're familiar with PACS, you will understand why single machines are the current standard, though luckily options are becoming available. We'll be at 120TB of live data within a year, so DFS replication gets pretty expensive when you need multiple copies. Data change rates are low and data ingest rates are pretty manageable, though they will skyrocket for a bit as we pull in a bunch of new and HUGE imagery for a current project underway. Oh, and PACS systems have very little in the way of duplicate data - they do not dedupe anything like I would have expected.
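Since the change rate is what drives the ongoing replication window (the initial copy is a one-off), here's a rough sketch of incremental pass times under a few assumed daily change rates; the rates are illustrative, not our actual figures:

```python
# Rough incremental-window estimate: how long a replication pass takes for
# a given daily change rate on the 40TB vm, over the 1Gb/s inter-site link.
# The change rates below are made-up illustrative values.

VM_TB = 40
LINK_MB_S = 110                                # effective MB/s on a 1Gb/s line

for change_rate in (0.001, 0.005, 0.01):       # 0.1%, 0.5%, 1% daily change
    changed_mb = VM_TB * 1024 * 1024 * change_rate
    hours = changed_mb / LINK_MB_S / 3600
    print(f"{change_rate:.1%} change -> {hours:.1f} h per pass")
```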
- Veeam Software
- Posts: 21138
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
Re: Replication and Performance - I'm not getting it
I'm talking about the repo that you select in the replication job itself; you should select the one that is close to the data source. If it is selected correctly and hotadd is utilized on both source and target, then you seem to have everything configured correctly, so it's worth contacting support for a closer look.
- Novice
- Posts: 6
- Liked: never
- Joined: Nov 13, 2015 5:36 pm
Re: Replication and Performance - I'm not getting it
I hadn't realized I could point to the same repo from both sites. I've got that set up now and have both proxies added as I should, i.e. a proxy at the Primary site, a proxy at the DR site, and the repo explicitly used by the job on the Primary side. A Veeam tech walked me through all this and had me disable VSS in this case. I killed the last replication that was running, and once the snapshot is rolled back, I'll kick it all off again.
Thanks for the help!