-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
StoreOnce Catalyst performance not as expected
We're currently evaluating a StoreOnce appliance at a customer site.
The appliance is connected via 8 Gb/s FC to the backup server and we're using StoreOnce Catalyst for the backups.
During backups we see a throughput of 150-300MB/s, which is not high but OK. Additional full backups of VMs that are already on the appliance run at the same rate.
I would expect much higher performance with Catalyst as there's no need to read/write the data again; with every run the dedupe rate increases, but performance stays at the same level.
We've tested both Low and High Bandwidth modes, but didn't notice any difference.
The bottleneck according to Veeam is: Source 40%, Proxy 24%, Network 13%, Target 99%.
Am I missing something here?
-
- Technology Partner
- Posts: 25
- Liked: 2 times
- Joined: May 11, 2015 11:51 am
- Full Name: Patrick Huber
- Contact:
Re: StoreOnce Catalyst performance not as expected
Hello
Maybe you need to configure more streams for the Fibre Channel adapters on the StoreOnce.
Look at this guide on page 85:
https://support.hpe.com/hpsc/doc/public ... 43535en_us
I got that hint from an HPE pre-sales engineer.
It removes the stream limit on the FC cards, so you can use more parallelisation.
And remember to connect both the StoreOnce AND the backup server to the FC SAN. If there is a LAN connection anywhere in between, you will surely lose bandwidth.
Regards,
Patrick
VEEAM Enthusiast
Veeam certified Architect
-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: StoreOnce Catalyst performance not as expected
I've already increased the logins per port to 16 but I'll try 256 and see if anything happens.
What do you mean by connecting the StoreOnce to the FC SAN?
Isn't the data flow SAN -> Backup Server -> StoreOnce?
-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: StoreOnce Catalyst performance not as expected
So with 256 and no limitation from the repository, the performance stays the same; a single task/stream won't exceed 160MB/s.
I'm still not sure when Catalyst should/would kick in...
One thing I've noticed: if I uncheck "decompress backup data" on the repository, performance almost doubles per stream; perhaps the proxy/gateway server is too slow?
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: StoreOnce Catalyst performance not as expected
Regnor wrote: a single task/stream won't exceed 160MB/s
My understanding is that this is perfectly normal and by design for ANY deduplicating storage. You can only scale throughput by increasing the number of streams, and this is the case for both writes and reads (backup and restore, that is).
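To illustrate that scaling behaviour, here is a minimal back-of-the-envelope sketch (illustrative only, not Veeam or StoreOnce code; the 160MB/s per-stream figure is taken from this thread, the 1500MB/s aggregate ceiling is an assumption standing in for the link/appliance ingest limit):
Code:
# Per-stream throughput on a dedupe appliance is roughly flat, so the
# aggregate rate grows with concurrent streams until a shared ceiling
# (network link, appliance ingest) is reached.
PER_STREAM_MBPS = 160     # observed single-stream rate in this thread
CEILING_MBPS = 1500       # assumed aggregate limit (link/appliance ingest)

def aggregate_throughput(streams: int) -> float:
    """Estimate combined MB/s for a given number of concurrent tasks."""
    return min(streams * PER_STREAM_MBPS, CEILING_MBPS)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} streams -> ~{aggregate_throughput(n):.0f} MB/s")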
-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: StoreOnce Catalyst performance not as expected
I thought that if the data/blocks were already on the deduplication appliance, we wouldn't have to send them over again, which would increase throughput (for a single stream). At the moment it looks like we're always sending the data over to the appliance, which then simply discards the duplicate blocks.
Regarding the performance boost of unchecking "decompress backup data":
I've set the power management settings to high performance (backup proxy) and now the rates are equal; a single stream now reaches 300MB/s.
-
- Veeam Software
- Posts: 649
- Liked: 170 times
- Joined: Dec 10, 2012 8:44 am
- Full Name: Nikita Efes
- Contact:
Re: StoreOnce Catalyst performance not as expected
Regnor wrote: I thought that if the data/blocks were already on the deduplication appliance, we wouldn't have to send them over again, which would increase throughput (for a single stream). At the moment it looks like we're always sending the data over to the appliance, which then simply discards the duplicate blocks.
It depends on your Catalyst Store setting. If it is set to high bandwidth, all data is sent to the device as is, and deduplication happens on the device itself.
If it is set to low bandwidth, deduplication happens at the Catalyst library (on the gateway where the Veeam target agent runs) and only new blocks are sent to the device.
You can check both and see which one results in better overall job performance; however, from my experience, high bandwidth is usually faster.
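As a rough illustration of the difference between the two modes, here is a simplified model (not the actual Catalyst API or protocol): in low-bandwidth mode the gateway only transfers blocks the appliance does not already hold, while in high-bandwidth mode everything is shipped and deduplicated on the device.
Code:
# Simplified model of the two Catalyst modes described above.
import hashlib

def backup(blocks, device_index, low_bandwidth=True):
    """Return the number of bytes actually sent over the wire."""
    sent = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if low_bandwidth and digest in device_index:
            continue              # block already on the appliance, skip the transfer
        sent += len(block)        # new (or unconditionally sent) block
        device_index.add(digest)
    return sent

device_index = set()
data = [b"A" * 1024, b"B" * 1024, b"A" * 1024]           # third block is a duplicate
print(backup(data, device_index, low_bandwidth=True))     # 2048 bytes sent
print(backup(data, set(), low_bandwidth=False))            # 3072 bytes sent, dedupe on the device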
-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: StoreOnce Catalyst performance not as expected
There's almost no difference between high bandwidth and low bandwidth mode, at least in our case.
-
- Chief Product Officer
- Posts: 31804
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: StoreOnce Catalyst performance not as expected
Regnor wrote: I thought that if the data/blocks were already on the deduplication appliance, we wouldn't have to send them over again, which would increase throughput (for a single stream).
That is correct, so you will see the difference when the bandwidth is limited. Otherwise, there would not be any performance gain, as the process of identifying which data blocks already exist on the dedupe appliance slows processing down, balancing out the benefit of not sending some blocks over. Naturally, on a high-speed network connection it could be faster to just ship everything to the dedupe appliance and let it deal with the data locally.
-
- Technology Partner
- Posts: 36
- Liked: 38 times
- Joined: Aug 21, 2017 3:27 pm
- Full Name: Federico Venier
- Contact:
Re: StoreOnce Catalyst performance not as expected
Backup performance for a single stream is roughly 30% faster when you keep source-side dedupe disabled (i.e. "High Bandwidth" on the Catalyst Store GUI). Despite this, I keep recommending source-side dedupe (i.e. low-bandwidth mode) in production environments.
With source-side dedupe, when your backup job contains about 10 VMs and you process them concurrently, you can easily see the throughput rise above 1500MB/s even across a single 10GbE link. When you add more jobs and proxy/gateway servers into the mix, you can raise the throughput even further without being limited by the network connectivity.
It is important to make sure the entire data path is designed for your expected throughput, not just the last link to the StoreOnce. At these speeds, the primary storage could easily become a bottleneck as well.
One important suggestion: make sure your Veeam proxy, which is selected by the backup job, runs on the same server (physical or VM) as your Veeam gateway, which is selected by the backup repository. If the two services run on different servers, your backup data does not go straight to the StoreOnce via Catalyst; there is an extra hop over the LAN, and that connection is not deduped.
In the past, I thought Catalyst could achieve high bandwidth reduction only for full backups, and my lab tests seemed to validate that behaviour. Then I saw production environments achieving bandwidth reduction in the range of 10:1 to 30:1 even for CBT incremental backups. That surprised me, so I went back to my lab to check what was wrong... and the wrong part was my workload generator, which was based on many 50MB files. Most production workloads consist of many small write operations and a few large ones. When I changed my workload generator to produce many small files and a few large ones, I saw a rising dedupe effect for incremental backups as well. Tests with just 1% of new data distributed over a lot of very small files were able to generate a .vib as big as 15% of the full. This happens because even a file of a few KB forces the CBT engine to mark an entire 1MB-wide segment of the VMDK as changed. StoreOnce dedupe works at a much more granular level; it identifies the real changed data inside the larger segment and avoids sending the unchanged parts.
All this to say that source-side dedupe helps for incremental backups as well, not only for fulls.
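A back-of-the-envelope sketch of the CBT granularity effect described above (all numbers are illustrative assumptions, not measurements, roughly matching the "1% of new data, .vib around 15% of the full" observation):
Code:
# Tiny scattered writes dirty whole 1MB CBT segments, so the incremental
# (.vib) looks large at the Veeam level, while fine-grained dedupe only has
# to transfer the bytes that really changed.
VMDK_GB = 100
CBT_SEGMENT_MB = 1            # VMware CBT change-tracking granularity
SMALL_WRITE_KB = 8            # a typical small file/database write
changed_writes = 15_000       # scattered small writes since the last backup

dirty_mb = changed_writes * CBT_SEGMENT_MB            # worst case: each write hits a different segment
real_changed_mb = changed_writes * SMALL_WRITE_KB / 1024

print(f"Data read at CBT granularity:   ~{dirty_mb / 1024:.1f} GB")
print(f"Data that actually changed:     ~{real_changed_mb / 1024:.2f} GB")
print(f".vib size vs a {VMDK_GB} GB full: ~{dirty_mb / (VMDK_GB * 1024):.0%}")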
-
- Influencer
- Posts: 10
- Liked: never
- Joined: Oct 11, 2016 8:23 am
- Contact:
Re: StoreOnce Catalyst performance not as expected
Just to let you know:
We run StoreOnce 3540s, connected over 10GbE.
FC for storage.
We get on average around 170MB/s when writing to the 3540.
At another site the peak does go up to 365MB/s, with a processing rate of 231MB/s, for example.
Depending on the model though, I would've thought 150-300MB/s was OK?
-
- Novice
- Posts: 4
- Liked: 1 time
- Joined: Jun 11, 2018 5:59 am
- Full Name: Andreas Buetler
- Contact:
Re: StoreOnce Catalyst performance not as expected
Hi,
We have a StoreOnce 4900 here and also had performance issues over FC. After we switched to 2x 10Gb Ethernet in an LACP setup, the performance was twice what it was before.
But the performance is still not good enough, and we had some other trouble with this device.
Now we are evaluating another backup appliance from ExaGrid. ExaGrid takes a different approach with a landing zone: the idea is to back up to non-deduplicated storage and move the older backup data to the deduplicated storage.
We start the proof of concept in a few weeks.
I'd highly recommend checking out other products besides the HPE StoreOnce.
Br sys-adm
-
- VeeaMVP
- Posts: 1006
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: StoreOnce Catalyst performance not as expected
@Federico: Thanks for your input; it's really interesting to see how the StoreOnce works with different workloads.
I'll play with both modes and see how they compare with more VMs.
@Johna8: The performance is really OK from my point of view; I just thought that it would increase more when dedupe kicks in.
@sys-adm: When using a dedupe appliance as primary backup storage I see advantages in having a landing zone. We're using ours as secondary storage, so performance isn't that critical.
To come to a conclusion: the more streams/tasks we put on the StoreOnce, the better the performance we get.