-
- Veteran
- Posts: 354
- Liked: 73 times
- Joined: Jun 30, 2015 6:06 pm
- Contact:
Could we talk multiple streams for a moment please?
Morning everyone. A question: in version 8, and possibly earlier versions (I don't know when it was introduced, but I thought I saw it in 7), we have the option "Use multiple upload streams per job," w/ an increment counter that I believe goes to 100. The description reads: "Improves job performance through better utilization of high-latency links. Disable this option if you are running a large amount of concurrent jobs, or for networking equipment compatibility purposes."
Being somewhat of a networking guy, I got excited when I saw "Use multiple upload streams per job," which I took to mean "multiple data streams," of which LAG can make great use. But the description throws me a bit; does it actually not mean multiple data streams? With the option checked and concurrent jobs running, each job w/ multiple VMs, at least some of those VMs w/ multiple HDDs, and multiple proxies w/ multiple CPUs, I would expect to see a fairly high amount of concurrency. But when watching the number of connections at my repositories (EMC and Dell dedupe devices), I only see around 2-5 concurrent connections. I would have expected to see 10-15+ easily, if not scores of connections. I guess what I'm looking for here is clarification on what "Use multiple upload streams per job" actually means.
One other thing I've noticed is that, historically, backups become extremely unstable w/ this feature enabled, w/ a lot of failures seemingly due to perceived network issues. Clear the check box and stability improves. The description does mention "high-latency links," which we do not have: locally we're 10Gb inside our VMware environment, 10Gb to our Dell DR4100s, and 2x 1Gb each to our Data Domains. We have two matching data centers w/ a 500Mb L2 link b/t them, but no Veeam backup data traverses it, only Veeam <-> Proxy commands; each data center backs up its data locally. I didn't consider myself to be running a high number of concurrent jobs: probably 5-6 concurrent jobs, each w/ around 3-10 VMs. If this is truly meant for slow network links, and/or 5-6 jobs running concurrently counts as a large number of concurrent jobs, then that's fine, I get it. I've since staggered my jobs anyway, but I still don't use this feature, since it seems to cause more problems than it solves, however excited a network guy gets about efficient use of network capacity.
That's v8 and/or older. As for v9, I understand it's going to make great use of multiple data streams; again, if I misunderstand the above, please let me know. Otherwise I find myself a bit confused that we would tout this as a new feature if it's been there for the last 1-2 major revisions. As we speak, I'm trying to find notes from the Veeam folks that detail exactly what is meant by "multiple data streams." Anyway, do we have some good details on exactly what v9 is going to do with multiple data streams, and how? I need this both for my excitement and for serious network planning. I'm gathering up a couple of switches and old SANs to insert as my first landing zone/staging area for backups and need to plan accordingly. Multiple data streams are the name of the game for LAG, so if we will have the capability to run a LOT of traffic down the highway, then we'll need as many lanes as we can throw at it to back up our modest 33TB or so of data every weekend. I'm sure larger places have a lot more.
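For rough planning, the arithmetic behind the 33TB weekend and the LAG point can be sketched as below. This is a back-of-the-envelope model, not anything from Veeam: the 48-hour window, the decimal-terabyte conversion, and the CRC32 hash are assumptions for illustration (real switches hash flows with vendor-specific algorithms), and the function names and addresses are made up. The point it illustrates is that a LAG typically pins each flow to one member link, so a single stream cannot go faster than one link.

```python
import zlib


def required_gbps(data_tb: float, window_hours: float) -> float:
    """Average throughput (Gb/s) needed to move data_tb within window_hours.

    Assumes decimal units (1 TB = 1e12 bytes) and ignores dedupe,
    compression, and incrementals, so treat the result as a rough figure.
    """
    bits = data_tb * 1e12 * 8
    return bits / (window_hours * 3600) / 1e9


def lag_member(flow: tuple, n_links: int) -> int:
    """Pick a LAG member link for a flow by hashing its addressing tuple.

    CRC32 is a stand-in; actual hardware uses its own hash and fields.
    """
    key = "|".join(str(part) for part in flow).encode()
    return zlib.crc32(key) % n_links


def links_used(flows: list, n_links: int) -> set:
    """Return the set of member links that carry at least one flow."""
    return {lag_member(flow, n_links) for flow in flows}


# Hypothetical example: 33 TB over an assumed 48-hour weekend window.
print(f"{required_gbps(33, 48):.2f} Gb/s average")

# One stream between two hosts always lands on a single member link.
one_flow = [("10.0.0.10", "10.0.0.20", 50000, 445)]
print(len(links_used(one_flow, 2)))

# Many streams (different source ports) have the chance to use both links.
many_flows = [("10.0.0.10", "10.0.0.20", 50000 + i, 445) for i in range(16)]
print(sorted(links_used(many_flows, 2)))
```

Under these assumptions the average works out to more than a single 1Gb link can carry, which is why the number of distinct flows matters more for a LAG than the raw link count.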
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)
-
- Veteran
- Posts: 635
- Liked: 174 times
- Joined: Jun 18, 2012 8:58 pm
- Full Name: Alan Bolte
- Contact:
Re: Could we talk multiple streams for a moment please?
Current implementation (v8) is only for traffic between Veeam data movers. Each job can only have one data stream to your deduplication device because it's writing a single file. In v9, the option for per-VM backup chains will allow you to write separate files for each VM in the job; because the job is writing multiple files to the deduplication device at the same time, it can use multiple streams.
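A simplified model of this, as an illustrative sketch rather than Veeam code: the number of concurrent write streams to the deduplication device follows the number of backup files being written at once. The function name and the job-to-VM mapping are hypothetical, and the model ignores task limits and anything else that throttles concurrency.

```python
def write_streams(active_vms_per_job: dict, per_vm_chains: bool) -> int:
    """Count files being written at once, i.e. streams to the storage device.

    active_vms_per_job maps a job name to the number of its VMs being
    processed right now. Without per-VM chains each running job writes one
    file; with per-VM chains each VM in flight gets its own file.
    """
    running = {job: vms for job, vms in active_vms_per_job.items() if vms > 0}
    if per_vm_chains:
        return sum(running.values())
    return len(running)


# Hypothetical load: five concurrent jobs with 3-10 VMs in flight each.
jobs = {"job1": 3, "job2": 5, "job3": 10, "job4": 4, "job5": 6}
print(write_streams(jobs, per_vm_chains=False))
print(write_streams(jobs, per_vm_chains=True))
```

With a handful of jobs running, the single-file case lands in the same range as the 2-5 connections observed on the repositories.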
-
- Veteran
- Posts: 354
- Liked: 73 times
- Joined: Jun 30, 2015 6:06 pm
- Contact:
Re: Could we talk multiple streams for a moment please?
Thanks Alan, excellent point about a given job only writing one file at a time! My grand assumption was that each proxy CPU meant a single data stream, so I never could figure out why (4) proxies, each w/ (4) CPUs, all running concurrently never gave (16) concurrent streams. I'm very much looking forward to the per-VM chain feature of v9. So how will the proxies work w/ a given multi-HDD VM at that point w/ per-VM chains? Currently a given multi-HDD VM will get different proxies assigned to it; will they each stream their respective HDD(s) into the same file concurrently?
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)
-
- Veteran
- Posts: 635
- Liked: 174 times
- Joined: Jun 18, 2012 8:58 pm
- Full Name: Alan Bolte
- Contact:
Re: Could we talk multiple streams for a moment please?
Proxies are not important to the question of writing files to an SMB share on a Dell DR device (not sure what EMC device you're using, but the same is true of any SMB share or Data Domain Boost). Rather, all proxies send their data to the gateway (part of the repository settings), and the gateway writes files to the storage device. However, if you choose not to specify a gateway in the repository settings, it's common for one of the proxy servers you're using to be dynamically assigned the role.
-
- Veteran
- Posts: 354
- Liked: 73 times
- Joined: Jun 30, 2015 6:06 pm
- Contact:
Re: Could we talk multiple streams for a moment please?
Something else I'd like to understand better: the Gateway server. I've looked around before but never found much on what it is or does. Currently we have it set to auto on all our repositories. Will the Gateway server's functionality change in v9? If we have, say, (4) proxies in a given data center, and job concurrency, will they all just choose a single one to be the gateway server? Or can/will they each become a Gateway server? Our old EMC DDs are just CIFS shares (SMB). No DD Boost needed in our environment. Old 620s and a 2200 per data center. CIFS shares on our Dell DR4100s as well. No OST or anything fancy.
I found this article floating around, any corroboration? http://www.virtualtothecore.com/en/veea ... up-chains/
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)
-
- Chief Product Officer
- Posts: 31806
- Liked: 7300 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Could we talk multiple streams for a moment please?
The gateway server is the server that runs the target data mover in cases when the data mover cannot run on the storage device itself. This is unlike backup repositories based on Windows or Linux servers with internal or DAS storage, where it runs on the repository server directly.
-
- Enthusiast
- Posts: 75
- Liked: 3 times
- Joined: Jun 16, 2010 8:16 pm
- Full Name: Monroe
- Contact:
Re: Could we talk multiple streams for a moment please?
In v9, will this feature "per-VM backup chains" also be used with Backup Copy Jobs?
-
- Product Manager
- Posts: 20406
- Liked: 2298 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Could we talk multiple streams for a moment please?
As mentioned in the referenced blog post, it's a repository option, not a job option. Thanks.
-
- Enthusiast
- Posts: 75
- Liked: 3 times
- Joined: Jun 16, 2010 8:16 pm
- Full Name: Monroe
- Contact:
Re: Could we talk multiple streams for a moment please?
So the answer would be "yes" as long as the backup copy jobs are on repositories with this option? It looks like this would be a way to have multiple VMs processing at the same time with Backup Copy Jobs. I know that this has been requested a few times in the past. Nice.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Could we talk multiple streams for a moment please?
Yes, your understanding is correct.
-
- Veteran
- Posts: 354
- Liked: 73 times
- Joined: Jun 30, 2015 6:06 pm
- Contact:
Re: Could we talk multiple streams for a moment please?
So how would the Gateway server come into play for basic CIFS shares then? Bear w/ my incessant questions please, I'm just trying to gain a thorough understanding of the underlying architecture.
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)
-
- Chief Product Officer
- Posts: 31806
- Liked: 7300 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Could we talk multiple streams for a moment please?
Well, to be honest, it's really not an "explain it to me in two words in a forum post" sort of topic. A proper explanation requires diagrams and takes about an hour of our VMCE class.
If you are genuinely interested, then you should first read our documentation to understand Veeam architecture, components and data flow... the above is impossible to grasp without knowing the underlying architecture. And I can assure you that as soon as you learn that, you will not need me to answer your question above at all.
-
- Veteran
- Posts: 361
- Liked: 109 times
- Joined: Dec 28, 2012 5:20 pm
- Full Name: Guido Meijers
- Contact:
Re: Could we talk multiple streams for a moment please?
Anyway, if you use a Windows repository, do not add it as "CIFS" (which is the wrong word anyway); instead select the server, then the disk. Veeam will take care of the "share," and it is much faster.
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Could we talk multiple streams for a moment please?
The like is for the CIFS bashing, appreciated
It's SMB!!!
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Veteran
- Posts: 354
- Liked: 73 times
- Joined: Jun 30, 2015 6:06 pm
- Contact:
Re: Could we talk multiple streams for a moment please?
If I understand the attempt at correction(?), I also understand Windows servers to indeed be SMB, while Dell/EMC dedupe devices are Linux-based (not Windows) and refer to their local shares as using the CIFS protocol during setup of the shares (<--also their nomenclature). I just took their references and ran w/ them. I've also found during some test comparisons that adding a Windows repository as a server pointing to its local drive does indeed run a bit faster than setting it up as a shared Windows (SMB) folder. Now if we can just get tape to run at speed across the network! But perhaps we're getting a bit off-topic here, albeit w/ great advice regarding repositories on Windows machines. <insert thumbs-up smiley here>
As to the "explain it to me in two words" topic: well no, of course not! I'm not looking for a two-word discussion of multiple data streams w/ regard to concurrent backups, link aggregation, what some features mean, what they're doing under the hood, etc. I have read over the architecture (didn't find much on the gateway server at the time) but have since downloaded some more documentation that I do need to read over. In parallel w/ that, I also like to have human conversations to help dig deeper and understand.
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)