Too many issues experienced with B&R

JPMS · Nov 02, 2019 6:28 pm

This post has been born out of frustration with the number of issues we have had with B&R having purchased it in March 2019 (after running a 30 day trial).

To put the issues into context, we run B&R in a small simple environment. One Hyper-V host with about 15 VMs backing up (forward incremental with synthetic fulls) to a Linux repo with secondary backup to a single tape drive. Weekly Health Check and Surebackup.

I’ve spent 35 years maintaining Windows servers and networks. I’m the sort of person who uses technical support as the very last option after using any other available resources to understand and fix an issue myself.

In the first six months of using B&R I have had to open four technical support cases as a result of experiencing significant issues with the product (I have another issue that needs resolving but it is comparatively minor and I don’t have the time or energy to pursue atm!)

#03470315 – Surebackup (Opened March 19, 2019, Closed June 14, 2019, workaround in place awaiting fixes to be incorporated in v10)
I initially thought this was a problem with configuring our Surebackup lab but it actually turned out to be an issue with the Veeam agent – “Exception of type 'Veeam.Backup.AgentProvider.AgentClosedException' was thrown.”

Spent an enormous amount of time on this because the agent couldn’t reproduce the problem. Initially we had to build a ‘proper’ Linux repo (at the time we were using a Synology NAS which is unsupported) to show the issue was still present and reproducible. Because the agent couldn’t reproduce the issue I was convinced that it was something in our environment and spent many days checking logs, drivers, changing hardware and even completely reinstalling operating systems (including the host OS).

Found a workaround myself and it was as a result of this that the agent found that the issue was Hyper-V specific and they had been trying to reproduce the problem in a VMWare environment. To say I wasn’t very happy about the time I had wasted (because the agent hadn’t tried to reproduce the issue in the same environment as ours) would be an understatement!

Once the issue was reproduced it was referred to R&D. To quote the agent, “Turned out there is more to our SSH connections than just 1 fix. There are quite a few changes planned for the future and those will be rolled out over multiple updates. The plan as of now is to have them all ready by the time Update 5 comes around.” Presumably, this now means v10

#03535461 – Tape Backup (Closed, workaround in place, no satisfactory resolution)

Following advice on your forum that Active Fulls were not necessary with B&R (veeam-backup-replication-f2/the-need-fo ... 13-60.html) we changed our backups to incremental with synthetic full. After making this change the secondary tape backup 'failed' with a warning - "Backup file xxxxxxxx2019-04-27T021539_F718.vbk will be excluded from the list of files to backup because it is unavailable"

After working on the ticket for a while, ended up finding two workarounds for this myself. The second being the same workaround as for ticket #03470315.

Mentioned this to the agent working on #03470315 and he forwarded the details of #03535461 to the R&D team but I have no idea of the outcome.

The agent on this ticket responded, “As far as the issue related to this case is now over and your tape jobs are running as expected, I suggest to archive this case. Further investigation will be processed within the other case on a higher tier and the assigned engineer will find the root cause and identify the best solution for you.”

I agreed that the case could be closed but although the workaround for the two cases was the same I have no idea whether the cause was the same and what if anything is being done to properly resolve the issue.

#03743171 - Backups keep failing with "Failed to send command Error: An existing connection was forcibly closed by the remote host" (Opened Sept. 1, 2019, Problem identified, unhappy with resolution)

We had an opportunity to repurpose our (unsupported) Synology NAS so decided to replace it with a (supported) CentOS-based Linux repository. We then started getting intermittent failures of our incremental backups (about one in three) during the creation of the Synthetic Full.

From my time spent on the earlier Surebackup issue I was aware that there are two SSH libraries built into B&R but I had checked and since changing to the Linux repo had been unable to get the newer Renci library to work. We agreed that should be resolved first and in the end I found the solution myself. Renci uses SFTP for file transfer, not SCP (which the older Granados uses). As this wasn’t installed on the repo Renci wouldn’t work. SFTP wasn’t listed as a requirement for a Linux repo in your KB https://www.veeam.com/kb2216. I suggested they updated it and this has since been done.
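A quick way to spot this kind of gap is to check the repository's sshd configuration for an SFTP subsystem entry. The following is only a sketch: `has_sftp_subsystem` is a hypothetical helper, it assumes the missing piece shows up as an absent `Subsystem sftp` line (the thread does not say whether the config line or the server binary was missing), and it ignores `Include` files and `Match` blocks.

```python
def has_sftp_subsystem(sshd_config_text: str) -> bool:
    """Return True if an uncommented 'Subsystem sftp ...' directive is present.

    Hypothetical helper for illustration only; not part of B&R or OpenSSH.
    """
    for raw_line in sshd_config_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        # sshd_config keywords are case-insensitive; the subsystem name is not.
        if len(parts) >= 3 and parts[0].lower() == "subsystem" and parts[1] == "sftp":
            return True
    return False


if __name__ == "__main__":
    sample = "Port 22\n#Subsystem sftp /usr/libexec/openssh/sftp-server\n"
    print(has_sftp_subsystem(sample))
```

On a typical install the text would be read from `/etc/ssh/sshd_config`; a `False` result only says the directive is not enabled there, not why.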

Unfortunately, using the Renci library didn’t resolve the issue so we got to the stage of producing Wireshark dumps to identify the issue. It was when I apologised to the agent about the time (and therefore the dumps) on the Linux repo being slightly out (1-2 minutes) that they realised this may be (and indeed was) the issue. I had originally set up and tested NTP on the repo but on further investigation discovered that it wasn’t starting up when the server was restarted (and our repo is only on for three hours a night and so restarts daily).

I now have a resolution but am not entirely satisfied. The agent wants to close the case but there doesn’t seem to be any explanation available as to why a small time difference causes this problem, why does it only happen intermittently, what is a ‘satisfactory’ time difference, is this a bug or is time synchronisation a requirement, has this been referred to R&D? Also, going back to your KB article https://www.veeam.com/kb2216 there is no mention of NTP or the need for time synchronisation.

Furthermore, the results of this issue are that the incremental chain is damaged to the point that no further backups can be added to it requiring an Active Full to be made to continue backing up (it seems restores can still be made but I have only tried a couple of small file restores and not tested this properly). With this in mind, if the time synchronisation is a requirement (rather than a bug) then B&R should check that the systems are time synchronised before starting a backup and at least flag a warning and ideally should prevent the backup running entirely as it results in a damaged chain.
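The pre-flight check suggested above could look roughly like this. It is a sketch only: `preflight_time_check` and the 30-second `MAX_SKEW` are hypothetical, since nothing in this thread or in KB2216 states what time difference B&R tolerates, and how the repository's clock is read (for example over SSH) is left out.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance; the real acceptable skew (if any) is not documented here.
MAX_SKEW = timedelta(seconds=30)


def preflight_time_check(server_now, repo_now, max_skew=MAX_SKEW):
    """Compare two timezone-aware timestamps and return (ok, message)."""
    skew = abs(server_now - repo_now)
    if skew > max_skew:
        return False, (
            f"clock skew {skew.total_seconds():.0f}s exceeds "
            f"{max_skew.total_seconds():.0f}s; skipping backup"
        )
    return True, f"clock skew {skew.total_seconds():.0f}s within tolerance"


if __name__ == "__main__":
    server = datetime(2019, 11, 2, 2, 0, 0, tzinfo=timezone.utc)
    repo = server + timedelta(seconds=90)  # repo clock 90 seconds ahead
    print(preflight_time_check(server, repo))
```

Refusing to start the job (rather than only warning) matches the argument above that a damaged chain is worse than a skipped backup.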

I won’t have the problem again because I know the cause and solution but how many other people are going to waste time with it as an issue?

#03806448 – “Attempt to read past the end of the SSH data stream” when running Active Fulls (Opened Oct. 10, 2019, referred to R&D on that date, “They still working on the issue, however they have no solution yet.”)

Because of the problem above, I have had to repeatedly run Active Fulls. As a result of this I have discovered that I intermittently get the above issue. This is not due to time synchronisation issues as it has occurred since resolving the NTP issue on the repo.

After examining the Wireshark dumps the issue was referred to R&D. I don’t know what they found but we have had no resolution yet.

In summary, I have had the product for six months using it in a small simple environment. In that time I have had four separate significant issues (three preventing me from backing up).
#03470315 – bugs in SSH in Hyper-V environment.
#03535461 – possibly related to #03470315, possibly separate bug. Will it be fixed?
#03743171 – bug or requirement? If a requirement, appears to be undocumented and not enforced by B&R software.
#03806448 – based on initial response, appears to be a bug.

Don’t get me wrong, I really like the product (when it’s working). The technical support is always approachable, helpful and friendly. The forums are invaluable (I spend quite a bit of free time just browsing them to learn more about the product).

However, at a conservative estimate, I have spent at least a month troubleshooting these issues, that’s $20,000 of billable time or to look at it in a different way, a month of my life I won’t get back! I know software is never bug free and there will always be a requirement to spend time sorting out issues but this seems to be disproportionately high for a mature product that is deployed in a small simple environment.

I read about your new developments that will never be relevant to me and just sigh and think to myself “I just want what I have already got to work properly”! The impression I get (it may not be true) is that there may be a reluctance for your tech support agents to involve R&D with an issue and as long as there is a working resolution to an issue that that is the end to the matter, rather than getting an issue properly resolved. This isn’t good for the product or the use of agents’ time (because unresolved issues keep coming up) and most importantly it isn’t good for my time because I’m wasting time on issues that shouldn’t be there!

Gostev · Nov 02, 2019 8:39 pm

JPMS wrote: ↑Nov 02, 2019 6:28 pm The impression I get (it may not be true) is that there may be a reluctance for your tech support agents to involve R&D with an issue and as long as there is a working resolution to an issue that that is the end to the matter, rather than getting an issue properly resolved.

Hi, Jason. Your impression is not incorrect - indeed, I too see this happening more often than I'd like. And we in product management and R&D are equally frustrated every time it happens, because this does not give us a chance to improve the product and address real issues in a timely manner. Luckily, such events are still more of exceptions than the rule.

And this is exactly why we always ask users to escalate support cases which get "stuck" at lower support tiers for longer than necessary via "Talk to a Manager" functionality on the support portal. Also, it is important not to accept solutions which are merely workarounds (such as "recreate the job") because this leaves the actual bug unfixed, and keep all support cases open at least until you hear back that the bug is confirmed by R&D and scheduled for this or that release vehicle. Finally, it is always a good idea to raise the visibility of issues like the above ones with the forum posts, just like you did - as we can't fix problems we don't know about.

Anyway, from what you explained above, it sounds like all those bugs did end up in R&D eventually - but I understand it took too long for this to happen, and so the bug fixes missed the previous update release vehicles. I will ask support management to assign someone to double check that all bugs associated with your support cases ended up in our bug tracking system, and are scheduled for one of the upcoming releases.

Other than that, of course I'm very sorry to hear you had bad luck with our product in your environment so far - this is certainly not a typical experience. But we will make this right.

Thanks!

zadrian · Nov 04, 2019 3:17 am

When people mention Hyper-V, I usually ask: Hyper-V Server, or Server 2012/2016/2019 with Hyper-V? Believe it or not, Hyper-V is not "server with Hyper-V"!

There do not seem to be as many issues with VMware ESXi (with vCenter). I actually dropped Hyper-V altogether and got VMware Essentials Plus (3 servers) as I believe that it is useless running production servers & VMs without a backup solution in place.
...
Then I do realise that maybe it is time to retire tapes, as logically there are many changes in Veeam compared to conventional backups, like the use of reverse incrementals (why forward incremental if you back up to tape?). The issue I had was performing the backup copy of the "synthetic full" to tape, so we resorted to using a remote-site NAS (in the future cloud as well) and retired tape.

JPMS · Nov 04, 2019 3:27 pm

Thanks for the response zadrian but this isn't the issue here.

VMWare may, or may not, be better than Windows Hyper-V. Tape may, or may not, have a place in modern backups. The point is that both of these are supported options in B&R and so we have an expectation that they will perform well.

I have found both Windows Hyper-V and tape extremely reliable in the environments we use them and they fit our requirements well. We have yet to have the same experience with B&R.

JPMS · Nov 04, 2019 3:55 pm

Gostev wrote: ↑Nov 02, 2019 8:39 pm Hi, Jason. Your impression is not incorrect - indeed, I too see this happening more often than I'd like. And we in product management and R&D are equally frustrated every time when it happens, because this does not give us a chance to improve the product and address real issues in a timely manner. Luckily, such events are still more of exceptions than the rule.

Do you know why this is? I was mulling this over while walking the dogs today and could think of several possibilities but I'm sure you are in a much better position to judge!

I took your advice in case #03743171 and insisted that the agent refer the case to R&D for investigation of the cause and a fix. Following this, I have been punished by the Gods for my impudence - we believed we had a working workaround but this has now failed so I'm back to square one. With this in mind I have also escalated the case.

One area I feel does need improvement. I was told in case #03470315 that if I wanted to know when fixes for an issue were included in an update/release I would have to check the release notes (which even then may not be detailed enough to identify a specific fix). If this explanation is correct, why isn't there an automatic notification to ticket holders that their issue is fixed when an update/release is released?

Cheers, Jason

Gostev · Nov 06, 2019 7:43 pm

JPMS wrote: ↑Nov 04, 2019 3:55 pm Do you know why this is? I was mulling this over while walking the dogs today and could think of several possibilities but I'm sure you are in a much better position to judge!

This is simply because some support engineers are less experienced than others. Some have been working at Veeam for many years, and some have only joined earlier this year. And you can't pick which one you get for each particular case.

I actually started as a support engineer myself in this group of companies, so I know first hand that it's not abnormal to hesitate escalating into R&D something that may potentially appear as a basic misconfiguration in the end - making you look stupid in front of everyone.

I also saw other, much more experienced (than me) engineers joining the team - but bringing practices from their previous job, where R&D was not to be talked to. Even such experienced guys needed some time to adjust to our way.

All we need is your (our customers') help to flag such situations - so that support management is aware of the issue, can review each situation separately, and take appropriate corrective action to make sure the particular engineer will not make the same mistake again. This is quite a normal learning process - and as I've said in my first response, such situations are more of exceptions than the rule anyway. For example, among all support cases you've listed in your original post, only 03535461 was not handled by the support engineer correctly (and this is being investigated by support management now) - while all others appear to have been appropriately escalated into R&D.

Gostev · Nov 06, 2019 10:57 pm

JPMS wrote: ↑Nov 04, 2019 3:55 pm If this explanation is correct, why isn't there an automatic notification to ticket holders that their issue is fixed when an update/release is released?

I actually believe there is, if you keep the ticket open. But I'm not with support, so I may be incorrect - this is just based on me seeing reports where thousands of support cases get closed in the next few weeks after each update release, so clearly they must be reaching out to everyone with open tickets in case an update fixes the issue.

JPMS · Nov 07, 2019 12:48 am

Gostev wrote: ↑Nov 06, 2019 7:43 pm I actually believe there is, if you keep the ticket open. But I'm not with support, so I may be incorrect - this is just based on me seeing reports where thousands of support cases get closed in the next few weeks after each update release, so clearly they must be reaching out to everyone with open tickets in case an update fixes the issue.

This has not been my experience: both #03470315 and #03535461 have been closed and are still awaiting fixes (but I do have a working workaround). In both cases I was asked if it was OK to close the case but if I knew keeping them open would result in notification that fixes had been made then I would have asked to keep them open. In #03470315 the engineer did say I could open a new case once v10 was released to see if fixes had been incorporated in that release, which I will do but it seems a bit of an inefficient way of dealing with it.

I'd like to thank you for your time responding to my post. One of the standout qualities of Veeam are its support forums and the quality input from Veeam staff (and Veeam users too). I never understand why some companies put so little resource (if any) into their forums - as a new user I have learnt so much here that can only have reduced the need for seeking direct individual tech support. A win for both myself and Veeam!

mcz · Nov 11, 2019 7:38 am

Gostev wrote: ↑Nov 06, 2019 7:43 pm All we need is your (our customers') help to flag such situations - so that support management is aware of the issue, can review each situation separately, and take appropriate corrective action to make sure the particular engineer will not make the same mistake again. This is quite a normal learning process - and as I've said in my first response, such situations are more of exceptions than the rule anyway.

...just wanted to confirm Anton's words. I've worked on 100 tickets with Veeam and only in 1-2 cases could the engineer have done better. I reported it to the management and told them that this engineer needed some help, and they did what they promised, because 2 years later I had the same engineer again and this time everything was fine. Most of the time, the experience with the Russian engineers is outstanding!

Nov 11, 2019 12:17 pm

Thanks for your kind words, Michael. All of our support engineers are pretty awesome, regardless of their location! Your time zone just doesn't give you a chance to interact much with folks from many of our other support offices - but I keep hearing this from customers on every continent.

So, I just wanted to give all support offices a shout out for all their hard work providing 24/7/365 coverage for Veeam customers worldwide:
• Beijing, China
• Bucharest, Romania
• Columbus, U.S. (the biggest one)
• Prague, Czech Republic
• St. Petersburg, Russia
• Sydney, Australia
• Tokyo, Japan
• Vancouver, Canada

JPMS · Nov 11, 2019 2:24 pm

I would agree.

This post was primarily about the number of issues I had experienced with the product not the support. There have been a couple of missteps along the way but generally the support has been excellent. Certainly the best I have received from any company after 35 years of designing and maintaining systems.

ejenner · Nov 11, 2019 4:17 pm

I think if you're not happy with Veeam you can go and use one of the other products which has all the same capabilities....

I have spent a lot of time working for a backup company which used to be a market leader. I'd say the thing to bear in mind is the extent of what a general purpose backup product has to be able to do.

If you think of the OS as supporting itself and offering the ability to run applications... backup is a lot like that. It runs itself and enables you to backup the OS and lots of applications. It does not do that for only one OS either, but many.

Once you see how many different OSes and applications it has to be compatible with, you can see why not everything works perfectly all the time.

What makes all the difference is the kind of support you can get when there are problems. Many vendors will put gatekeeper resellers in the path blocking access to the knowledge you want from the vendor to resolve your issue. Or if you take Microsoft as an example they charge per incident for technical support and even then you'll struggle to get to talk to somebody senior enough to have your problem resolved.

I've got my fingers crossed that V10 will be able to backup SQL from a CSV...

thomas.biesmans · Nov 12, 2019 7:51 am

I agree with Jason's remark that these are supported options and should work, but my additional two cents: Hyper-V and Linux repos sounds like it had to be cheap. Something I see quite often is that what you gain there in cost, is so incredibly often made up for in additional - (un)expected - service costs. Even our own Hyper-V engineers don't like Hyper-V, that's just our reality / experience.

Linux repos sound nice, but would I ever suggest them over a standard Windows server? No, or worst case with some heavy caveats or increased work estimates. It's unfortunate that you had to encounter the teething issues. They're annoying, ideally they're not even there at all, as Veeam really does a good job releasing properly working software.

Gostev · Nov 13, 2019 9:12 am

While what Thomas posted may look like a typical holy war post, he's actually spot on, at least in this particular case. Devs finished digging into the issue with unreliable behavior of the CentOS-based backup repository (case 03743171), and it appears to be caused by OpenSSH server defaults in CentOS. The solution is to uncomment and bump the MaxSessions value in the sshd_config file to 50 or 100 connections.
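For illustration, the change described above (uncommenting `MaxSessions` and raising it) can be sketched as a small text transformation. `set_max_sessions` is a hypothetical helper, not part of any Veeam tooling; it assumes a simple file without `Match` blocks, and it treats any line starting with `#MaxSessions` as the commented-out default.

```python
import re

# Matches an active or commented-out MaxSessions line (keyword is case-insensitive).
_MAXSESSIONS = re.compile(
    r"^[ \t]*#?[ \t]*MaxSessions\b.*$", re.IGNORECASE | re.MULTILINE
)


def set_max_sessions(sshd_config_text: str, value: int = 100) -> str:
    """Return config text with MaxSessions uncommented and set to `value`.

    Hypothetical helper for illustration; it only rewrites text and does not
    touch the real file or restart sshd.
    """
    new_line = f"MaxSessions {value}"
    if _MAXSESSIONS.search(sshd_config_text):
        # Replace the first occurrence only.
        return _MAXSESSIONS.sub(new_line, sshd_config_text, count=1)
    # Not present at all: put it at the top so it stays outside any later block.
    return new_line + "\n" + sshd_config_text


if __name__ == "__main__":
    original = "Port 22\n#MaxSessions 10\n"
    print(set_max_sessions(original, 100))
```

After changing the real file, the configuration can be validated with `sshd -t` before restarting the SSH service so a typo does not lock out the repository.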

GregorS · Nov 13, 2019 10:49 pm

Synology unsupported... I've got a couple of them as backup repositories via iSCSI... rock solid!

JPMS · Nov 13, 2019 10:54 pm

thomas.biesmans wrote: ↑Nov 12, 2019 7:51 am Hyper-V and Linux repos sounds like it had to be cheap. Something I see quite often is that what you gain there in cost, is so incredibly often made up for in additional - (un)expected - service costs.

I think you are projecting your circumstances and requirements on to other people. Clearly if you have 'hyper-v engineers' you are working in much larger, more complicated environments than we (a two person company supporting small businesses) are. I am our hyper-v engineer, I am also our Windows Server and Workstation OS engineer, Exchange Server engineer, Office365 engineer, Cisco engineer, Linux engineer (apprentice level), Support engineer, 2nd Tier Support Engineer, Switch it off and switch it back on engineer. In addition, I run the company and do the accounts. I even have to make my own coffee!

The solution was chosen not because it was cheap, but because it fulfilled our requirements and mostly utilised our existing experience, knowledge and skill set. Your Hyper-V engineers may not like Hyper-V but we have been using it since Server 2012R2 and for our (and our customers') very simple requirements it has proved to be extremely reliable. It would therefore make no sense, technically or commercially, to look at other hypervisors that would require learning from scratch and that we have no experience with.

JPMS · Nov 13, 2019 11:17 pm

GregorS wrote: ↑Nov 13, 2019 10:49 pm Synology unsupported... I've got a couple of them as backup repositories via iSCSI... rock solid!

The irony is we used to use a Synology NAS and also had no problems with it.

Initially we used it just via SMB shares, then we used it as a Linux repository. We had an opportunity to repurpose it and, because it wasn't a supported solution, changed to a 'proper', supported, Linux repository and that's when we ran into all these issues!

JPMS · Nov 14, 2019 12:55 am

Gostev wrote: ↑Nov 13, 2019 9:12 am While what Thomas posted may look like a typical holy war post he's actually spot on, at least in this particular case. Devs finished digging the issue with unreliable behavior of CentOS-based backup repository (case 03743171), and it appears to be caused by OpenSSH server defaults in CentOS. The solution is to uncomment and bump MaxSessions value in sshd_config file to 50 or 100 connections.

Sorry Anton but I believe you may be mistaken on several points here.

I believe you have referenced the wrong case number. #03743171 is still ongoing (I have another session tomorrow) and in any case I already had MaxSessions set to 100 before this problem first appeared.

I think you are referring to #03470315 or #03535461, both of which are 'fixed' by changing MaxSessions. Firstly, your wording suggests that the Devs found this solution when in fact, after many, many hours of working with your engineer on this, it was I that found this solution (as documented in the case notes). It was at this point that the engineer investigated further and discovered that this was not required in a VMWare host, only a Hyper-V host, so appeared not to be an OpenSSH issue but an issue with the Hyper-V version of your software/SSH libraries. Indeed, after reporting this to the Devs, before closing the case, the engineer reported back that "Turned out there is more to our SSH connections than just 1 fix. There are quite a few changes planned for the future and those will be rolled out over multiple updates. The plan as of now is to have them all ready by the time Update 5 comes around. Specifically for the issue you've experienced, RND is considering on adding additional registry tweaks to control the way Veeam communicates with Linux servers. I don't know how many of those changes will be major enough to make it into release notes though." Consequently I am not sure if what you said was mistaken or that it reflects work done on the issue since I closed the ticket.

Furthermore, I would point out the problem isn't CentOS based, it's OpenSSH based. The default MaxSessions for OpenSSH is 10 (I've downloaded the source code and checked). That means any Linux distribution using OpenSSH (which I believe is pretty universal) will have a default of 10 unless specifically changed. As such, if Devs have now decided that this isn't a software issue but a requirement to have more sessions in a Hyper-V environment then why is it not documented in your Linux Repo Requirements documentation (https://www.veeam.com/kb2216)? I did suggest this was added to the requirements but it wasn't taken up.

And while on the subject of this document: while working on #03743171 I noticed that my repo wouldn't work with the newer (better) Renci SSH library and was falling back to the older SSH library (B&R has two SSH libraries available to it). I worked with the engineer on this and discovered myself that Renci uses SFTP for file transfer rather than SCP. Again, at the time, SFTP was not listed in the Linux Repo Requirements as a requirement for Linux repos. It has subsequently been added at my suggestion.

I am the first to admit when I have made a mistake but quite frankly I resent the suggestion that I have brought this upon myself by choosing a 'cheap' solution. I have outlined in other responses the reasons for the choices I made and carefully selected a supported solution. All the issues I have experienced have either been the result of bugs in B&R or inadequate Veeam documentation - these are not my fault!

Gostev · Nov 14, 2019 1:55 pm

Yes, I may have copied wrong support case ID, sorry. I'm on the road this week, so I only had a brief phone chat exchange on this with the support lead - which didn't include any extensive details like you provided above. All I know is that the root cause for the SSH connections issue was understood.

The main point of my post was totally different though. I simply wanted to note that the issue is indeed Linux configuration specific, which is also not uncommon for us to see in support particularly due to how fragmented the Linux ecosystem is, with so many different distributions and component versions (making it impossible to provide comprehensive testing coverage that is as extensive as for Windows). By the way, this is the reason why we're limiting our v10 XFS integration testing (and official support) to a single Linux distribution for a start - we learned from our mistakes in trying to support "a" Linux.

And there's another factor to consider beyond Veeam's own QC. Linux-based repositories are in general a few times less popular than Windows-based ones, so the amount of "field testing" performed thus far by the end users is also much lower. Meaning, many more previously undiscovered issues were found by users (and fixed by Veeam) in the past years in Windows repositories compared to Linux repositories, which had fewer existing landmines removed.

So, just as Thomas said, there are indeed added risks and time costs of "going cheap" and using Linux repositories to consider, due to them being less popular and thus relatively less polished by the field testing than Windows repositories. At the same time, you are also much more likely to run into issues like the above ones that no customers have run into before just due to the sheer number of different Linux distros, OpenSSH server versions, varying default settings, etc. - what Thomas called "teething issues" of specific configurations.

And in fact, you even ended your first post with the words that totally prove the point made by Thomas:

JPMS wrote: ↑Nov 02, 2019 6:28 pm at a conservative estimate, I have spent at least a month troubleshooting these issues, that’s $20,000 of billable time or to look at it in a different way, a month of my life I won’t get back!

Just as I said, he was really spot on with his comment - I could not have said it better... this is Linux!

JPMS · Nov 14, 2019 4:50 pm

Thanks for the response Anton. Funnily enough I came to the same conclusion/realisation about Windows v Linux repos in the last couple of days and decided that the best solution for me may be to wipe the repo and set it up as a Windows repo instead.

Basically I have probably made the wrong choice based on a lack of knowledge about B&R and its installed base. Partly it was a natural progression, Synology NAS SMB -> Synology NAS Linux Repo -> 'Proper' Linux Repo. I also thought that generally speaking (leaving aside B&R) that Linux would be better suited to a simple task like running a repo - I've never come across a NAS that runs Windows as its OS! I'm a bit surprised that Windows repos are much more popular but I can fully appreciate that this makes them better supported/tested and therefore would be a better choice.

My point is that I wasn't going for a "cheap option" - the server is not some repurposed old server but a brand new server with the same spec as our Windows servers. The cost of a Windows OS is neither here nor there in the long run. Indeed, I have had to spend time learning more about Linux to create the repo but felt it was worth making that investment as I thought it would be the best option, not the "cheap option". In that, it appears I may be mistaken.

I will create a new post about this. There are a few things I want to clarify regarding Windows repos and it would be useful to get other input regarding Linux v Windows repos before I make a decision to change.

RubinCompServ · Nov 14, 2019 5:10 pm

@JPMS,

For what it's worth, we use Linux to "front-end" NFS-mounted volumes and we have only had one major issue (the OpenSSH issue mentioned above, well before the KB article was written), but have had excellent performance otherwise. Our next storage won't support NFS so we're going to use iSCSI directly to Windows, and we'll see how that goes.

daniel.farrelly · Nov 18, 2019 4:14 pm

We use Linux-based backup repos connected via iSCSI and routinely saturate 10G NICs during weekly fulls.

Post by **RubinCompServ** » Nov 18, 2019 9:25 pm this post

@daniel.farrelly

How much data are you trying to push when you saturate your 10G links? Our Linux repos are connected via NFS, but I can't imagine that iSCSI has so much more overhead than NFS.

davow · Post by **davow** » Nov 19, 2019 12:42 am this post

I use CentOS 7 for my repos. Never had an issue. If Veeam stop supporting them, I will move on to Nakivo. Had plenty of issues with (virtual) Windows repos, especially with their Fast Clone technology, which was pushed so hard by Veeam yet was continually breaking (Microsoft's fault). I suppose it is a lot more reliable these days though.

davow · Post by **davow** » Nov 19, 2019 12:44 am this post

@Gostev

Hi Anton,
With reference to "this is the reason why we're limiting our v10 XFS integration testing (and official support) to a single Linux distribution for a start".
Can I assume you are referring to RedHat/CentOS v8 for the XFS integration? I searched but did not find any specific mention apart from this thread.

thanks

Post by **Gostev** » Nov 19, 2019 1:34 am this post

Hello - no, we decided to go with Ubuntu 18.04 LTS for a start, which is a bit more popular among our user base. Thanks!

JPMS · Post by **JPMS** » Nov 19, 2019 8:54 am this post

I've been giving this all a fair bit of thought since my last post and am currently sticking with my current CentOS repo and continuing to work with the engineer on resolving the outstanding issues. I have workarounds for all but the last ticket, which appears to be intermittent anyway.

Moving to a Windows repo is attractive. As Anton has pointed out, it is more popular with Veeam users and therefore better 'tested'. My experience is nearly all with Windows (from v1). ReFS seems to offer some attractive benefits for B&R users...and then reality kicks in! I've been following veeam-backup-replication-f2/windows-201 ... 6-120.html and tape-f29/slow-tape-job-performance-t61054.html and just sigh and feel a little bit depressed, particularly when you learn that Microsoft are unlikely to backport all the ReFS fixes to the Server 2019 LTSB.

I'm also interested that Ubuntu is more popular with your users. I chose CentOS because I was advised by people that work in the Linux world that it was 'boring and stable', i.e. slow to adopt new features/technology but more reliable because of that. I really don't want to start a 'what's the best version of Linux' flame war, but it's interesting that you seem to be saying that your primary criterion is what's popular and that technical considerations possibly come second.

As I said, at this stage I'm inclined to stick with 'the devil I know'. I will probably wait until v10 is available and see what advantages there are for the different platforms and reevaluate then.

Post by **Gostev** » Nov 19, 2019 2:16 pm this post

In general, Ubuntu/Debian and RedHat/CentOS have about the same popularity among Veeam users, with a slight advantage of Ubuntu/Debian. And interestingly enough, together they represent 85% of Linux machines in use by our customers - with all other distros sharing the remaining 15%.

Please note however that v10 XFS integration is sort of a special case, because XFS block cloning is available in the most recent Linux distributions only, among which Ubuntu 18.04 LTS has a good lead - which is why we decided to focus QC on that one as a first step. Other potentially compatible distributions like RedHat/CentOS 8 will have experimental support status with the initial v10 release.
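
For anyone wanting to see whether a given repo volume can do block cloning at all, here is a minimal sketch in Python that tries a reflink between two scratch files using the generic Linux FICLONE ioctl. This is an assumption about how one might probe a filesystem by hand, not a description of how B&R detects support; the function name `supports_block_clone` is made up for illustration.

```python
import fcntl
import os
import tempfile

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def supports_block_clone(directory: str) -> bool:
    """Return True if a scratch file in `directory` can be reflink-cloned.

    Hypothetical helper: it creates two temporary files, asks the kernel
    to make the second share the first one's blocks, and cleans up.
    """
    src_fd, src_path = tempfile.mkstemp(dir=directory)
    dst_fd, dst_path = tempfile.mkstemp(dir=directory)
    try:
        # Give the source file some data so there is something to clone.
        os.write(src_fd, b"x" * 4096)
        os.fsync(src_fd)
        try:
            # ioctl(dest_fd, FICLONE, src_fd) fails with an OSError on
            # filesystems that cannot share blocks between files.
            fcntl.ioctl(dst_fd, FICLONE, src_fd)
            return True
        except OSError:
            return False
    finally:
        os.close(src_fd)
        os.close(dst_fd)
        os.unlink(src_path)
        os.unlink(dst_path)
```

Call it with the repo mount point, e.g. `supports_block_clone("/mnt/repo")` (the path is a placeholder). A `False` result only says the clone request was refused on that volume; it does not say why.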

Nov 19, 2019 3:44 pm

Ubuntu is also reasonably "slow and boring" as long as you stay with LTS (long term support) releases. Perhaps not quite so "slow and boring" as RHEL/CentOS, but the LTS releases have 5 years of standard support and 10 years before official EOL. It's also quite easy to roll from one LTS release to another with a simple (in most cases) upgrade procedure. Now the non-LTS releases of Ubuntu, those are a different beast and typically only have a 9 month lifespan with the latest and greatest stuff, and all the breakage that entails. I don't really consider the non-LTS releases useful for cases where stability is prized, and repos certainly fit that latter case, IMO.

As Gostev mentioned, in this case supporting Ubuntu 18.04 first made a ton of sense because it's been out over 18 months, while RHEL 8, which is the first version to include a kernel new enough to support XFS block clone, was just released in May 2019 and CentOS 8 followed in late September, so just a couple of months ago. Because of this, there's far more Ubuntu 18.04 in the field at this point vs RHEL/CentOS 8, and Ubuntu 18.04 is already on its third update cycle.

That being said, I don't anticipate issues with RHEL/CentOS 8, at least not issues unique to that platform. Ubuntu 18.04 uses kernel 4.15 by default (optional 5.x kernel is available with 18.04.3), while RHEL/CentOS 8 includes 4.18, not a huge difference overall, but testing and field deployments are the only way that stability can be proven. I've been running CentOS 8 through its paces in my personal lab with no issues to this point, but my testing is nothing compared to what our QA will do, and more importantly, what the real world will throw at it!
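
If you want to see where your own repo server sits relative to the 4.15 and 4.18 kernels mentioned above, a rough sketch in Python follows. The helper names and the sample release strings are illustrative assumptions, and the comparison only looks at major.minor; it says nothing about distro backports or what v10 will actually require.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def at_least(release: str, minimum: Tuple[int, int]) -> bool:
    """True if the release's (major, minor) is >= the given minimum."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    # Report the kernel of the machine this runs on.
    running = platform.release()
    print(running, kernel_version(running))
```

For example, `at_least("4.18.0-80.el8.x86_64", (4, 15))` compares an illustrative RHEL 8 style release string against the Ubuntu 18.04 default kernel version quoted above.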
