lucius_the
Enthusiast
Posts: 58
Liked: 37 times
Joined: Jun 09, 2017 3:50 pm
Full Name: David
Contact:

Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the » 5 people like this post

Dear Anton,
First of all, I really enjoy reading your digest. It's become a Monday morning ritual for me and I thank you for that! It's a pleasure and a great example for others in the industry to follow.

Every now and then a thing comes up in your digest, regarding the use of SMB share targets and/or using inexpensive NAS boxes in general. We all know how you feel about that, and I understand that your "feeling" is not merely a feeling, but something that was learned through Veeam support incidents. You're doing a great job of keeping us aware that this is risky. But you're never really explaining the why of it, or the things behind it.

I highly value and respect your opinions, as they come from real experience. But, like you, I'm an engineer-type of person. So on one hand, we don't just go for authority, we also want/need/are programmed to understand the reasoning for something and the details underneath, before we accept and go with it. So that's one part. On the other hand, many of us still have no choice but to use the "cheap NAS" solution for backups + SMB targets.

Yes, iSCSI is available but it comes with its own set of issues:
  • a share is "shareable", while an iSCSI target is not (technically, the target itself is shareable if you also use a special file system, but we're talking small business customers here, the ones that use a cheap NAS in the first place)
  • many setups have more than 1 host to protect, so we have 1 proxy on each host. That means we have many proxies, which calls for careful LUN capacity planning on the NAS, as you can't just share a LUN between proxies, so we get a problem with space efficiency. With SMB there is just one share and all space is shared between proxies. It's a network file protocol, after all, and this seems like a use case that it was designed for. From a higher point of view, it seems much more logical to use a network file protocol than iSCSI here.
  • last, but not least, when we have 1 proxy on each host and one host dies, the recovery process includes connecting that LUN to a different proxy on another host - leading to a risk of connecting to a LUN from more than 1 location, due to human mishap. For those that don't know what this means: it's kind of like connecting a SATA cable from one disk to two computers in parallel. In practice, it means instant corruption of your backup repository, precisely the one you need right now to perform a restore. That's risky ! With SMB that simply can't happen. And backup is all about lowering risks, so this reason alone is enough to scare people away from iSCSI.
So "just use iSCSI" may sound like a good solution, but in practice it does introduce other problems and potentially also some very dangerous risks. Please feel free to correct me if I missed something in the above reasoning, maybe there's something I don't know, but that's how I see it with my current knowledge.

Besides telling us that it's "risky" to use SMB and cheap NAS devices and that is "not recommended", that it's the "no.1 case of failed restores in our support", it would also be great if you could provide some more details and/or some statistics as to why this is bad. What exactly are the risks with using SMB (without CA) ? Some of us won't be able to get rid of cheap NAS + SMB anytime soon, and we need to be able to gauge that risk and understand it more properly as we have to live with it.

Also, Veeam has a thing called "Storage-level corruption guard". Is our data on an SMB-connected repository 100% safe after a backup files health check has been performed on it ? Or not. And if not, why not. The devil is always in the details. But you didn't really provide much (or any) detail around this that would help us better understand and gauge this risk that we have to live with. You also noted that more than a fourth of your customers use that combination, proving what other community members are saying - that this setup is quite common. It would benefit everyone to better understand, in much more detail, why using cheap NAS + SMB is a bad idea. Where are the problems. And how bad it really is. I personally didn't hit a failed restore, I just have your information that I'm "risking it" but I have no idea how much of a risk this is.

When time comes for a backup subsystem upgrade at each of our customers, I'll try to push for a normal server with Linux or Windows on it and use that as a backup repository. It has other advantages, besides making me feel safer as per your suggestions. But you have to understand that for some customers it's simply not going to happen, so this problem is not going away for many of us. I understand also that Veeam v10 will support NFS and that's going to be safe. I'd also like to know more details about why NFS will be safe and SMB is not safe, as they are basically doing the same thing. Don't get me wrong, I'm far from being an MS fan - I just happen to notice that MS says it's ok to host VM disks over SMB share (granted, it needs to be highly available) but it implies the protocol itself is safe. So where exactly is the problem with SMB, why is it not ok for backup. Is it only in "cheap NAS" implementation of it ? Is that where the main issue is ?

With kind regards,
David

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Or, to put a more concrete question:

My setups, where I use an SMB share on a NAS (mostly QNAP in my case) look like this:
- hosts are either vSphere or Hyper-V
- hosts use local storage (DAS) for VM-s. We have no setups with SAN, it's all small customers
- there's a NAS for backup (either QNAP or Synology)
- the NAS only has 1 SMB share. The share is either connected directly to the host (if there's just 1 host to back up) or through a separate VLAN, isolated from the rest of the network (the more common case, where a customer has 2 or more hosts to protect). The share is not accessible via corp LAN and it's also password protected
- Veeam backup server is usually installed in a dedicated VM (in case of vSphere) and in most cases it's a VM running Win 10 Pro. Or, in case of a Hyper-V host, it could be installed on the host directly.
- in case customer has more hosts, then each of the hosts also have a dedicated VM (running Win 10 Pro) to use as a backup proxy. Proxies are the only machines on the host to be connected to the backup VLAN, the only ones that access the SMB share. Proxies are not used for anything else and are not directly accessible from corp LAN (only RDP if allowed and only from certain machines on the network, firewalled through external router).
- backup job is configured to do Active fulls weekly, during the week it does incremental backups
- we are in GB-TB range here (incrementals are always in the GB range, fulls are no more than a few TB max)

I intentionally avoid doing stuff like reverse incremental and synthetic fulls and do an Active Full weekly. I keep anywhere between 15-30 days of history, so that includes 3-5 independent backup chains. This is all to minimize the risk of a failed restore, in case something goes wrong with the "cheap NAS" + SMB share combination.

Now, since I'm not doing Reverse Incrementals and I'm not doing Forever Incrementals and I'm not doing Synthetic Fulls, then (as far as I understand it) Veeam B&R will never actually touch these backup files again, once they are copied to the NAS. Right ? And if I also use "Storage-level corruption guard" - then how unsafe is this setup ? What can go wrong, besides the NAS itself being considered "a cheap box" that can possibly corrupt things spontaneously. But besides that, let's be real. What are the real risks of not being able to restore, and can I expect it to go badly in such a setup. Can something really go unnoticed here.

Yes, I'd still like to know more details, even in Reverse incremental and Synthetic full scenarios - but the above scenario is what I mostly use, so some feedback on the safety of such a setup would be really helpful.

With kind regards,
David
TheWaterbug
Enthusiast
Posts: 29
Liked: 2 times
Joined: Dec 06, 2019 7:29 pm
Full Name: Steven Kan
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by TheWaterbug »

Also, Gostev's email has this parenthetical: "(no write-through support)," but then I googled the following two articles:

https://winaero.com/blog/enable-write-t ... indows-10/

https://techcommunity.microsoft.com/t5/ ... a-p/868341

Or do those two articles mean something different? I admit I had no idea what "write-through" meant until this morning . . . .

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Write-through just means that an IO operation is guaranteed to be completed (data actually put on the drives, not just received in cache) before the operation returns. There are many layers in the IO subsystem chain where things get buffered. From bottom up:
- the HDD/SSD internal cache
- the RAID controller cache
- the filesystem cache
- the higher protocol/application cache (SMB/NFS)

"Everyone caches, even your CPU" is something I read somewhere and it's very true. It's done for performance reasons.

Proper write-through support means that we can instruct an IO operation to complete synchronously and we have a guarantee that once the IO operation is completed, it's 100% sure that it's written on disk (or that we can safely assume that it's going to be written).
In practice it's actually a bit more complex than forcing it to disk: SSDs can "cheat" but have capacitors (like the Intel DC series) to guarantee a successful flush of the entire buffer even if power is suddenly lost. RAID controllers will normally also "cheat" on this, but they have BBWC (battery-backed write cache), so data in buffers is still safe in case power is lost. So it's acceptable to cheat sometimes when you have proper measures in place, and in reality writes can still get buffered, but that's another topic.
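
To make the idea concrete, here is a minimal Python sketch of an application asking for a durable write on a local file. The function name `durable_write` and the demo path are made up for illustration. The sketch only shows the application-side request: `os.fsync` asks the operating system to flush its cache for that file, and whether every layer below (drive cache, RAID controller, or a network file server) honours the request is exactly the open question in this thread.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data and return only after the OS reports the file as flushed."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the OS to push its cached pages for this file down to the device.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "demo.bin")
    print(durable_write(target, b"x" * 4096))
```

Without the `os.fsync` call the write would normally return as soon as the data sits in the filesystem cache, which is the write-back behaviour described above.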

So, back to SMB itself. The links you provided inform us that in recent versions of Windows 10 and Windows Server we can instruct SMB to use write-through caching, meaning not to use cache at all. But that doesn't tell us anything about the SMB implementation in QNAP and Synology. Their SMB implementation also needs to support this, for this to work.

But write cache is a problem mainly when the device holding the buffer crashes or loses power while writing. I'm not sure how big a risk we have with a NAS that's backed by UPSes (and sometimes even a generator). None of my NAS boxes are ever used without a UPS that can hold at least an hour. In all cases where a NAS is used for backup, my servers will be shut down long before that.

Also I don't see this write cache thing as being SMB specific. If memory serves, NFS also has a flag to enable synchronous writes. Who can guarantee it's enabled on a NAS by default. I believe iSCSI target implementations also have a write-back cache option, at least on Linux. But NFS and iSCSI are considered safe (or safer) by Veeam. Why? Unsafe write cache can't be the only argument, that thing exists in NFS and iSCSI implementations too. It all depends on how it's configured. That's why I said "the devil is in the details".

I don't believe write cache alone is the reason they are seeing issues on SMB and not on iSCSI and NFS. But perhaps I'm wrong, I just ask for clarification. Only so far, no one from Veeam chimed in on this.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by TheWaterbug » 1 person likes this post

Ah, I've read that email for a 4th time, and now I understand it a little better.

Gostev is recommending specifically against using "non-continuously available (CA) SMB shares as backup targets for reliability reasons (no write-through support)," not necessarily against all SMB targets. Is that correct?

Because I just built a Win10 box to be my remote repository, and I'm pretty darn sure it's using SMB. But is there a way I can force write-through, since I'm not in control of how the main Veeam box mounts the remote share?
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev » 1 person likes this post

Unlike with NFS client, you cannot force SMB client to do write-through with regular (non-CA) SMB shares, such as your Windows 10 box... that's the whole point. While for CA shares, SMB client will always do write-through (also not configurable).
lucius_the wrote: Dec 16, 2019 11:49 am But you're never really explaining the why of it, or the things behind it.
One of my previous blogs from a couple of years ago was dedicated to the explanation of this issue. I will try to dig it up in my emails and repost it here when I'm at the computer. However, it's just a longer version of what comes down to "no write through support", so it is explaining pretty obvious stuff on how not physically landing writes on disk causes data loss with any sort of outage :D

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Thanks ! Looking forward to reading it. I don't think I followed your digest back then ;)

Although if it's just the write-through cache... At first glance I don't see how it would cause problems in my setup (2nd post in the thread) as my NAS boxes are on UPS and don't crash. Btw, write-back cache exists for NFS and iSCSI too. But perhaps the problems when using SMB surface more often when Veeam is doing backup chain transformations ? I'm not sure if SMB guarantees the ordering of writes and reads in case there's write cache on the NAS side. I could accept that some problems (or maybe it's by design ?) may exist in SMB implementations in NAS boxes, making it risky. I could also accept that similar problems exist in Microsoft's implementations of it, that wouldn't be shocking. But that's a lot of guessing from my side. I'm hungry for details. Waiting for the blog copy...

In the meantime, would anyone from Veeam like to take a look at my setups (2nd post above) and share some thoughts ?
JPMS
Expert
Posts: 103
Liked: 31 times
Joined: Nov 02, 2019 6:19 pm
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by JPMS » 1 person likes this post

I thought Anton's digest would kick off some discussion about this!

I too would like more technical detail about this to assess the size of the risk (compared with other solutions). As lucius_the points out, caching is used in many parts of a backup solution and if you want to see throughput performance really die then try turning them all off.

From the digest, it seems that for non-CA shares SMB decides that something has been successfully written once it has been sent, rather than having any acknowledgement that the process has been successfully completed. If something gets lost in transit then SMB doesn't know it has happened and won't retry the event resulting in corrupt data. I understand that isn't ideal but how often is it a problem in reality? How often is silent corruption an issue? I'm not bothered by things like power failure (which are easy to mitigate) or NAS crashes, both of which are events you know about.

For me, the main lesson is to make sure you are verifying your backups with tools like Healthcheck and Surebackup which we do anyway. This is the only way to ensure you have a healthy backup, whatever technology you are using.
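
As a rough illustration of the "verify, don't trust" point, here is a small Python sketch that compares a source file against its copy by size and SHA-256. This is not how Veeam's health check or SureBackup work internally; the function names are hypothetical, and a read-back taken right after the write may be served from a client-side cache, so a check like this is more meaningful when run later or from another machine.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: str, target: str) -> bool:
    """Return True only if target has the same size and content hash as source."""
    # A truncated target (missing tail of the file) is caught cheaply by size.
    if os.path.getsize(source) != os.path.getsize(target):
        return False
    return sha256_of(source) == sha256_of(target)
```

A call such as `verify_copy("local.vbk", r"\\nas\share\local.vbk")` (both paths are placeholders) would return False if part of the file never landed on the share.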

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev »

Quite often, unfortunately, as it has been by far the top reason for failed restores in our support. As a matter of fact, even a short network connectivity outage due to some network equipment rebooting at the wrong moment may cause data loss.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

JPMS wrote: Dec 17, 2019 11:06 am If something gets lost in transit then SMB doesn't know it has happened and won't retry the event resulting in corrupt data.
Are you sure it doesn't know it happened ? The part where the SMB protocol doesn't automatically retry is ok, that's for the upper layer to decide. But SMB definitely should report that the IO operation didn't succeed, I'd be very very surprised if it doesn't report that. I'd bet it does. I mean, what kind of a network file protocol would it be if it didn't report on IO operation status. BUT - in all fairness, that's just my guess that it does report, I never really went digging into the protocol itself. If you know more, please let me know (I'm not kidding, I'm very interested in getting to and understanding the core of the issue).
Gostev wrote: As a matter of fact, even a short network connectivity outage due to some network equipment rebooting at the wrong moment may cause data loss.
I'm quite shocked to read this. And puzzled even more now.

AFAIK, each IO operation should either return a success or a failure (or not return at all - which should, after a timeout has elapsed, be treated as failure). But if an IO operation finished with failure the application above will know about it - sooner or later. It is then up to the application above, the one doing the IO, to deal with it properly. Retry, undo, retry again from start (or whatever) and if it still doesn't work, signal back that the whole operation failed. I assume you are doing that, of course.
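
The retry-then-fail pattern described here can be sketched in a few lines of Python. The helper name `write_with_retry` and its parameters are hypothetical, and the sketch rests on the same assumption as the paragraph above: that a failed IO operation is actually reported to the caller as an error. If the layer below reports success for data that never arrived, no amount of retry logic at this level can help.

```python
import time


def write_with_retry(write_once, attempts: int = 3, delay_s: float = 0.0):
    """Call write_once() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # write_once is any callable that performs one IO operation
            # and raises OSError when that operation fails.
            return write_once()
        except OSError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_s)
    # All attempts failed: signal that the whole operation failed.
    raise RuntimeError(f"write failed after {attempts} attempts") from last_error
```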

Are you saying that SMB (in non write-through mode) returns success for an IO operation before it even hit the SMB server end ? Or that it behaves differently in terms of reliability of returned IO operation status ? Because only that can explain the above quote, around corruption caused by some network glitches. But that's kind of a strong statement. It would imply a somewhat serious bug in the design of the protocol itself. Maybe I'm wrong to assume SMB returns the status of an IO operation back properly. But I do very much assume that it does return them reliably and properly. Is that where the core issue really is ? Please clarify.

I will try to find some time today, to dig out somewhere how and when SMB returns IO status results. If the operation really is confirmed while data is still on the wire, then... Well. I can understand your insomnia and a wish to get rid of SMB entirely. Man, this is interesting. I'll search when I have some more time.

One more thing. I don't see how it matters whether we're using write-back or write-through modes. With write-back caching the IO operation result will just come back to the upper layer sooner than with write-through caching. What's done about the returned status is what matters. Maybe the ordering of those IO operation results plays some important role here, too. I understand it can get tricky in more complex IO scenarios, like doing writes to multiple files in parallel. When doing stuff like backup transformations, coupled with async IO operations on multiple files in parallel perhaps - yeah, I see how that could get tricky. One needs to take care not to queue too much IO in advance, as one IO operation might depend on another IO returning success first, in order for the whole process to be safe. But if something goes wrong, then maybe (just maybe) it's not actually SMB's fault - provided (!) that SMB does in fact provide the results of all IO operations reliably (which, at this point I still don't know for sure).

I may have allowed my guesswork to go a little too far. So I apologize and I'll stop here, before it becomes unhealthy.
But I'd really love to hear more details around this issue with SMB. I fully accept that in combination with SMB there is an issue, I'm just still at this point not convinced the issue comes from SMB. Cached writes or not. Because so far, the parts I have in front of me, they are just not clicking in place properly. So again - more details, please :)

You've been open before, you go into details around technical stuff in your digests, including bugs and issues and features in software, even in cases when it's your own. And I think that's a great thing. If that weren't so, I wouldn't even bother bringing this up for discussion. So, I do expect a bit more. So far you've been... vague. You know, you actually provoked this by bringing this issue up regularly, until I finally took it seriously now and decided to dig some more, because details are missing. I only do this because I'm concerned for the backups we manage for ourselves and others. If what we're doing is not safe I have to understand why, because... just telling me it's unsafe is not good enough.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev » 1 person likes this post

lucius_the wrote: Dec 17, 2019 5:02 pm Are you saying that SMB (in non write-through mode) returns success for an IO operation before it even hit the SMB server end?
Yes, this. For example, in our research of this issue, we were able to "successfully" write a few GB of data to a non-CA SMB share AFTER physically disconnecting it from the network :D

lucius_the wrote: Dec 17, 2019 5:02 pm If the operation really is confirmed while data is still on the wire, then... Well. I can understand your insomnia and a wish to get rid of SMB entirely.
Ditto.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

What ??
I obviously made too many assumptions in the above post. I always tell my colleagues the old "assumption is the mother of all fu**ups" and then I fail right on it. Lovely.

Did you report this to MS? I wonder. Was it some developers optimizing for performance who got carried away, or is this yet another feature that's here by design? Can you share with us?

I tried to find something on the net about this, but nothing came up. Just a bunch of cookbooks and news stories around async io. Not a word about this.

Questions:
Will health checks detect such corruption ? I guess it would, but doesn't hurt to ask to be sure.

I understand in v10 you added your own NFS client to the package. Will you be adding it to the free Veeam Agent as well?

For Veeam B&R I could switch to iSCSI. This will make my life more complicated, but what you just confirmed is scary enough.

I wonder if I can mount NFS from Win10 Pro client to a drive letter. I use Win10 Pro for proxies. Is that a viable option? To mount an NFS share in Win10 and use a drive letter where it's mounted as a backup repository (with Veeam B&R 9.5u4b) ? I guess Veeam would treat it as a local repository then. Not sure if that's a good idea.

Well, I have a lot to digest now.

In any case, THANK YOU for confirming where the issue is.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev » 1 person likes this post

Sure, Microsoft is fully aware, as it is indeed by design of the SMB stack: the feature is called SMB Cache. Their recommendation is to use CA shares in scenarios when reliability is important. Plus, based on our discussions they added the new NET USE parameter in Windows Server 2019 for interactively mapped shares (such as when you map a share to a drive letter), which are however unusable by backup software since it is running as a service, while mappings are in the user context.

Sure, health check will detect parts of backup file completely missing :D

Veeam Agents 4.0 will support NFS target when backing up to a Veeam repository.

Windows NFS client is never a good idea from performance perspective :) but I'm also not sure it uses write through, since it is not a default option.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev » 1 person likes this post

So I've managed to dig up my 2-year-old "word" with the details on this issue, as promised:
Gostev wrote: If you saw my Data Corruption 101 or Backup Repository Best Practices breakout sessions at past VeeamONs, you know we don't recommend SMB backup targets due to them being the top source of backup corruption cases that we see in support. But, what's exactly the issue here? I had to explain this to someone last week, so I thought this information would be valuable for many, especially since this issue may affect any application.

So, here's what is causing the issue. When an application writes data to an SMB share using WinAPI, it receives Success for this I/O operation prematurely - right after the corresponding data gets put into the Microsoft SMB client's sending queue. If subsequently the connection to a share gets lost – then the queue will remain in the memory, and SMB client will wait for the share to become available to try and finish writing that cached data. However, if a connection to the share does not restore in a timely manner, the content of the queue will be lost forever. Apparently, there is even the corresponding event logged into the system event log when this happens, but I was unable to track one down quickly. As a result, application thinks that the data was written to a disk, when in reality it was not. In their stress tests, our QC folks saw up to 3GB of data lost due to this issue by comparing the data that our data mover thought was successfully written into the backup file to what actually landed on the storage backing this share. And perhaps even scarier was seeing WinAPI keep returning Success on writes even AFTER the share was already made unavailable.

However, this does not automatically make every SMB share a bad candidate for hosting backups. The issue above was actually one of the reasons why Microsoft introduced SMB Transparent Failover in Windows Server 2012. Basically, when using file shares with the Continuous Availability (CA) flag set, the Microsoft SMB client will not return Success until the I/O has landed on stable media on the server side. And as an added benefit, active SMB handles will survive a failover to another node and the application may only experience a temporary stall in IO – but no timeouts or broken handles. One of our customers who is using such a backup repository did a "chaos monkeys" exercise, resetting cluster nodes while Veeam backup jobs were running – and could not achieve any data loss or even backup job interruption no matter what. He was very impressed. So if you need a highly available backup repository, this could be a way to go - although this setup does require Windows Failover Clustering, of course.
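
The difference between the two behaviours can be shown with a toy model in Python. This is purely an illustration of the description above, not the actual Microsoft SMB client: the classes `ToyShare` and `ToyClient`, the `write_through` flag and the `give_up` method are all invented for the example. In write-back mode the client acknowledges a write as soon as it is queued; in write-through mode it acknowledges only once the data has reached the share.

```python
from collections import deque


class ToyShare:
    """Stand-in for the server side: it just remembers what reached it."""

    def __init__(self):
        self.online = True
        self.stored = bytearray()


class ToyClient:
    """Toy client with an optional write-behind queue (hypothetical model)."""

    def __init__(self, share, write_through=False):
        self.share = share
        self.write_through = write_through
        self.queue = deque()

    def write(self, data: bytes) -> bool:
        if self.write_through:
            # Acknowledge only after the data has landed on the share.
            if not self.share.online:
                raise ConnectionError("share unavailable")
            self.share.stored += data
            return True
        # Write-back: queue the data and report success immediately.
        self.queue.append(data)
        self.flush()
        return True

    def flush(self):
        # Drain the queue only while the share is reachable.
        while self.queue and self.share.online:
            self.share.stored += self.queue.popleft()

    def give_up(self) -> int:
        # The share never came back in time: queued data is dropped.
        lost = sum(len(chunk) for chunk in self.queue)
        self.queue.clear()
        return lost


def run_demo():
    share = ToyShare()
    client = ToyClient(share)
    acked = 0
    for _ in range(3):
        client.write(b"A" * 10)
        acked += 10
    share.online = False  # the "cable" is pulled
    for _ in range(2):
        client.write(b"B" * 10)  # still reports success
        acked += 10
    return acked, len(share.stored), client.give_up()


if __name__ == "__main__":
    print(run_demo())  # (50, 30, 20): 50 bytes acknowledged, 30 stored, 20 lost
```

With `write_through=True` the same two writes after the disconnect would raise `ConnectionError`, so the application would at least know something went wrong.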

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the » 1 person likes this post

Thank you for all the details provided ! This is very useful and interesting info ! And not just for my backups.

So in summary, this is what I learned so far:
  • SMB client returns success for IO operations before data even reaches the SMB server
    -> basically all Linux/FreeBSD-based SMB share backup repositories are affected by this, including the popular "cheap NAS" category (like QNAP, Synology, FreeNAS setups, etc)
  • the problem is entirely in the Windows SMB client implementation
    -> this means that the implementation details on the SMB server side (like in a QNAP/Synology NAS, for instance) don't actually play a role in this particular issue, as it's a client side feature
  • the SMB client behaves differently when using file shares with the Continuous Availability (CA) flag set
    -> for this flag to be set one needs to have Failover Clustering on Windows Servers with SMB Transparent Failover (https://blogs.technet.microsoft.com/cla ... available/) configured to provide such SMB share. When such share is used as an SMB backup repository, it's safe. It also requires SMB version 3 on the client (Windows 8/Server 2012 or newer).
    -> most small shops won't have this as an option, Linux-based NAS boxes don't have that option
Possible workarounds ?
  • this SMB client behavior could be modified with a mount option - possibly making it behave more nicely
    -> but there are currently no such SMB access options available in scenarios used by backup software
  • Linux-based SMB server implementations could, at some point, decide to start cheating and provide a server configuration option to enable pushing the "CA" flag to SMB clients
    -> whether this could work reliably depends on details around what else is assumed when CA flag is set (another topic entirely). At the moment, samba doesn't seem to even have such option available in smb.conf. But it seems that some vendor already did such a hack and is using it in production: https://lists.samba.org/archive/samba/2 ... 02549.html. Seems like it didn't get merged and is not available to just anyone. Also, very few details in the post. But maybe in future samba versions, an option like this comes up, changing the current situation.
Conclusions to bring home, for the time being:
  • For Veeam B&R: stay away from SMB. Switch to iSCSI until v10 brings support for NFS.
  • For Veeam Agent (free): if using SMB, a regular health-check is pretty much required.
    As for NFS share support in the coming Agent versions, we'll see if Veeam decides to fish for more Veeam Agent licences ;) Anton said "Veeam Agents 4.0 will support NFS target when backing up to a Veeam repository" which sounds more like the free version of the Agent will not be able to back up to an NFS share. Pity maybe, but not tragic. Veeam has given us a lot (for free) including health checks.

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Uhm,
I just found many QNAP models on Veeam site listed as certified, showing a nice "Veeam Ready - Repository" sticker.
Clicked the first model in the list, to read more: https://www.veeam.com/kb2436

It literally says: "SMB3 used in testing environment".
No other protocol is mentioned. No mention of any issues and not a hint recommending iSCSI or suggesting that using SMB is unreliable.

Ok, I get the part where the box "works with all Veeam functions" and QNAP wants to be on the list, fine. But you've put a "Veeam certified" sticker on it (ok, Veeam Ready/Veeam<whatever>, it doesn't really matter). This basically assures customers that using the box is fine/safe, with SMB specifically, as that's what's tested. :cry: :?

Interesting. So here's Veeam recommending these boxes by listing them on your site, claiming they work (tested with SMB even), while in your digests you're doing exactly the opposite, explaining how that's NOT a recommended backup repository and losing sleep over it, knowing that people use it with Veeam :) Funny, this :D you've got to admit
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev »

You're only missing the fact that this article is from 2 years ago, which is a bit before the issue became big enough to force its deep research and our following discussions with Microsoft. So, you will be able to find thousands of other SMB mentions made during the first 10 years of Veeam, when its peculiarities were not yet known.

Also, keep in mind that Veeam Ready validates feature compatibility and performance levels only - and for these specific tests, protocol doesn't matter that much, as it all comes down to interface and IOPS. In other words, the fact that performance testing was performed on a certain protocol doesn't automatically mean it is the recommended protocol for the given device. Meanwhile, the recommendation to use SMB for CA shares only is placed directly in the Veeam UI.

In any case, while it's certainly unrealistic to go back and update 10 years' worth of content, I will ask our Alliances leadership to schedule an update of key articles for key NAS vendors, adding the explicit recommendation to use the iSCSI or NFS protocol for those NAS devices which do not provide the ability to create CA shares (for example, NetApp does have such a feature).
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev »

lucius_the wrote: Dec 18, 2019 5:57 pm
Conclusions to bring home, for the time being:
  • For Veeam B&R: stay away from SMB. Switch to iSCSI until v10 brings support for NFS.
Actually, NFS support has been possible since v1: you just need to mount the share on a Linux server, and register a Linux-based repository. This has been a fairly popular approach among our customers. v10 merely streamlines it, removing that extra Linux server from the picture - which is super important for the SMB segment (small and medium businesses), where relatively few admins have sufficient Linux expertise to manage such a solution.
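
When going the "NFS mounted on a Linux server" route, it can help to confirm how the export is actually mounted before registering the repository. Below is a minimal Python sketch that parses `/proc/mounts`-style text and lists the options of each NFS mount. The host name `nas`, the paths and the helper name `nfs_mounts` are hypothetical, and which mount options are appropriate for a backup target is not stated in this thread, so treat the sample line as an illustration only.

```python
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
nas:/volume1/veeam /mnt/veeam nfs4 rw,sync,hard,vers=4.1 0 0
"""


def nfs_mounts(mounts_text):
    """Return {mount_point: set_of_options} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result


for mount_point, options in nfs_mounts(SAMPLE_MOUNTS).items():
    print(mount_point, sorted(options))
```

On a real Linux server the input would come from `open("/proc/mounts").read()` instead of the sample string.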

BTW, an equally popular solution that you didn't mention is to configure the NAS itself as a Linux repository server - in which case you're removing all protocols from the picture, because the Veeam data mover now runs directly on the storage box (which also improves performance).
lucius_the
Enthusiast
Posts: 58
Liked: 37 times
Joined: Jun 09, 2017 3:50 pm
Full Name: David
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Thank you for the updates.

I understand "Veeam Ready" is not the same level of certification as the one you do for SAN integrations, for instance. But it sends a similar message. Knowing what I just learned, I just wanted to point out that you have misleading information on your site (it's in your interest to correct it, and someone would have reacted sooner or later anyway). Didn't mean to be rude - it may have turned out that way, though. Gotta be more careful in forum discussions, it's not a live conversation and you guys don't know me, so... my apologies.

I know about the Linux gateway option for NFS, but that asks for yet another VM to manage, besides the Windows VM for proxy (on a side note, it would be super cool if there were a Veeam proxy for Linux). But ok, true. It's another option.

I went through a few posts about putting Veeam components on the NAS actually, not long ago, but it didn't... look too good. I didn't like the idea of setting everything up (in a manual, unsupported way) and then seeing that a NAS update can easily break it. Some people couldn't get it to work afterwards. Though I was reading about Synology and I haven't actually tried it and I don't know how often an update breaks it. But still, it didn't look good.
That being said, I will read what you linked above, I have a spare QNAP in one location, where I can try this and see what I get. It does look like something I'd be most happy to use, if it doesn't get broken with a NAS update.
On another side note, it would be super cool if you could package it and make it available in the NAS stores, like some other software (I've seen Nakivo - not that I'm comparing it to Veeam, I'm just saying it'd be super cool if Veeam components could also be installed in a single click as opposed to doing it manually).
lucius_the
Enthusiast
Posts: 58
Liked: 37 times
Joined: Jun 09, 2017 3:50 pm
Full Name: David
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Don't know what I was reading before; in your link it seems that all I need on the NAS is Perl, nothing more. Will definitely try this.
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Gostev » 1 person likes this post

I've separated the further exchange on using QNAP as a Linux repository into the dedicated thread, since it totally hijacked this discussion :)
Chord
Lurker
Posts: 1
Liked: 1 time
Joined: Dec 12, 2019 9:40 pm
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Chord » 1 person likes this post

Great discussion. Very relevant to my latest round of research.

I think I've retained this all correctly but I'm hoping to get some clarification on a few points after reading through this thread.
-SMB client caching is the problem
-SMB client caching CANNOT be disabled without seeing the CA flag set from the share
-NFS is a viable alternative (Better support for native write through operations?)
Am I understanding these correctly?

On a side note. Last year I encountered a need for a centralized, high performance, high throughput, and CA SMB storage solution and landed on the MS clustering option. I was very skeptical about the overhead in configuring and maintaining the environment given the added complexity across the whole stack. There are a number of different implementation options each with their own caveats, which if ignored or overlooked can be very damaging to data integrity or availability. Despite the extra work that went into developing the solution, it really did seem like the best option. While my specific implementation isn't relevant to backups, the failover capabilities really are very impressive. While testing we had multiple client systems writing over 400MBps. We were able to "pull the plug" on the "primary" cluster node with no detectable data loss or reduction in performance and the applications writing to storage didn't skip a beat. (Note that when I say primary node, I really mean that we simulated a primary node with the SMB witness client).

As pleased as I am with the solution, it really is an enterprise level deployment and certainly not feasible for small businesses or teams with limited personnel or training. Additionally when it comes to backups I personally like to keep things as simple as possible. I don't really think a CA setup fits that mold for me. I suppose it'd be an effective landing zone for a primary repository if you had a large environment with many Veeam proxies/gateways that all needed to write simultaneously. Even still, clusters are not impervious to failure and I have seen situations where ALL nodes in a cluster have failed. Every environment is different and it'll depend on your comfort with the technology and how confident you feel in your ability to maintain it properly. Based on my familiarity with Veeam documentation, support, and pre-sales discussions I believe they prefer a horizontal scale-out versus a vertical one. That could inevitably be a bigger win if you ever run into trouble and need help from Veeam support.

Thanks,
R
Novox
Expert
Posts: 128
Liked: 22 times
Joined: Jul 12, 2016 12:51 pm
Location: Vermont, U.S.A.
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Novox »

So it sounds like the guidance is, don't use SMB, ever... (even with CA, Windows SMB client implementation will immediately return success).

Is NFS simply better because the v10 native NFS implementation (and the NFS protocol itself) will both support NOT RETURNING SUCCESS until the data is actually written to the backing store (write-through)?

If this is the case, how much of a performance hit should we expect to see if we convert all backups to write-through vs. write-back? Shall I assume major?
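
For readers less familiar with the distinction, here is a minimal Python sketch of what write-through means from an application's point of view: `os.fsync` blocks until the OS reports the data as flushed to the storage device, while a plain write can return as soon as the data sits in a cache. This only shows the concept; whether a network client honours the flush end to end is exactly what this thread is about, and any timing difference depends entirely on the hardware. The function names are hypothetical.

```python
import os
import time


def write_back(path, data):
    """Write and return as soon as the OS has buffered the data (may still be in cache)."""
    with open(path, "wb") as f:
        f.write(data)


def write_through(path, data):
    """Write and return only after the OS reports the data flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to storage


def timed(func, path, data):
    """Return elapsed seconds for one call, for a rough comparison."""
    start = time.perf_counter()
    func(path, data)
    return time.perf_counter() - start
```

Calling `timed(write_back, path, payload)` and `timed(write_through, path, payload)` against a test file on the repository volume gives a rough feel for the cost on a given setup, nothing more.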

Finally, has Veeam considered writing a proxy "app/package" for Synology or QNAP? If so, the network file protocol shouldn't matter, AND this would give Veeam the ability to add in ransomware protection (i.e. since it's not SMB, or NFS, but a proprietary Veeam transport protocol, Veeam should be able to block ransomware "oh, I see you're connected to something over SMB, let me encrypt it all for you" opportunities).

Thank you.
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by soncscy » 1 person likes this post

but a proprietary Veeam transport protocol,
Short reaction: No.

Long Reaction: Noooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

One of the reasons we choose Veeam in our shop and for our clients is because they typically just use off-the-shelf and known tech. If I want to know why something with SSH fails, I can just go look at the SSH.NET code and understand it immediately. If I want to understand why my clients get an NFS error, I can just look up standard NFS errors.

Proprietary protocols are just plain bad. The small gain you get in terms of performance is offset by having to be 100% reliant on the vendor to understand what's going on in your environment. I realize why people might want it, but I honestly believe it's just an exercise in futility for nominal gains at best. If, and this is a big if, there was a guarantee of a custom protocol offering 10x+ the performance, I'd consider it, but I would be exceptionally wary of it.

Our shop has this exact position on most dedupe appliances/software since you surrender your control over your data as soon as you end up using such proprietary protocols.

Keep it simple -- a bunch of disks, known and explorable code, no vendor lock-in.
Novox
Expert
Posts: 128
Liked: 22 times
Joined: Jul 12, 2016 12:51 pm
Location: Vermont, U.S.A.
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Novox » 1 person likes this post

In general, soncscy, I completely agree that proprietary is usually bad. However, that said, non-proprietary, or standard is basically how ransomware works, by taking advantage of known standards. "I see you have an SMB connection to a remote resource, here, let me destroy that for you."

I only meant to suggest the transport protocol could perhaps be proprietary to Veeam, in order to build in ransomware protection and correct issues with SMB falsely indicating "write success" when that may be far from the case. I would not want the backing store to change (ext4, ZFS, BTRFS, etc...), nor would I want Veeam to remove SMB or NFS (because if the backing store doesn't change, then restore operations (or really all operations) could still be performed via SMB, NFS, etc.)

It was just a suggestion since nearly a quarter of all Veeam users use a NAS, likely from Synology or QNAP (though I know there are many other vendors like Buffalo, Drobo, etc).

Thank you
Seve CH
Enthusiast
Posts: 67
Liked: 29 times
Joined: May 09, 2016 2:34 pm
Full Name: JM Severino
Location: Switzerland
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by Seve CH » 2 people like this post

Hello,

The write cache for SMB shares is leveraged by opportunistic locking. Search on the internet for oplocks (opportunistic locking) and why in some cases you must disable them or risk data corruption on file shares.

Due to some use cases in our company, we have several file servers running with SMB1 and oplocks disabled (SMB 2 and SMB 3 are totally disabled because oplocks are mandatory for them). I think it is not until W2019 that you can disable oplocks on SMB3. With oplocks, we were able to reproduce the behavior Mr Gostev reports: you were able to successfully write to a crashed Windows server.

A long time ago (10+ years), oplocks with Samba file shares were a complete mess with MS Access files or shared MS Excel spreadsheets, so yes, SMB shares are risky.

Oplocks can be disabled on both sides: server side or client side. But as you should already know, SMB1 is less secure than SMB3.

Regards
lucius_the
Enthusiast
Posts: 58
Liked: 37 times
Joined: Jun 09, 2017 3:50 pm
Full Name: David
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

Valuable insight.

So if we can disable oplocks on Linux (most NAS boxes use Linux and its Samba implementation) then we're safe ?
I found some more explanation here: https://www.oreilly.com/openbook/samba/ ... 05_05.html
and here: https://www.samba.org/samba/docs/old/Sa ... #id2615926

And I'm not really sure it's all about this. Seems to depend on client implementation (the one in Windows) a lot.
With Samba on Linux, this can be tuned. This should include NAS boxes then. Who knows, maybe it would make a difference. Needs a test.

Edit:
This source is more convincing: https://www.linuxtopia.org/online_books ... ng_07.html
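
For anyone who wants to run that test, the Samba-side setting can be checked mechanically. The sketch below parses smb.conf-style text with Python's `configparser` and reports whether a share has `oplocks` and `level2 oplocks` turned off. Both are real Samba parameters, but the share name `veeam`, the path and the helper name are hypothetical, the fallback to "yes" reflects what I understand to be Samba's defaults, and whether disabling oplocks actually avoids the client-side caching problem is still unverified.

```python
import configparser

SAMPLE_SMB_CONF = """
[global]
    workgroup = WORKGROUP

[veeam]
    path = /srv/veeam
    oplocks = no
    level2 oplocks = no
"""


def oplocks_disabled(conf_text, share):
    """Return True if both oplock parameters are off for the given share."""
    # Strip indentation so configparser never treats a line as a continuation.
    cleaned = "\n".join(line.strip() for line in conf_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cleaned)

    def setting(name):
        # Look in the share first, then in [global]; assume "yes" if unset.
        for section in (share, "global"):
            if parser.has_section(section) and parser.has_option(section, name):
                return parser.getboolean(section, name)
        return True

    return not setting("oplocks") and not setting("level2 oplocks")


print(oplocks_disabled(SAMPLE_SMB_CONF, "veeam"))
```

On a NAS with shell access, the text would come from the box's actual smb.conf; many vendors regenerate that file from their own UI, so manual changes may not survive an update.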
nmdange
Veteran
Posts: 527
Liked: 142 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by nmdange » 1 person likes this post

In regards to that "proprietary protocol" comment, Veeam always uses a proprietary protocol to send data to the backup repository. The difference is with a Windows or Linux repository, it's one step from the backup proxy to the repository itself. With an SMB share, the gateway server sits in the middle.

Windows repository: https://helpcenter.veeam.com/docs/backu ... l?ver=95u4
Linux repository: https://helpcenter.veeam.com/docs/backu ... l?ver=95u4
SMB Share: https://helpcenter.veeam.com/docs/backu ... l?ver=95u4

If you look at what these low end "NAS" devices really are on the inside, it's literally just a cheap computer with a "proprietary" (usually Linux-based) OS. It's not like a higher end SAN storage unit that has redundant controllers. So why waste all this time dealing with issues like this instead of just buying a plain old server to use as your backup repository? If you use a Windows server you can also install Veeam on the same hardware, keeping your backup system totally separate from production.
lucius_the
Enthusiast
Posts: 58
Liked: 37 times
Joined: Jun 09, 2017 3:50 pm
Full Name: David
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by lucius_the »

I do agree, most entry-level NAS boxes are some Intel Atom CPU, small on RAM, no sight of ECC or any such reliability features, commonly found on real servers.
So why waste all this time dealing with issues like this instead of just buying a plain old server to use as your backup repository? If you use a Windows server you can also install Veeam on the same hardware, keeping your backup system totally separate from production
Uhm... costs ?

You need to consider there are clients that:
- have a single-server setup (meaning no backup machine to bring apps online when that server crashes)
- can't or don't want to invest any serious money on backup infrastructure (or even main production infrastructure, for that matter)
- can tolerate an outage lasting 1-2 days, that would happen maybe once in 5 years
- would also very much consider things like: noise, power consumption, extra cost of maintaining another server, physical space requirements, etc

For example: a small accounting company, 5 employees, using a single application with a database sitting on a small server that just has a UPS for basic protection. How would you sell them a server just for backups? They won't even buy a Windows Server licence for the server - it's probably going to be a Win10 Pro machine acting as "a server" in this case. And you can probably forget about selling yet another Windows Server licence for a backup server.

But they still need some good backup solution, data is no less important to them. Veeam at least provides a good recovery option for them, when that small single-server machine burns out. And it's (arguably) doable with just a cheap NAS and Veeam Agent - or Veeam B&R community edition, in case the server is virtualized. It's also fairly easy to recover: just buy a replacement server (or lend them something until you sell them the new one) and restore the machine from backup. Bare metal recovery works with Veeam.

In mid-sized businesses (for me that's 100+ employees with computers), your reasoning is correct. They probably can't tolerate a couple of days outage and would consider a more proper solution for backups. But there's a lot of smaller companies, probably more than the mid-sized or big-sized ones, that benefit or could benefit from Veeam offerings, although mostly the free ones. But as they grow chances are that they would keep Veeam software and start buying licences, so why not make it work for them while they are small too.

A good (and reliable) solution when using a NAS in SMB would come in quite handy. Mostly for access protection, as it adds some safety net: an SMB share can be password protected and left accessible only from the Veeam backup software. So it does offer some protection in case the server itself becomes infected by ransomware (or gets infected by a dumb admin), and with iSCSI there is no such protection. So that's why I see SMB as a good option. It's a pity that it has reliability issues. If there is a way to fix/circumvent these SMB issues, it might positively affect a whole niche of users.

Although... It just came to me that in these "small shop setups" one can use iSCSI + snapshots on the NAS to achieve repository protection. Even a bit stronger level of protection than SMB account. The only drawbacks are that more storage is needed on the NAS and performance drops a bit on LUNs with snapshots. Well, it's another option: use iSCSI for repo access + snapshot that LUN to get some ransomware protection. Though I have no idea how safe a LUN snapshot is, if it happens to run while the backup is writing files ;)
socra
Novice
Posts: 5
Liked: never
Joined: Aug 18, 2015 10:45 am
Full Name: Socra
Contact:

Re: Gostev's digest from 16.12.2019 - SMB share repository on a cheap NAS

Post by socra »

lucius_the wrote: Dec 18, 2019 5:57 pm Thank you for all the details provided ! This is very useful and interesting info ! And not just for my backups.

So in summary, this is what I learned so far:
  • SMB client returns success for IO operations before data even reaches the SMB server
    -> basically all Linux/FreeBSD-based SMB share backup repositories are affected by this, including the popular "cheap NAS" category (like QNAP, Synology, FreeNAS setups, etc)
This is an excellent topic, thanks for starting this @lucius_the, great content being shared here. I fully agree with the iSCSI conundrum. It might be safer but in times of stress I've also seen people erasing complete fileservers due to some PEBKAC LUN magic.

Dunno if the statement about Linux/FreeBSD share backup repositories is correct though.
Seem to remember that NetApp-based systems offer a "continuously available" flag when creating an SMB share, and that is a FreeBSD-based OS.