by tsightler » Tue May 22, 2012 12:26 pm
I'd say I don't typically see NFS targets, but they can be used via a Linux repository. I actually like that approach, but most clients are more comfortable with Windows, so it's more common to see CIFS or NTFS attached to Windows.
It really boils down to what the customer wants to do and what infrastructure they have in place. If the client is using a dedupe appliance (probably the majority) then it's either CIFS directly from multiple proxies, or NFS via a Linux repository. If a client is simply going to disk (probably the next biggest group) then SAN attached disk or locally attached disk on a system used as a repository is the next best method.
From a performance/cost perspective I personally prefer using dedicated physical servers as proxies with locally attached disks for repositories. This provides a self-contained device with a single maintenance contract and a fixed performance ceiling that can be easily defined. When you need more storage you add another repository, so you get more processing horsepower as well. Since these repositories can be proxies as well, you effectively get SAN offload and can keep scaling simply by adding dedicated proxy nodes. The disadvantage of this approach is of course that it requires manual balancing of jobs across the available storage/proxies, somewhat negating the smart load balancing built into the V6 product. That being said, from a scale-out performance perspective, it's a hard architecture to beat since it guarantees that traffic does not cross the network (direct SAN to local disk).
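To give an idea of what that manual balancing can look like, here is a minimal sketch, not anything built into the product. It assumes you know a rough size for each job and simply places the largest jobs first onto whichever repository currently holds the least data. The function name, job names, repository names, and sizes are all made up for illustration.

```python
import heapq

def balance_jobs(job_sizes_gb, repositories):
    """Greedy placement: largest job first, onto the least-loaded repository.

    job_sizes_gb: dict of hypothetical job name -> estimated size in GB
    repositories: list of hypothetical repository/proxy node names
    Returns a dict of repository name -> list of job names.
    """
    # Min-heap of (current load in GB, repository name).
    heap = [(0, name) for name in repositories]
    heapq.heapify(heap)
    assignment = {name: [] for name in repositories}

    # Sort by size descending, then by name so the result is deterministic.
    for job, size in sorted(job_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        load, name = heapq.heappop(heap)          # least-loaded repository
        assignment[name].append(job)
        heapq.heappush(heap, (load + size, name))
    return assignment

# Illustrative numbers only.
jobs = {"sql": 800, "exchange": 600, "file": 500, "web": 300, "dev": 200}
plan = balance_jobs(jobs, ["repo-a", "repo-b"])
```

This only balances capacity. In practice you would probably also want to weigh backup windows and change rates, which this sketch ignores.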
Many large clients have attempted to build single massive repositories using SAN attached disk. This simplifies job management since there is simply one massive pool of target storage, but it has significant performance side effects. Because all of the I/O is targeted at a single large pool of disks, it only takes a few reverse incremental jobs to build long request queues and cause tremendous I/O latency, significantly degrading backup performance. The lure of this "single repository" is strong, but it is much more difficult to build at scale with reasonable random I/O performance, especially because these clients are generally attempting to use low end SAN hardware to do so (small caches that are easy to saturate).
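As a rough illustration of why a few extra jobs can tip a shared pool over, here is a back-of-the-envelope sketch. It assumes the whole pool behaves like a single M/M/1 queue, which is a big simplification of a real array with caches and multiple spindles, and the IOPS figures are invented rather than measured.

```python
def mean_latency_ms(iops_offered, iops_capacity):
    """Mean response time of a simple M/M/1 queue, in milliseconds.

    Assumption: the disk pool is modelled as one server whose service
    time is 1000 / iops_capacity ms. Real arrays are more complicated.
    """
    utilization = iops_offered / iops_capacity
    if utilization >= 1:
        return float("inf")  # queue grows without bound
    service_time_ms = 1000.0 / iops_capacity
    return service_time_ms / (1 - utilization)

# Hypothetical pool that can sustain 1000 random IOPS.
for offered in (500, 800, 900, 950):
    print(offered, "IOPS offered ->", round(mean_latency_ms(offered, 1000), 1), "ms")
```

The point of the model is the shape of the curve: latency stays modest up to moderate utilization, then climbs steeply as the pool approaches saturation, so each additional random-I/O-heavy job costs more than the one before it.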
So in other words, just as you said, it varies a lot from site to site, customer to customer, based on their goals and budget.