- tibus_alun
- Novice
- Posts: 9
- Liked: 2 times
- Joined: Mar 19, 2019 4:25 pm
- Full Name: AJ
- Location: Belfast
Linux storage repository woes
Case: 04217985 
Veeam B&R 10.0.0.4461
(Posting this in the general B&R section because, while I do use vSphere, the issue seemed to have more to do with the Veeam storage server itself.)
Wanting to reduce Windows licensing and preferring Linux servers over Windows, I recently decided to try out a Linux Veeam repository, now that such a thing has been supported for some time.
I moved one of our Veeam storage repos (Dell R740xd, RAID10) from Windows 2019/ReFS to Ubuntu Server 20.04/XFS.
All other variables were kept the same:
- Number of jobs (approx. 10 jobs, various Linux/Windows VMs on a VMware HA cluster backed by an EqualLogic iSCSI SAN).
- Backup method (Veeam proxy-based HotAdd, using a few Windows-based VMs running on the same vSphere cluster).
- Backup settings (reverse incremental, same number of restore points etc.).
- Secondary copy jobs to a remote location.
Initially all went well and jobs worked as before. The only notable difference was that every job took at least 10 minutes to "spin up" before data began processing. On the previous Windows storage repo, the "spin up" time was 2 minutes at most. I could live with an 8-minute delay before jobs started, so I continued on.
A few days later I noticed jobs failing because they didn’t complete within the allocated window, which I found odd, as the incremental backups take only an hour or so at most and the scheduled window is 10 hours wide. Upon investigating, I found that some of the jobs were sitting at "Waiting for backup infrastructure resources availability", sometimes for many hours; it varied a lot. This seemed random in nature: on some runs the jobs started after the 10-minute delay, on other runs they waited for a few hours, and on really bad runs they waited so long that the job breached the window and failed.
I opened a case with Veeam and provided all requested logs. Initially they pointed the finger at the secondary copy jobs and how these might be competing for server resources. I pointed out that on the previous repo all schedules, settings and copy jobs were identical, and I never experienced a single instance of "Waiting for backup infrastructure resources availability" there. To rule this out, though, I stopped all possible conflicts, removed the copy jobs and ensured my main jobs ran one at a time.
Same problem. Random jobs would hang for unpredictable lengths of time, and before long I was unable to get our jobs to finish without a lot of rebooting, re-running jobs and daily manual intervention. I sent more logs into the case, but neither Veeam support nor I could find out what was going on. Rebooting the Linux repo seemed to help the jobs process normally again, albeit only for a while. It seemed as if *something* was getting stuck and getting worse over time. I did a full server firmware update in the hope that it was a network driver/firmware issue, but there was no improvement.
Alas, we couldn’t live without these backups any longer, nor could I keep spending so much time nursing them, so I rebuilt the storage repo on Windows 2019, and the jobs have been running perfectly ever since (at least a month+). Unfortunately, I was unable to continue with the Veeam case until such time in the future when I can try all of this again.
TL;DR: my adventures with a Linux Veeam storage repo didn’t work out so well, and I went back to Windows.
As I couldn’t progress the case with Veeam, I figured I would see if anyone else here has been through similar experiences or has any thoughts on what I might try again in future.
Thanks,
Alun.
- HannesK
- Product Manager
- Posts: 15598
- Liked: 3445 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
Re: Linux storage repository woes
Hello,
thanks for posting your experience with Linux repositories here. As you already switched back to Windows, I cannot really see how we could fix the root cause.
The normal approach for such problems is to escalate the case once the "usual suspects" have been checked (task limit, required software etc.).
In "normal" installations, a Linux repository behaves very similarly to a Windows repository. I know several customers that have always used Linux repositories with hundreds of TB or several PB per server. They work fine in general.
Best regards,
Hannes
- Gostev
- Chief Product Officer
- Posts: 32761
- Liked: 7971 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Linux storage repository woes
Looking at the case, it is over 3 months old... curious what prompted you to post this (and in such detail) only now, and not when you had the actual issue, or right after? Because by now, even all the debug logs are already destroyed, so there's little we can do at this point to understand what went wrong.
Overall though, it really does sound like some misconfiguration on your end. Otherwise, as happens with real issues due to the sheer number of users we have, there would've been a 10+ page topic about this issue by now, considering our XFS integration feature is almost 1 year old, while the Linux repository in general is over 12 years old (and XFS has always been a popular choice there due to its maturity).
I also just did a full text search of our support system for XFS, and there are very few cases for Veeam Backup & Replication mentioning XFS in the past months in general... so, whatever you had does seem to be either a Linux configuration problem, or perhaps some corner case.
Please keep us posted if you decide to give it another try soon! Perhaps you can start with a small repository to use side by side with your "main" one, until you have enough confidence? Maybe we will be able to understand your configuration issue from observing the problem on that temp repository.
- ferrus
- Veeam ProPartner
- Posts: 301
- Liked: 44 times
- Joined: Dec 03, 2015 3:41 pm
- Location: UK
Re: Linux storage repository woes
To the OP, I think it's definitely worth trialling a Linux repository again. Our experience with them has been nothing but positive.
After YEARS of struggling with ReFS, on Windows 2016/Windows 2019, trying tweaks, failing hotfixes, losing restore points, OS reinstalls, and reformatting back to NTFS, I convinced our managers to try the new Linux Repository w/ Fast Clone support at our DR site.
The experience so far has been faultless - not a single stability issue or missed backup.
That was from the initial Veeam supported release.
It's hard to compare performance and space savings because of the different hardware and data sets between the production and DR site, but I can say for definite that the Linux repo is keeping pace with the Windows one - on what should be slower HW infrastructure. The capacity savings are actually better from the XFS repo - but as mentioned this could easily be due to the different data.
Now we're in the position of stable XFS and (finally) stable ReFS repositories.
To be honest - I'd have migrated all the production repositories to XFS by now, but they are bare metal servers that also act as proxies - a function that Linux can only do within a VM at the moment.
I'm not sure what caused your resource availability issue, but it sounds like a configuration issue, rather than anything with Linux/XFS-reflink technology.
Our architecture also uses VMware with both HW proxies and VM HotAdd ones, using the Linux repo for LAN and WAN backup jobs and backup copy jobs.
Reverse Incremental is an uncommon configuration though. I know it's technically supported, even with Fast Clone - but aligning with best practice there would have been my first thing to try.
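For anyone retrying this: Fast Clone on XFS relies on the filesystem having been created with reflink support, so one quick sanity check on a new repository volume is whether `xfs_info` reports `reflink=1`. Below is a minimal Python sketch of that check, not anything taken from Veeam. The function name `reflink_enabled` and the sample text are made up for illustration, and the function only parses text you hand it.

```python
import re
from typing import Optional


def reflink_enabled(xfs_info_output: str) -> Optional[bool]:
    """Report the reflink flag found in xfs_info-style text.

    Returns True for reflink=1, False for reflink=0, and None when the
    field is missing (for example, output from an older xfsprogs).
    """
    match = re.search(r"\breflink=(\d)", xfs_info_output)
    if match is None:
        return None
    return match.group(1) == "1"


# Illustrative sample only; the device name and sizes are hypothetical.
SAMPLE = """\
meta-data=/dev/sdb1    isize=512    agcount=4, agsize=65536 blks
         =             sectsz=512   attr=2, projid32bit=1
         =             crc=1        finobt=1, sparse=1, rmapbt=0
         =             reflink=1
"""

print(reflink_enabled(SAMPLE))  # True
```

On a real repository server you would pass in the output of `xfs_info` for the repository mount point instead of `SAMPLE`. This would not explain the "waiting for resources" symptom by itself; it only rules out one configuration difference before a retest.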
- Gostev
- Chief Product Officer
- Posts: 32761
- Liked: 7971 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Linux storage repository woes
By the way, we're expanding transport modes support for Linux proxies in v11.
- nitramd
- Veteran
- Posts: 298
- Liked: 85 times
- Joined: Feb 16, 2017 8:05 pm
Re: Linux storage repository woes
Alun,
I'm seconding what ferrus and Gostev are recommending, i.e. give Linux another try. I've been running a test Linux repo for a couple years now with no problems.
I'd suggest that you set up a test Linux repo server using older hardware - this would give you a good platform to test on. When I want to test a new technology I just blow up my test Linux repo, the data disk actually, then configure it to the requirements of what I want to test.
This way you're not impacting production.
If you proceed with another trial let us know how it is going.
Thanks.
- ferrus
- Veeam ProPartner
- Posts: 301
- Liked: 44 times
- Joined: Dec 03, 2015 3:41 pm
- Location: UK
Re: Linux storage repository woes
I've read about that, looking forward to it.

To be honest, I did wonder about converting one of our four physical proxy/repos at the production site to a pure Linux repository, and then replacing the proxy resources by adding another VM proxy.
At our DR site, the VM HotAdd proxy performs better than the physical proxy in FC Direct Access mode.
The network at each site is also 10/40 Gbps FCoE, so I suppose it's always worth checking each configuration type for each setup. The best-performing config isn't the same at every site.
Alun, if you get any hardware for a further test - post it here and we can check the config.
- tibus_alun
- Novice
- Posts: 9
- Liked: 2 times
- Joined: Mar 19, 2019 4:25 pm
- Full Name: AJ
- Location: Belfast
Re: Linux storage repository woes
Hi All,
Apologies for the lack of replies; I have not had a chance to look at Linux repos since posting this and forgot all about this post. We couldn't hold off the backup migration any longer to chase this case, which is why we reverted. I posted this afterwards, just to get thoughts and opinions for the next time I give it a go. We have 2 more similar setups incoming, this time without time constraints, so that will be the perfect time to try all this again and get it right. Clearly others are using Linux repos just fine.
Thanks,
Alun.
- Gostev
- Chief Product Officer
- Posts: 32761
- Liked: 7971 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Linux storage repository woes
That is correct; Linux-based backup repository usage has exploded over the past year, thanks to V11 bringing the hardened Linux repository.