We have a scripted process that walks through a list of SQL databases, grabs some details from each (permissions, etc.), drops the database from our development SQL server, starts a restore session and copies the MDF file from the backup of our production SQL server, attaches that to the dev server, and restores the original permissions.
We've been using it for quite a while, no issues.  It used to go through 12 databases (some pretty large) in about 3 hours.
At the beginning of June, we had an issue with the old EVA SAN that stores both our Veeam backups and the data drives for the dev SQL VM.  The cache was lost and the data got jacked up, but the hardware was all checked out and was good. However - it also did... something? to the volumes that caused more persistent errors. I rebuilt all the volumes using identical settings to the originals, got everything presented and moved over, and everything was fine.  No more errors. The dev SQL VM was restored from a backup that pre-dated the crash, with no issues.
Except... Now the process of grabbing the database files takes so long it essentially never finishes.  I've let it run for 11+ hours trying to get a single one copied over, but have finally stopped it so it didn't run into our normal maintenance window.
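To put a rough number on the slowdown, here is a back-of-the-envelope sketch using only the figures above (12 databases in about 3 hours before, 11+ hours without finishing one now). It assumes the old run time was spread evenly across the databases, which it wasn't since some are much larger, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
# Figures from the description above; the even split per database is an assumption.
OLD_TOTAL_HOURS = 3.0
OLD_DATABASE_COUNT = 12
NEW_HOURS_SINGLE_DB = 11.0  # the run was stopped unfinished, so this is a minimum


def slowdown_factor(old_total_hours, database_count, new_hours_per_db):
    """Return how many times longer one database takes now versus the old average."""
    old_hours_per_db = old_total_hours / database_count
    return new_hours_per_db / old_hours_per_db


old_minutes = OLD_TOTAL_HOURS / OLD_DATABASE_COUNT * 60
factor = slowdown_factor(OLD_TOTAL_HOURS, OLD_DATABASE_COUNT, NEW_HOURS_SINGLE_DB)
print(f"Old average per database: {old_minutes:.0f} minutes")
print(f"Slowdown versus that average: roughly {factor:.0f}x or more")
```

So even with generous rounding, this is not a "somewhat slower" situation; the copy is taking tens of times longer than it used to.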
I spoke to VMware support, who checked everything over and said that there were no issues on the VM or datastores. I spoke to Veeam support and all is well on the backup side. Spoke to the SAN support people again to double-check the logs for any issues there, but everything is clean. I tweaked and tuned everything, checked every configuration detail, and rebuilt any pieces that could be rebuilt without a major disruption. I simply can't find anything that would have changed that would cause this.
It seems like the slowness stems from copying the files over to the other server, which is understandable - I can reproduce that slowness moving similarly-sized files between other servers, but how did it work before? It's like it was previously able to push the file over in some much faster manner that it is no longer able, or no longer trying, to use.
I'd welcome any thoughts or insight.
- dgmahon
- Lurker
- Posts: 2
- Liked: never
- Joined: Jan 15, 2015 3:14 pm
- Full Name: D. Mahon
- Contact:
- HannesK
- Product Manager
- Posts: 15598
- Liked: 3445 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
- Contact:
Re: Formerly fast FLR now taking incredibly long
Hello,
and welcome to the forums.
I see two options
1) blaming the network (because that's what everybody does in IT)
2) testing all components for performance (diskspd, iperf, etc.)
I guess that one of the SAN settings changed. Maybe cache, as you mentioned that it broke. But really, the EVA is so old... not sure how many people today know how to handle it.
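For option 2, a tool-independent cross-check can help before reaching for diskspd or iperf. The following is a minimal sketch, not a replacement for those tools: it copies a file in fixed-size chunks and reports the effective throughput. The function names, the 4 MiB chunk size, and the temp-file demo are all assumptions made for illustration; in practice you would point it at a test file on each volume or share involved and compare the numbers.

```python
import os
import tempfile
import time

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def timed_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path in chunks; return (bytes_copied, elapsed_seconds)."""
    copied = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        # Ask the OS to push buffered data to the device before stopping the clock.
        os.fsync(dst.fileno())
    return copied, time.perf_counter() - start


def throughput_mib_per_s(byte_count, seconds):
    """Convert a byte count and duration into MiB/s."""
    if seconds <= 0:
        return float("inf")
    return byte_count / (1024 * 1024) / seconds


if __name__ == "__main__":
    # Demo with a small throwaway file; swap in real source and destination paths.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.bin")
        dst = os.path.join(tmp, "sample_copy.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))  # 8 MiB of random data
        size, elapsed = timed_copy(src, dst)
        print(f"Copied {size} bytes at {throughput_mib_per_s(size, elapsed):.1f} MiB/s")
```

Running it once per hop (backup repository to mount server, mount server to dev SQL data drive, and so on) should show which leg is the slow one, assuming the slowness is in plain file copies at all.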
Best regards,
Hannes
- dgmahon
- Lurker
- Posts: 2
- Liked: never
- Joined: Jan 15, 2015 3:14 pm
- Full Name: D. Mahon
- Contact:
Re: Formerly fast FLR now taking incredibly long
Hello, and thank you!
I have confirmed the SAN settings from before and after it went out - nothing changed except the volumes themselves, which were rebuilt (but with the same settings). There really aren't a lot of user-adjustable settings in it at the system level. We're also not seeing any other issues except this one thing.
I have also checked the network, disk, etc. - but again, nothing changed for any of the components involved. The specs for everything match what is expected, and the performance isn't significantly worse than what I see going to other servers.
Basically, everything suggests that the process we had in place *never* would have worked, except that it obviously did. There must be something at some level of some component that I'm not seeing or thinking of, but I really don't know what it could be.