TMassa
Enthusiast
Posts: 34
Liked: never
Joined: Mar 08, 2010 10:46 pm
Full Name: Tony Massa

Re: Database Server Restores - Consistency Errors

Post by TMassa »

We also gave a copy of one of our backups to our vendor, who also uses Veeam, and their restore was bad as well, so I think we can safely assume the problem is related to VSS (or, more specifically, snapshot timing).

Post by TMassa »

We've run a lot of different test scenarios, and we're narrowing down the problem.

I built a new VM to act as a separate Veeam B&R backup server and allocated enough disk to back up one of our problematic VMs. I didn't want to set up the MS iSCSI initiator just to test a backup, so I ran the backup over the network during the evening. The restored VM was accidentally deleted by another admin who thought it was an older restore. :( He was already running another restore test by the time I got in to work.

I had also removed the VMware snapshot provider service by then, so I never got to see the results of the first restore from the virtual Veeam server. I had configured the job to retain only one backup, and had already started another backup by then.

The backup and restore without the VMware snapshot provider worked (as usual). I then ran DBCC CHECKDB (dbname) WITH NO_INFOMSGS on the database, and there were no errors. However, I still did not see the VSS service start messages in the system event log. I then took another backup and restore with the virtual Veeam server, and the second restore worked successfully as well.

Next, we added the VMware snapshot provider service back, took a backup from the physical Veeam server and restored it, and the DB came back with errors during the consistency check, even after we set the backup to use network mode. We have not had a good restore from the physical server.

So now I am restoring the server from a backup taken by the virtual Veeam server, to see whether the VMware snapshot provider service may be a problem for some reason. If the restore is successful, the issue would appear to be caused by the physical Veeam server and its configuration. If not, then perhaps it's VMware Tools.

I have not read any posts indicating that a conflict with Symantec Backup Exec 12.5 (or any patches, for that matter) would cause this issue... so I'm closer to the culprit, but I still have to determine what on the physical server could be causing our database backup (or snapshot) problems.
Gostev
Chief Product Officer
Posts: 31455
Liked: 6646 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

Actually, our code automatically disables the VMware snapshot provider when Veeam VSS is enabled, so its presence cannot be causing the issue. But if you don't have Veeam VSS enabled, then the VMware snapshot provider could definitely be causing the observed behavior.

Post by TMassa »

I understand that, but at this point, I'm not taking anything for granted.

FYI: the third restore from the virtual Veeam server was successful. It looks like there's a problem with the physical server's software or hardware. I can't imagine where to look first.

Post by TMassa »

Last night, I built a brand-new physical Server 2008 Veeam server with nothing on it except Veeam, and set the backup to SAN mode. The restore failed the consistency check. Now we're trying to determine whether the issue is with the iSCSI initiator or our EqualLogic SAN. We're on the latest iSCSI initiator, so there's not much to troubleshoot there...

We'll update the firmware on our SAN, and in the meantime, I'm running a new, separate full backup (network mode) of the server.

Post by Gostev »

Hi Tony, thanks for the updates!

Post by TMassa »

It appears that the problem is backing up in SAN mode. We've taken new full backups in network mode using our original (physical) Veeam server, and they've been fine. That would also explain why my virtual Veeam server backups worked: it did not have an iSCSI connection to the SAN, so network mode was the only option. We're taking new full backups over the network to confirm, but because of the size of our other DB servers, each backup/restore will take a few hours.

Post by TMassa »

I was reading this article and came across this quote:
The release of VMware vSphere sees the introduction of new VStorage API's which enable VMware to communicate directly with a VM aware storage platform. EqualLogic iSCSI SAN is one of the first virtual storage platforms to offer support for VStorage.

When vSphere is integrated with EqualLogic SAN operations such as snapshotting, cloning and provisioning new machines through VCenter will lead to the storage workload of such operations being passed directly to the SAN to complete. This in turn will reduce load on the ESX servers and reduce the time taken to complete many operations by between 50 and 75%
Is this "integration" with the SAN automatic, by virtue of using vSphere and an EQ SAN? If so, is there a way to "turn it off"?
More info here - YouTube

Post by Gostev »

No, I am sure this is NOT automatic.

[SOLVED] Re: Database Server Restores - Consistency Errors

Post by TMassa »

We have a resolution; however, we're still trying to determine the root cause. A registry change suggested by the folks at Veeam appears to have solved my problems. It is a Veeam-related registry change, and so far our subsequent SAN-mode tests have come back successful. I'll leave it up to the Veeam folks to decide whether to post the registry change; however, if we hadn't been running a database consistency check, we would not have known that anything was wrong at all.

There were no errors, or obvious signs of a problem with the backup or restore...SQL started up fine and didn't have any event log errors during DB startup. Just a heads up for everyone.
joergr
Veteran
Posts: 391
Liked: 39 times
Joined: Jun 08, 2010 2:01 pm
Full Name: Joerg Riether

Post by joergr »

please post this registry change.

best regards,
Joerg

Post by Gostev »

Tony, glad you have it resolved. Thanks for all your help with this.

Joerg, the registry change should not be used unless you are actually experiencing this issue (we have seen it only once in all this time, with direct SAN access on a very specific set of hardware in a specific environment). As you can see from other posts in this thread, other users are not seeing similar issues. If you believe you are actually affected (which is very unlikely), please open a support case and let our technical staff confirm that it is indeed the same issue before they make this registry change for you. Otherwise, it would be best to wait until we determine the root cause and scope of this problem. I will definitely update this thread early next week.

Thanks everyone!

Post by joergr »

Thanks, Anton, please keep us updated.
best regards,
joerg
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Post by tsightler »

Did the registry change fix restores from the existing backups, or were they no good? That wasn't completely clear from your post. If the registry change doesn't somehow fix the backups, then it's not reasonable to expect people to wait unless they "are actually affected", since they will not know whether they are affected until they actually need to restore their backups, and at that point it is too late.

Post by Gostev »

What's even worse, though, is recommending that everyone change the registry setting without thorough testing, because the registry change may just as well produce unexpected results in other deployments. I don't think it makes much sense to rush at this point, since we are talking about code that has been out there for 10 months now (since v4 was released last October). If it were a large-scale issue, we would know by now.

Right now we are sure that the scope of this issue is a single deployment: it cannot be reproduced on our SANs, nor are other customers (like yourself) seeing it. So there is something special about that deployment, and while the registry change resolved the issue in this specific deployment, what if it breaks other deployments (ones not having the issue today)? For one, we are already seeing the fix behave differently in the affected environment and on our SAN. Thus, we would like to take some more time with testing.

Post by TMassa »

tsightler wrote:Did the registry change fix restores from the existing backups or were they no good?
No. The existing backups (more specifically, the snapshots) were corrupted before Veeam backed them up. We've had to take all new backups. We have kept the old data, since some of our Linux and non-DB servers seemed to be okay, but some of their files could have been affected by the problem.

Post by tsightler »

Gostev wrote:What's even worse, though, is recommending that everyone change the registry setting without thorough testing, because the registry change may just as well produce unexpected results in other deployments. I don't think it makes much sense to rush at this point, since we are talking about code that has been out there for 10 months now (since v4 was released last October). If it were a large-scale issue, we would know by now.

Right now we are sure that the scope of this issue is a single deployment: it cannot be reproduced on our SANs, nor are other customers (like yourself) seeing it. So there is something special about that deployment, and while the registry change resolved the issue in this specific deployment, what if it breaks other deployments (ones not having the issue today)? For one, we are already seeing the fix behave differently in the affected environment and on our SAN. Thus, we would like to take some more time with testing.
I did not ask you to release the registry key, but sharing additional information about the issue, the specific environment it was seen in, and what you currently think might be happening would still seem prudent. That way, users with similar environments can test restores of their SQL databases right away and determine whether they also have the problem. Protecting user data is paramount over all other concerns.

Post by TMassa »

Our environment:
Veeam B&R 4.1.1.105 - Physical Server (MS iSCSI Initiator - Server 2008 Std SP2)
VMware ESX 4.0.0, 208167 and one host on 4.0.0, 261974 (updated to test the backup issue)
Force 10 S25N SAN Switches

Dell EQ PS6500 SAN:
2 Members: Firmware V4.3.2 (R109179)
Each member: RAID 10 w/ 48 10K 600GB SAS drives
There is updated FW out which may have (and probably would have) fixed the problem, but we were waiting on "root cause" analysis from Veeam & Dell.

Target Storage: Dell MD1000 w/ 13 2 TB drives

Post by Gostev »

tsightler wrote:what you currently think might be happening
The issue affects direct SAN access mode only. From our side, we cannot see much into the storage and can judge only by what is returned to us, but here is what happens from our perspective.

First, our backup job creates a snapshot of the processed VM, which makes the VMDK file static (and read-only). Then the job starts reading the VMDK file at the block level. Let's say this is the VMDK file content, with each letter representing one data block: ABCDEFGHIJKL (for simplicity, in my example the VMDK file is not fragmented).

Now, it turns out that in Tony's environment, when the VMDK file is read by requesting multiple consecutive blocks from the SAN at once (ABCD, EFGH, IJKL), the content of a returned data block sometimes does not match what is actually stored in the VMDK file.

However, when the VMDK file is read in smaller blocks (A, B, C, D, E, F, G, H, I, J, K, L), the data provided by the SAN is correct. Reading in smaller blocks typically degrades full backup performance because of the larger number of I/O operations (we have seen lower performance on every SAN we have ever tested this on, starting with our very first vStorage API-based test tool from early 2009, which I used to give to random customers for testing). However (another anomaly), on Tony's SAN, reading in smaller blocks actually improves performance, which is really unexpected.
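To make the ABCD/EFGH/IJKL example concrete, here is a minimal Python simulation under loudly hypothetical assumptions: `FlakyDisk`, `read_image`, `compare_request_sizes`, the 4-byte block size and the fault rate are all invented for illustration, and none of this is Veeam's code or a model of the actual EqualLogic firmware. The toy disk returns correct data for single-block reads but garbles one block in multi-block reads. Reading the same static image with both request sizes and comparing SHA-256 digests exposes the mismatch, and the request counter shows why single-block reads mean more I/O operations (12 requests instead of 3).

```python
import hashlib
import random

BLOCK = 4  # bytes per block in this toy model (hypothetical; real backup reads use far larger blocks)


class FlakyDisk:
    """Hypothetical model of the reported symptom (not Veeam or EqualLogic code):
    single-block reads always return correct data, while a multi-block read
    sometimes returns one block with the wrong content."""

    def __init__(self, data: bytes, fault_rate: float, seed: int = 0):
        assert len(data) % BLOCK == 0, "image must be a whole number of blocks"
        self.data = data
        self.fault_rate = fault_rate
        self.rng = random.Random(seed)
        self.requests = 0  # I/O requests issued so far

    def read(self, offset: int, nblocks: int) -> bytes:
        self.requests += 1
        chunk = bytearray(self.data[offset:offset + nblocks * BLOCK])
        returned = len(chunk) // BLOCK
        if returned > 1 and self.rng.random() < self.fault_rate:
            # Replace one block of the multi-block response with garbage bytes.
            i = self.rng.randrange(returned) * BLOCK
            chunk[i:i + BLOCK] = b"?" * BLOCK
        return bytes(chunk)


def read_image(disk: FlakyDisk, blocks_per_request: int) -> bytes:
    """Read the whole image sequentially, blocks_per_request blocks at a time."""
    step = blocks_per_request * BLOCK
    return b"".join(disk.read(off, blocks_per_request)
                    for off in range(0, len(disk.data), step))


def compare_request_sizes(data: bytes, sizes, fault_rate: float) -> dict:
    """Map each request size to (SHA-256 of what was read, requests issued)."""
    results = {}
    for size in sizes:
        disk = FlakyDisk(data, fault_rate)
        image = read_image(disk, size)
        results[size] = (hashlib.sha256(image).hexdigest(), disk.requests)
    return results


# Blocks A..L from the example above, each BLOCK bytes long.
vmdk = b"".join(bytes([c]) * BLOCK for c in b"ABCDEFGHIJKL")
reference = hashlib.sha256(vmdk).hexdigest()
results = compare_request_sizes(vmdk, sizes=[1, 4], fault_rate=1.0)
for size, (digest, requests) in results.items():
    print(size, requests, digest == reference)
# prints "1 12 True" and then "4 3 False"
```

A fault rate of 1.0 is used only to make the output deterministic; the reported problem was intermittent ("sometimes"), so with a lower rate the mismatch would show up only on some passes. On real storage there is no trusted copy to compare against, so the analogous check would be hashing the same static snapshot read with different request sizes and comparing those digests with each other.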

Post by TMassa »

Gostev wrote:However (another anomaly), on Tony's SAN, reading in smaller blocks actually improves performance (which is really unexpected).
I'm sure this will slow down considerably when we update the firmware. :roll:

Post by Gostev »

The drop should not be too bad anyway (typically 10-20% on full backup, if I recall correctly). Anyhow, I think we have no choice but to implement your current "fix" as the default setting in 4.1.2 and keep it that way until this whole story becomes clearer.
drbarker
Enthusiast
Posts: 45
Liked: never
Joined: Feb 17, 2009 11:50 pm

Post by drbarker »

TMassa wrote:I'm sure this will slow down considerably when we update the firmware. :roll:
Why? Is the array going to start actually reading stuff from the spinning rust? :D

In all seriousness: do you know why the updated FW would probably have fixed things? Between this and http://www.virtualizationbuster.com/?p=457 it's not sounding good.

Gostev: unless you can confirm it's a vStorage API issue, please don't include the 'fix' in 4.1.2. Most arrays can mask the effects of small I/Os with read-ahead, but the cheap & cheerful ones will be slower :-(

Post by Gostev »

drbarker wrote:Gostev: unless you can confirm it's a vstorage api issue, please don't include the 'fix' in 4.1.2 - most arrays can mask over the effects of small IO/s with read-ahead. But the cheap & cheerful ones will be slower :-(
There will be a way for advanced users to enable pre-4.1.2 behavior easily if needed, if your testing shows that you are not affected. I am only talking about default behavior.

Post by tsightler »

There's no telling where the problem may lie at this point. I run EqualLogic arrays and we've seen no signs of this issue; however, we use hardware iSCSI initiators or virtual appliance mode for all of our backups, not the Microsoft software initiator. It could be a problem with Veeam, the vStorage API, the Microsoft iSCSI initiator, EqualLogic's custom multipath code, array firmware, etc. It will be very interesting to see, but it is certainly critical that users' data is protected correctly; otherwise, backups aren't really worth taking.

Post by Gostev »

We currently suspect that the issue is due to this extra EQL logic.
TMassa wrote:I was reading this article and came across this quote: Is this "integration" with the SAN automatic, by virtue of using vSphere and an EQ SAN? If so, is there a way to "turn it off"?
More info here - YouTube

Post by joergr »

EQL will only utilize VAAI with FW 5.0 or newer.

And I am more than sure EQL firmware (even 2.x) would have nothing at all to do with this behavior.

But... I read through this whole thread and have a theory. It's maybe the last place a "normal" researcher would look, but I'd say it's either the server NIC (not the ESX NIC, the Veeam server's) or (and this is the one I really believe it could be) the switch. And while I'm at the switch: did you make absolutely sure you enabled flow control on it? If not, read the EQL best practices; you will see flow control at the very, very, very beginning with huge "important" signs ;-)

Furthermore, I'd like to know if you have enabled jumbo frames on your Veeam server's NIC. If yes, change it. Don't laugh at me: jumbo frames on a Veeam backup server will give you NOTHING. Believe me. Then I'd like to know which NIC you are using in the Veeam server. If it's not Intel or Broadcom, report back quickly, because maybe I could tell you the reason at once.

But for starters, make absolutely sure flow control is enabled on your switch, and make sure ALL anti-flooding, anti-hammering and anti-ARP-spoofing mechanisms on your switch are turned off as well. I would bet it's related to this. It would make perfect sense.

best regards,
Joerg

PS: This is NOT related to this thread, but I think it's worth mentioning: for those who have already installed FW 5.0 or 5.0.1 on their EQLs and are using vSphere 4.1, DON'T use VAAI at this time. Wait until EQL releases a newer version.
OREOSpeedwagon
Influencer
Posts: 12
Liked: never
Joined: Jul 29, 2010 1:48 am
Full Name: OREOSpeedwagon

Post by OREOSpeedwagon »

Gostev wrote:We are currently suspecting that the issue is due to this extra EQL logic.
Those are all options in the now-recalled version 5 firmware, so unless you're on version 5, those features don't exist.
sjolshagen
Novice
Posts: 3
Liked: never
Joined: Aug 13, 2010 8:02 pm
Full Name: Thomas Sjolshagen

Post by sjolshagen »

PS: This is NOT related to this thread, but I think it's worth mentioning: for those who have already installed FW 5.0 or 5.0.1 on their EQLs and are using vSphere 4.1, DON'T use VAAI at this time. Wait until EQL releases a newer version.
Hi, my name is Thomas Sjolshagen and I work at Dell in the EqualLogic engineering group. Sorry for the off-topic reply, but I saw the above comment and thought I'd share some relevant information:

If you haven’t yet installed firmware version 5.0.0 or 5.0.1, I recommend you wait a few more weeks before installing any updates to your controller firmware above version 4.3.6. This is because the engineering team identified a few changes we needed to make to those firmware revisions, and we’re planning to release an updated version (v5.0.2) on or around August 30, 2010. Should you have received an array from the Dell factory with version v5.0.0 installed, please make sure you contact Dell support immediately. Additionally, if you want more details for your specific environment, please call your EqualLogic support number and give them your service tag.

We prefer to be direct and act fast on these kinds of things, so please let us know if you have any additional questions.

// Thomas

Post by sjolshagen »

Hi,

Just wanted to follow up on my previous message about the availability of the new 5.0.2 firmware release. I previously stated we expected to provide something on or around August 30th. Since that date has come and gone, I wanted to update you on where we are in terms of shipping the v5.0.2 release.

At the moment, our engineering team is still working on it and I promise to keep this forum in the loop with any relevant information until we ship. We (Dell) strive to maintain the highest possible level of quality and I hope it is obvious that we appreciate the patience our users have shown as you wait for the release of v5.0.2. Please let me know if you have any questions.

// Thomas Sjolshagen @ Dell
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Post by Vitaliy S. »

Thomas, thank you for keeping us posted on this matter.