TMassa
Enthusiast
Posts: 34
Liked: never
Joined: Mar 08, 2010 10:46 pm
Full Name: Tony Massa
Contact:

Database Server Restores - Consistency Errors

Post by TMassa »

We are on ESX4U1 with Veeam B&R 4.1.1. Our servers are all 2008 x64 Standard, and the DB servers all run 64-bit SQL 2008. All of our backups report good, and Veeam Support has checked our logs; there are no VSS errors or anything else to indicate that it's a Veeam problem. We do have an active support call open, but I'd like to see if anyone else has come across this. We are using CBT and dedupe in SAN mode. The DB files are on a separate VMDK, and the transaction log files are on another.

We had a need to restore a DB server from backup recently, and after the restore, we're running a consistency check and getting the following types of errors:

Code:

Table error: Object ID 1180465367, index ID 1, partition ID 72057602631598080, alloc unit ID 72057602780495872 (type In-row data), page ID (1:7424586) contains an incorrect page ID in its page header. The PageId in the page header = (1:7421642).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 1180465367, index ID 1, partition ID 72057602631598080, alloc unit ID 72057602780495872 (type In-row data), page ID (1:7424587) contains an incorrect page ID in its page header. The PageId in the page header = (1:7421643).
Msg 8909, Level 16, State 1, Line 1

Originally, we thought that perhaps we backed up a DB that was already inconsistent. We do consistency checks on Thursdays, so we restored the Friday morning backup after a known good consistency check. Same problem. We've tried a third server this morning with a similar result.
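
One detail in the two errors above is easy to miss: in both cases the page ID stored in the page header differs from the page's actual position by the same amount. The following is a minimal Python sketch of that arithmetic. The regular expression and the function name `page_offsets` are illustrative only (they are not part of any SQL Server tooling), and the 8 KB figure is the standard SQL Server data page size.

```python
import re

# Pattern written against the "incorrect page ID" message text quoted above.
# Groups: file ID and page ID of the page's position, then file ID and
# page ID found in the page header.
PAGE_ERROR = re.compile(
    r"page ID \((\d+):(\d+)\) contains an incorrect page ID in its page header\. "
    r"The PageId in the page header = \((\d+):(\d+)\)"
)

PAGE_SIZE_BYTES = 8192  # SQL Server data pages are 8 KB


def page_offsets(dbcc_output):
    """Return (file_id, position_page, header_page, delta) for each error."""
    results = []
    for match in PAGE_ERROR.finditer(dbcc_output):
        file_id, position_page, _header_file, header_page = map(int, match.groups())
        results.append((file_id, position_page, header_page, position_page - header_page))
    return results


# The two error lines from the post, verbatim.
sample = (
    "Table error: Object ID 1180465367, index ID 1, partition ID 72057602631598080, "
    "alloc unit ID 72057602780495872 (type In-row data), page ID (1:7424586) contains "
    "an incorrect page ID in its page header. The PageId in the page header = (1:7421642).\n"
    "Table error: Object ID 1180465367, index ID 1, partition ID 72057602631598080, "
    "alloc unit ID 72057602780495872 (type In-row data), page ID (1:7424587) contains "
    "an incorrect page ID in its page header. The PageId in the page header = (1:7421643).\n"
)

for file_id, position_page, header_page, delta in page_offsets(sample):
    mib = delta * PAGE_SIZE_BYTES / (1024 * 1024)
    print(f"file {file_id}: page {position_page} holds header {header_page}, "
          f"shifted by {delta} pages ({mib:.1f} MiB)")
```

Both pages are shifted by the same 2944 pages (23 MiB). Two samples are too few to conclude anything, but running the same check over the full CHECKDB output would show whether the corruption is a contiguous run of displaced blocks rather than random damage.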

Then we thought that perhaps we should try a different DB server in a separate backup job, and still got the same problem. The vendor that recommended Veeam (and has the same server/DB/Veeam versions) has run some DB restores of their servers, and they report their restore tests have come back clean. Their DBs may not be as large as ours (350-400 GB), but I'm not convinced that it would matter.

We're just grasping at straws now to run any test we can think of to determine the root cause of the table corruption in the DBs after a restore. The SQL server starts up fine, and there aren't any event log errors...it's only a DB consistency check that indicates there is a problem.

I am in the process of creating a new single server backup of a new DB server to see if we can restore it without any consistency issues. The file system and OS don't appear to have any problems at all. :(

tsightler
VP, Product Management
Posts: 5687
Liked: 2508 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Database Server Restores - Consistency Errors

Post by tsightler »

Are you restoring just the DB files or the entire VM? Do you ever perform full backups using Veeam or forever incremental?

With an earlier version of Veeam we had an issue where restoring a Linux server would cause fsck checks to report corruption of the underlying filesystem. We had to force the restore to use agentless mode. We reported the issue to Veeam and the problem was corrected; however, it might be worth trying an agentless mode restore in this case as well.

Post by TMassa »

We do a full backup every 7 days with 14 day retention, so we have 2 "full" backups and 12 VBRs.

It's worth a shot since we don't have any other ideas until we work with Veeam/VMware. I'll update after we try it.

Post by tsightler »

TMassa wrote:We do a full backup every 7 days with 14 day retention, so we have 2 "full" backups and 12 VBRs.

It's worth a shot since we don't have any other ideas until we work with Veeam/VMware. I'll update after we try it.

Another option might be to perform a file-level restore of the database files themselves rather than restoring the entire VM.

Vitaliy S.
Product Manager
Posts: 24330
Liked: 1880 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Database Server Restores - Consistency Errors

Post by Vitaliy S. »

Hello Tony, I would also try to create a Volume Shadow Copy snapshot of the volume containing the database, then revert to it and see if you have the same errors after the consistency check takes place.

Post by TMassa »

@tsightler - That was the first thing I did because I didn't want to restore the entire 880 GB of server VMDKs. I am restoring another backup via agentless mode now. When I restored a 20 GB DB server (2003 w/SQL 2000), one of its 15 small databases also started up SUSPECT. The other 14 databases recovered with updated transactions. Here are the SQL logs for the suspect DB:

Code:

The LSN (32180:72333:3) passed to log scan in database 'TY_content' is invalid..
Error: 9003, Severity: 20, State: 1
Database 'TY_content' (database ID 7) could not recover. Contact Technical Support..

@Vitaliy - I will take a VSS snapshot and create a new backup of the live server for this test.

Post by TMassa »

@tsightler - The same database came back as suspect restoring via agentless mode.

Code:

The LSN (32180:72333:3) passed to log scan in database 'TY_content' is invalid..
Error: 3414, Severity: 21, State: 1
Database 'TY_content' (database ID 7) could not recover. Contact Technical Support..

Different error and severity, however. This is a SQL 2000 DB server, so I'm not sure if it's even supported any longer. I'm just trying something different (and significantly smaller). The logs do indicate that the DB was frozen just before the backup, so I guess that there is some compatibility there.

Post by Vitaliy S. »

Tony, SQL Server 2000 doesn't have VSS support. You should be using pre-freeze and post-thaw scripts in order to get SQL Server 2000 backups in a consistent state.

Post by TMassa »

Thanks...I'm now making a VM copy of a 2008 server. FYI, after the SQL (VSS) event messages indicating that the databases are backed up, I get a VSS informational message in the event log:

The VSS service is shutting down due to idle timeout.

Is this normal? After the snapshot is created?

Post by tsightler »

@Vitaliy -- I don't think this statement is 100% correct. SQL 2000 does support VSS if running on Windows 2003 via the MSDE Writer. Here's what Microsoft says on the subject:

"A VSS writer (MSDE Writer) shipped with the VSS framework in Microsoft Windows XP and Microsoft Windows Server 2003. This writer coordinates with SQL Server 2000 and earlier versions to help in backup operations. Starting with SQL Server 2005 installation, SQL Writer is the preferred writer, though MSDE Writer will continue to work and will be the default writer if installed and SQL Writer is not enabled. To start and use the SQL Writer, first disable the MSDE writer from enumerating SQL Server 2005 databases."

Are you sure you're using Veeam VSS and not VMware?

Post by Vitaliy S. »

Tom, that's right, thanks for pointing that out. But since SQL Writer is the preferred writer, I believe using scripts would be better.

Post by TMassa »

We restored another (2008 x64 w/SQL 2008) server that has the same problems. It seems that every DB server we restore has consistency problems.

Code:

CHECKDB found 0 allocation errors and 81 consistency errors in table 'sms.REPOLOAD' (object ID 969431619).
Msg 8928, Level 16, State 1, Line 1
Object ID 1589580701, index ID 1, partition ID 72057594040811520, alloc unit ID 72057594042449920 (type In-row data):

The following event log message is in the live server:

I/O is frozen on database SMS. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

This message is not, however, in the restored server. Would that mean that the snapshot is being taken before the VSS quiesce of SQL has completed?

Post by tsightler »

Are there other messages about VSS? I get a lot more messages than that.

Post by TMassa »

On the live server, there are messages for the DB freezes and subsequent DB backups, but the freeze messages and the "database successfully backed up" messages do not appear on the restored server.

I would have to believe that a restored server should at least show the info messages regarding the VSS freeze. I'll elaborate some details on this later. I just find it a little odd that a restored server doesn't show the DB freeze messages.

Post by TMassa »

Here is the "live" server event log at the time of the backup:
Image

This is the restored server event log just after server restart. None of the DB freeze I/O messages in the above capture are in the application event log of the restored server.
Image

Post by tsightler »

I don't know if the snapshot should contain those messages or not. VSS freezes I/O, both for SQL and the entire system, so it wouldn't be able to write the message while the I/O was frozen, only once the freeze was over, and at that point the snapshot would already have been taken.

Do you see messages in the system log regarding Veeam VSS? The logs above are just the SQL portion; there should be equivalent logs for the Veeam VSS and Microsoft VSS services starting. That would give you a better idea of the timeline.

Post by TMassa »

On the live server (System Event Log):

Code:

3:43:54 - The Volume Shadow Copy service entered the running state
3:43:54 - The VeeamVssSupport service entered the running state
3:43:56 - The Microsoft Software Shadow Copy Provider service entered the running state
3:44:28 - The VeeamVssSupport service entered the stopped state
3:47:26 - The Volume Shadow service entered the stopped state
3:50:26 - The Microsoft Software Shadow Copy Provider service entered the stopped state

On the restored server, the last System Event log message before the server started at 4:38 PM:

Code:

3:35:46 - The Windows Modules Install service entered the running state.

There are no messages on the restored server indicating that VSS or VeeamVssSupport was started.
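
The comparison between the two logs can be made explicit. The following Python sketch parses the event lines exactly as posted and measures how far the restored server's last entry is from the first VSS start on the live server. The function names are illustrative, and because the posted timestamps carry no date or AM/PM marker, the sketch assumes both logs refer to the same half of the same day.

```python
from datetime import datetime

# Event lines as posted above (live server System Event Log).
LIVE_LOG = """\
3:43:54 - The Volume Shadow Copy service entered the running state
3:43:54 - The VeeamVssSupport service entered the running state
3:43:56 - The Microsoft Software Shadow Copy Provider service entered the running state
3:44:28 - The VeeamVssSupport service entered the stopped state
3:47:26 - The Volume Shadow service entered the stopped state
3:50:26 - The Microsoft Software Shadow Copy Provider service entered the stopped state
"""

# Last System Event Log entry on the restored server, as posted.
RESTORED_LOG = """\
3:35:46 - The Windows Modules Install service entered the running state.
"""


def parse_events(log_text):
    """Turn 'H:MM:SS - message' lines into (datetime, message) pairs."""
    events = []
    for line in log_text.strip().splitlines():
        stamp, _, message = line.partition(" - ")
        events.append((datetime.strptime(stamp.strip(), "%H:%M:%S"), message.strip()))
    return events


def first_vss_start(events):
    """Return the time of the first VSS-related 'running state' event, or None."""
    for when, message in events:
        is_vss = "Vss" in message or "Shadow Copy" in message
        if is_vss and "running state" in message:
            return when
    return None


live = parse_events(LIVE_LOG)
restored = parse_events(RESTORED_LOG)

vss_started = first_vss_start(live)
last_restored_entry = max(when for when, _ in restored)
restored_has_vss = first_vss_start(restored) is not None
gap = vss_started - last_restored_entry

print(f"Restored log contains a VSS start event: {restored_has_vss}")
print(f"Last restored entry precedes VSS start on the live server by {gap}")
```

Under that same-day assumption, the restored server's log stops roughly eight minutes before VSS started on the live server, and it contains no VSS start event at all. That fits the suspicion that the restored image does not reflect the quiesced state, but it is only suggestive given the missing date and AM/PM information.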

Post by TMassa »

Well, I've tried:
Restoring full servers back to VMware
Restoring copies to a sandbox
Restoring the individual VMDK with the DB files
Extracting DB files from backup and copying them over to live servers
Restoring to USB
Restoring via copy backup

All result in DB consistency errors. This would lead me to believe that the problem is that the DB is somehow still in use at the time that the server is snapped for backup...yet nowhere are there any errors.

Post by TMassa »

I am now running a new - full - backup on a small DB server

Code:

Create virtual machine snapshot   
Requested Start Time:  7/23/2010 1:17:46 PM
Start Time:  7/23/2010 1:17:46  PM
Completed Time:   7/23/2010 1:17:51 PM

Result of Veeam Backup:

Code:

1 of 1 VMs processed (0 failed, 0 warnings)

Total size of VMs to backup: 90.00 GB
Processed size: 90.00 GB
Processing rate: 244 MB/s
Start time: 7/23/2010 1:17:14 PM
End time: 7/23/2010 1:23:32 PM
Duration: 0:06:18

This server restored successfully from two different backups. The databases on this server are very small and lightly used (360 MB), so the only difference is the size/utilization rate of the databases that are failing to restore correctly. Everything else is constant: Veeam job settings, accounts, OS version, SQL version, hardware, etc. For now, we'll have to stop the application/SQL service on the servers and manually back them up to get a good copy until we can figure out why our restored DB servers have inconsistencies upon restore. I still think it's the snapshot process that's the problem...I really think the corrupted tables aren't completely quiesced before the snapshot is created...but that's just my guess.

Post by TMassa »

Well, after reading http://www.veeam.com/forums/viewtopic.p ... e&start=15, I'm not sure why anyone should use Veeam for DB server backups. Sure, you can back them up, but they are only crash-consistent. From what I can tell, Veeam cannot reliably perform transactionally-consistent (point-in-time) backups of SQL server without losing data.

tsightler wrote:We restored all of these systems via Veeam, however several had serious database level issues with the restore, including corrupt blocks. Some were so damaged they would not start. One of the MS SQL databases that was backed up only with Veeam had serious damage to its "master" database which left us unable to perform some basic operations. We had to perform some "risky" operations to recover the "master" database to return the server to normal operation. We were wishing we had native SQL backups that day.

This has been my experience as well, tsightler. We are now scrambling to find a database backup solution that will reliably restore SQL to a point-in-time. It's been a disappointing week to be sure.

Gostev
SVP, Product Management
Posts: 26840
Liked: 4336 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Database Server Restores - Consistency Errors

Post by Gostev »

TMassa wrote:I really think the corrupted tables aren't completely quiesced before the snapshot is created...but that's just my guess.

That would indicate issues with Microsoft VSS though, which is not very likely? Anyhow, I would like to let our developers research this first. This issue could be specific to certain SQL server versions, too (because we have not observed it in our labs before). There are some good ideas above, such as performing a regular shadow copy and reverting to it; we will try that as well if we are able to reproduce this.

Post by TMassa »

We've now resorted to disabling our HA, evacuating a single ESX host, updating to the latest ESX and VM Tools versions, moving a single DB server to that host, and then backing it up and restoring it.

Gostev - What is the "official" company line regarding DB backups? Are they supposed to be transactionally-consistent every time? Even on high-transaction servers?

Post by TMassa »

Still same problem:
CHECKDB found 0 allocation errors and 24 consistency errors in table 'sms.sap_perfinfo' (object ID 1861346887).
CHECKDB found 0 allocation errors and 325 consistency errors in database 'SMS'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (SMS).

Post by tsightler »

TMassa wrote:Well, after reading http://www.veeam.com/forums/viewtopic.p ... e&start=15, I'm not sure why anyone should use Veeam for DB server backups. Sure, you can back them up, but they are only crash-consistent. From what I can tell, Veeam cannot reliably perform transactionally-consistent (point-in-time) backups of SQL server without losing data.
This has been my experience as well, tsightler. We are now scrambling to find a database backup solution that will reliably restore SQL to a point-in-time. It's been a disappointing week to be sure.

I'm not sure if our problems could all be blamed on Veeam. Our disaster was caused by a storage array that silently corrupted blocks right from underneath the OS. It killed dozens of virtual machines, and some physical machines too. Even though we had some suspicions of Veeam, because there was corruption even when we restored files from 24 hours prior to the disaster, it was not really something we could prove, as we didn't know how long the array had been silently corrupting blocks. We wished for SQL native backups, like we had for our Oracle databases, because they would have likely detected any corruption prior to the crash.

We don't have any very large SQL databases, our biggest are in the 250GB range, but some are quite busy. I have two servers that are under near constant transactional load, one with about 22 individual databases with 10-30GB each, and one that's around 40GB that gets constant updates and activity. I'll restore both of those and perform a consistency check and see what happens.

I've already restored a much smaller database (<1GB) that has a light transaction load, and it was perfectly clean. It won't really help you much, but it will make me feel better (or, if it doesn't work, worse). If it does work it might indicate some very strange environment issue.

Post by tsightler »

TMassa wrote:On the live server (System Event Log):

Code:

3:43:54 - The Volume Shadow Copy service entered the running state
3:43:54 - The VeeamVssSupport service entered the running state
3:43:56 - The Microsoft Software Shadow Copy Provider service entered the running state
3:44:28 - The VeeamVssSupport service entered the stopped state
3:47:26 - The Volume Shadow service entered the stopped state
3:50:26 - The Microsoft Software Shadow Copy Provider service entered the stopped state

On the restored server, the last System Event log message before the server started at 4:38 PM:

Code:

3:35:46 - The Windows Modules Install service entered the running state.

There are no messages on the restored server indicating that VSS or VeeamVssSupport was started.

So this part is the most interesting thing I've found so far. I restored our most active SQL server to a test environment. This server hosts 43 individual databases, although the largest is only 35GB. Total size for all 43 databases is around 250GB. Anyway, DBCC reports all databases are in good shape after a restore, no issues reported at all.

However, my restored servers actually do include all of the following messages in the system log:

Code:

7:01:51 - The Volume Shadow Copy service entered the running state
7:01:51 - The VeeamVssSupport service entered the running state
7:01:54 - The Microsoft Software Shadow Copy Provider service entered the running state

So my restored server's snapshot was obviously taken while the VSS services were running and before they were stopped. There are no messages about the MSSQLSVR in the restored system, but that is probably because VSS had stopped all writes. It's pretty strange that your restored systems don't include such information. It does indeed look as if Veeam VSS is doing its job, but the snapshots are being taken before it's ready.

pops106
Enthusiast
Posts: 48
Liked: never
Joined: Jan 01, 2006 1:01 am
Full Name: Jody Popplewell
Location: Yorkshire
Contact:

Re: Database Server Restores - Consistency Errors

Post by pops106 »

I can't test right now to be 100% sure, but I am 99% sure we see the VSS events after the restore. It is the same with Exchange: there are some VSS messages and then the normal Exchange database checks, which run through.

I haven't had any issues restoring SQL DBs. Our largest DB is probably 100GB, on a server that is completely maxed out for 6 hours and produces 36GB of compressed changes each night.

Post by Gostev »

TMassa wrote:Gostev - What is the "official" company line regarding DB backups? Are they supposed to be transactionally-consistent every time? Even on high-transaction servers?

Sure. Transactionally-consistent backups are the reason we integrate with Microsoft VSS, which is supposed to quiesce all VSS-aware apps on the computer no matter how loaded they are. In the case of SQL Server, the actual quiescence is performed by the Microsoft SQL VSS Writer component (not Veeam Backup itself). Our product simply asks Microsoft VSS to perform such quiescence, and then creates the VM snapshot when VSS reports that the whole system has been quiesced. It looks like something in this chain fails under some conditions, and this is what we will need to investigate.
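
The chain described above implies a strict ordering: quiescence is requested, VSS confirms it, and only then is the VM snapshot created. As a rough illustration, the Python sketch below checks a set of timestamps against that ordering. The step names and the `order_violations` helper are hypothetical labels invented for this example; they are not identifiers from Veeam Backup or the Microsoft VSS API.

```python
# Hypothetical step names for the chain described above. They are labels for
# illustration only, not identifiers from Veeam or the Microsoft VSS API.
EXPECTED_ORDER = ("quiesce_requested", "quiesce_confirmed", "snapshot_created")


def order_violations(timestamps):
    """Return human-readable violations of EXPECTED_ORDER.

    `timestamps` maps each step name to the time (in seconds) at which it
    occurred. An empty list means the steps happened in the expected order.
    """
    missing = [step for step in EXPECTED_ORDER if step not in timestamps]
    if missing:
        return [f"missing step: {step}" for step in missing]

    violations = []
    for earlier, later in zip(EXPECTED_ORDER, EXPECTED_ORDER[1:]):
        if timestamps[later] <= timestamps[earlier]:
            violations.append(f"{later} did not come after {earlier}")
    return violations


# A run where the snapshot waits for VSS to confirm quiescence.
healthy_run = {"quiesce_requested": 0, "quiesce_confirmed": 5, "snapshot_created": 6}

# A run where the snapshot is created before quiescence is confirmed,
# which is the failure mode suspected in this thread.
early_snapshot_run = {"quiesce_requested": 0, "quiesce_confirmed": 5, "snapshot_created": 3}

print(order_violations(healthy_run))
print(order_violations(early_snapshot_run))
```

If the real VSS agent logs expose comparable timestamps, a check of this kind would show directly whether the snapshot is being created before VSS reports that quiescence is complete.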

Post by Gostev »

Tom, Jody - thank you very much for confirming that you see the VSS events on the restored SQL server. It looks like the OP's issue is indeed caused by the snapshot being created too early, before VSS processing is done. I know our support had already asked Tony for the VSS agent logs yesterday; these should help us shed some light on what exactly is happening with VSS processing on the affected VMs.

Post by TMassa »

That's somewhat of a relief...and what I would expect to see just before the VSS stopped I/O. Now, at least, we have something to go on. Thanks for confirming that you see the VSS info msgs in the Application Event log.

I'll be working with Veeam support later this morning. FYI, a VM clone doesn't cause any DB problems, although I'm aware it's not the same process.

Post by Gostev »

Thank you for your time and cooperation on this matter. I am now really interested myself in learning what could be causing this issue.
