mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer » 1 person likes this post

Hello,

This is not directly connected to Veeam, but still:
We are experiencing data inconsistencies on multiple SQL 2019 systems while the VMs are running on a snapshot (for example, in the middle of a backup). We traced the start of the issues to our upgrade to ESX 7U2.
The issues occur only in:
- tempdb data files (not always the same data file, but always tempdb) - SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x76484a82; actual: 0x21fa7e9a). It occurred during a read of page (6:2595847) in database ID 2 at offset 0x000004f380e000 in file 'T:\SQL_DATA\tempdbX.ndf'
- User database logfiles in several databases - Backup detected log corruption in database TESTDB. Context is Bad Middle Sector. LogFile: 2 'K:\SQL_LOG\TESTDB\TESTDB.ldf' VLF SeqNo: x1c7aae VLFBase: x2470000 LogBlockOffset: x24fd000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x1c7aae LogBlock.StartLsn.Blk: x416 Size: xe400 PrevSize: xe400
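For anyone collecting these messages across many servers, here is a minimal Python sketch that pulls the fields out of the checksum error quoted above. The regular expression and the function name `parse_checksum_error` are my own assumptions based only on the message text in this post, not on any official format description. It also checks that the reported offset equals the page number times the 8 KB SQL Server page size, which holds for the message above.

```python
import re

# SQL Server data pages are 8 KB, so the byte offset of a page in its
# file is expected to be page_number * 8192.
PAGE_SIZE = 8192

# Pattern derived from the message quoted in this thread (an assumption,
# not an official format specification).
CHECKSUM_ERROR = re.compile(
    r"incorrect checksum \(expected: (0x[0-9a-fA-F]+); actual: (0x[0-9a-fA-F]+)\)"
    r".*?read of page \((\d+):(\d+)\) in database ID (\d+)"
    r" at offset (0x[0-9a-fA-F]+) in file '([^']+)'"
)


def parse_checksum_error(message):
    """Return the fields of a checksum error message, or None if no match."""
    m = CHECKSUM_ERROR.search(message)
    if m is None:
        return None
    expected, actual, file_id, page, db_id, offset, path = m.groups()
    page = int(page)
    offset = int(offset, 16)
    return {
        "expected": expected,
        "actual": actual,
        "file_id": int(file_id),
        "page": page,
        "database_id": int(db_id),
        "offset": offset,
        "file": path,
        # True when the offset is consistent with the page number.
        "offset_matches_page": offset == page * PAGE_SIZE,
    }


# The tempdb message from this post.
sample = (
    "SQL Server detected a logical consistency-based I/O error: incorrect checksum "
    "(expected: 0x76484a82; actual: 0x21fa7e9a). It occurred during a read of page "
    "(6:2595847) "
    r"in database ID 2 at offset 0x000004f380e000 in file 'T:\SQL_DATA\tempdbX.ndf'"
)
info = parse_checksum_error(sample)
print(info)
```

Grouping the parsed results by `database_id` and `file` would make it easy to confirm that only tempdb data files are affected.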

We ran DBCC CHECKDB with extended logical checks - not one database has inconsistencies after the snapshot is removed.
The interesting thing is that we have had about 40-50 different inconsistencies on more than 10 DB servers over the last ~6 months - and they are *ALWAYS* in these two locations (random database log files or tempdb data files). VMware is checking at the moment... The storage vendor has already validated that the data on all storage mirrors (RAID + sync mirror + checksum for every block) is identical, so it must come from a higher layer.

Is anyone facing similar issues?
We would not have seen the issues if we had not checked the logs / set log backup to "every 5 minutes" - Veeam then found a log corruption fast enough while the backup was running.

If you search for "corr" in the SQL Server event log, you will find the issues quite easily.
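A minimal Python sketch of that search, assuming the log has been exported to plain text lines (the function name `find_suspect_lines` and the sample input are hypothetical). The substring "corr" seems to work well because it matches both "incorrect checksum" and "corruption":

```python
def find_suspect_lines(lines, needle="corr"):
    """Return the lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Illustrative input only: the first two messages are quoted (shortened) from
# this thread, the third is a made-up placeholder for an unrelated entry.
sample_lines = [
    "SQL Server detected a logical consistency-based I/O error: incorrect checksum",
    "Backup detected log corruption in database TESTDB. Context is Bad Middle Sector.",
    "placeholder: an unrelated informational message",
]
hits = find_suspect_lines(sample_lines)
for hit in hits:
    print(hit)
```

The function accepts any iterable of strings, so an open text file can be passed in place of the sample list.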

Markus
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by PetrM »

Hello,

I would also collect and examine vsstrace and probably try to reproduce the problem by creating shadow copies manually. Maybe it's somehow related to VSS and not to VMware but it's just a hypothesis, of course.

Thanks!
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Hi,

no, it cannot be connected to VSS. As I said, Veeam is only indirectly connected.

For testing we created a Veeam job without quiescing or any OS integrations. So basically a "dirty" snapshot.

Still, immediately the transaction log showed corruption in production - just because the VM ran on a snapshot!

Markus
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by PetrM »

Veeam can leverage the Volume Shadow Copy Service to bring file system and application data into a consistent state. It just makes a request to create a volume shadow copy, so an issue with VSS would not indicate that Veeam is directly involved. I was not aware of the test without quiescence, but I agree that the result allows us to exclude VSS from the list of potential root causes. Just out of curiosity: do you notice the same issue if you create a snapshot manually in the vSphere Client?

Thanks!
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@PetrM I think you still misunderstood: Our backups are fine. Production data shows as corrupted several minutes after the VM starts to run on a snapshot, until the snapshot is removed.

The snapshot should be transparent/not visible when not using VSS or VMware tools. The creation of the snapshot just stuns the VM for a second or so, which should never corrupt any data.

I did not try with normal snapshots because there should be no difference without any integration...
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by soncscy » 1 person likes this post

Hrm, this came up a _long_ time ago, but maybe you're experiencing this?

https://kb.vmware.com/s/article/59216

It's strange it would only appear now, but the conditions seem to match if it's truly just when the snapshot is present. Maybe VMware has a regression?
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by PetrM »

@mkretzer

In fact, I was talking exactly about production data and not about backups. The idea was to check that shadow copy creation itself does not produce errors at the application level, but I agree that it's not relevant, as your test without quiescence shows that the issue is not related to VSS. Anyway, I suggest trying "manual" snapshots without involving Veeam at all. Most probably you will get the same problem, but any assumption (even an obvious one) should be proven by experiment.

Thanks!
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer » 1 person likes this post

@soncscy I thought so as well.
The interesting thing is that it never happens for database data files other than tempdb. It must be heavily dependent on the I/O pattern.

@PetrM will do tests...
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

We are also experiencing tempdb corruption every now and then (once every 2 weeks). I never related it to backups. We are using backup from storage snapshots, so the snapshot is only present for a short amount of time. I did find it strange, though, that when I ran DBCC after getting the corruption notification mail from SQL Server itself, no inconsistencies were found. Now I understand why: it wasn't running on a snapshot anymore.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@FrancWest interesting! Is the corruption also happening at the time of backup (when the snapshot is active)?
Are you on vSphere 7U2?
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

Yes, more specifically during snapshot removal.

This is our complete error message:

DATE/TIME: 12-2-2022 02:21:23

DESCRIPTION: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xe01a2f41; actual: 0xe01a2f41). It occurred during a read of page (6:1039271) in database ID 2 at offset 0x000001fb74e000 in file 'E:\Data\tempdb_4.mdf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.


We are on vSphere 7U3, the re-released version.
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

We've had the issue 3 times now: November 14th at 2:20:18, December 7th at 2:22:28 and February 12th at 2:21:23. All around the same time, during snapshot removal. That's no coincidence.
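To put a number on "around the same time", here is a small Python sketch that computes the spread between the three clock times listed above. It assumes 24-hour times and ignores the dates, since only the time of day matters for this comparison.

```python
from datetime import datetime

# Clock times of the three occurrences reported above (dates omitted,
# 24-hour format assumed).
occurrences = ["02:20:18", "02:22:28", "02:21:23"]


def seconds_since_midnight(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


secs = sorted(seconds_since_midnight(s) for s in occurrences)
spread = secs[-1] - secs[0]
print(f"earliest to latest: {spread} seconds")  # 130 seconds
```

All three errors fall within a window of a little over two minutes, which would be consistent with a job step that runs at a fixed point in the nightly schedule.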
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

Also, we used this to let SQL Server send notifications, so you don't need to look into the logs for the error:

https://www.sqlshack.com/sql-server-set ... se-alerts/
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Did you already open a VMware case? We are currently discussing it with them. If you open a case, you can reference our case number 22307569902.

Are you using SQL 2019?
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

No, we didn't open a case with VMware, since before today I didn't realize it might be related to vSphere. I assumed it had something to do with SQL and/or our storage. But neither Microsoft Support nor Dell/EMC found any issues.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Would you be willing to open a case? I believe it would really help solve this issue!
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

I'll do that, but I'm not sure if they are going to do something about it, since the last occurrence was on Feb 12th. We'll see.
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

Oh, I forgot. We are using SQL 2017.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

They will - just remember to reference our case number.
If there are enough cases they will find something.

In our system I can reproduce it every few days, but we have a lot of high-load SQL servers...
Grime121
Influencer
Posts: 19
Liked: 1 time
Joined: Apr 10, 2020 6:02 pm
Full Name: Evan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Grime121 »

I noticed you said you had quiescence enabled. Are you using VM Tools quiescence in your Veeam backup jobs? That is not recommended. Veeam Guest Processing (application-aware processing) is the better option. Since you already tried disabling it, and it didn’t help, it doesn’t sound like that is your issue, but Veeam Guest Processing should nevertheless be used instead.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

No, that's irrelevant to the problem. It's a real snapshot data corruption bug, as it happens with guest processing, with quiescence, and with nothing at all (just a snapshot).

Again, the production data, not the backup data, is corrupted while running on a snapshot. "Just snapshot" should not impact operation of the VM at all!
Bjorn
Influencer
Posts: 10
Liked: 3 times
Joined: Mar 23, 2017 9:38 am
Full Name: Björn Sandell
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Bjorn » 1 person likes this post

We've been seeing this since mid-October on various versions of MSSQL/Windows (2012-2019). I did open a case with VMware (21283732112), which basically boiled down to "snapshots are not supported on MSSQL, we can't help you".
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

What?
"snapshots are not supported on MSSQL, we can't help you"

Did you escalate?
MariusN
Influencer
Posts: 15
Liked: 1 time
Joined: Jun 12, 2017 11:21 am
Full Name: Marius Neumann
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by MariusN »

Which version of 7.0 U2 are you running: a, c, d or e?

As there were many different problems with 7.0 U2, I personally would recommend upgrading to U3 which was released earlier February 22.
https://docs.vmware.com/en/VMware-vSphe ... notes.html
Maybe this will help you:
https://communities.vmware.com/t5/ESXi- ... -p/2851019
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

We are at U3c and have the same issue.

U3c came out on 2022-01-27
m.novelli
Veeam ProPartner
Posts: 504
Liked: 84 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by m.novelli »

I'm subscribing to this thread, as I'm interested. So far I've not been impacted by this issue, but I have many customers running SQL Server VMs on vSphere 7.

Marco
BackItUp2020
Enthusiast
Posts: 54
Liked: 3 times
Joined: Mar 24, 2020 6:36 pm
Full Name: M.S.
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by BackItUp2020 »

We have the same issue here! I've opened up cases w/ Veeam B&R but haven't been able to figure it out. Our SQL folks keep having to delete tempdb and recreate it.
BackItUp2020
Enthusiast
Posts: 54
Liked: 3 times
Joined: Mar 24, 2020 6:36 pm
Full Name: M.S.
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by BackItUp2020 »

Ours are not running on snapshots, and the issue occurs very intermittently. 7U3.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@BackItUp2020 Does this mean the issue does not occur while Veeam is creating snapshots for backup? Then this might be a different issue!
Grime121
Influencer
Posts: 19
Liked: 1 time
Joined: Apr 10, 2020 6:02 pm
Full Name: Evan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Grime121 »

I only mentioned the quiescence thing since it is best practice to not use VMware Tools for that. I didn’t mean it was possibly the cause of this problem.

As for the issue itself, we have a ton of SQL servers (2014, 2016, 2017, and 2019, all on various Windows OS versions). We have the latest version of Veeam and vCenter/ESXi installed, and have not experienced any of these issues. We use storage snapshots in one datacenter, and VMware snapshots in the other two (since they use vVol storage). No issues with any of the SQL VMs during backups, whether storage snapshots are used or not. Again, we use Veeam guest processing rather than VMware Tools quiescence, but that doesn’t appear to be related to your issue, since changing the setting didn’t help. It’s just best practice to not use that, which is why I recommend disabling it.