-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello,
not directly connected to Veeam - still:
We are experiencing data inconsistencies on multiple SQL 2019 systems while the VMs are running on a snapshot (for example, in the middle of a backup). We could trace the start of the issues to our upgrade to ESX 7U2.
The issues occur only in:
- tempdb data files (not always the same data file, but always tempdb) - SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x76484a82; actual: 0x21fa7e9a). It occurred during a read of page (6:2595847) in database ID 2 at offset 0x000004f380e000 in file 'T:\SQL_DATA\tempdbX.ndf'
- User database logfiles in several databases - Backup detected log corruption in database TESTDB. Context is Bad Middle Sector. LogFile: 2 'K:\SQL_LOG\TESTDB\TESTDB.ldf' VLF SeqNo: x1c7aae VLFBase: x2470000 LogBlockOffset: x24fd000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x1c7aae LogBlock.StartLsn.Blk: x416 Size: xe400 PrevSize: xe400
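As a sanity check on the tempdb message, the reported byte offset can be related to the page number. This is a minimal sketch, assuming the standard 8 KB (8192-byte) SQL Server page size; the helper name `page_offset` is made up for illustration.

```python
# SQL Server data pages are 8 KB, so a page's byte offset
# inside its data file is page_number * 8192.
PAGE_SIZE = 8192


def page_offset(page_number: int) -> int:
    """Return the byte offset of a page inside its data file."""
    return page_number * PAGE_SIZE


# Page (6:2595847) from the tempdb error: file ID 6, page number 2595847.
reported_offset = 0x000004f380e000
computed_offset = page_offset(2595847)

# Print the computed offset and whether it matches the reported one.
print(hex(computed_offset), computed_offset == reported_offset)
```

The computed offset equals the reported one, so the message describes an ordinary page-aligned read of that page.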
We ran DBCC CHECKDB with extended logical checks - not one database has inconsistencies after the snapshot is removed.
The interesting thing is that we have had about 40-50 different inconsistencies on more than 10 DB servers over the last ~6 months - and they are *ALWAYS* in these two locations (random database log files or a tempdb data file). VMware is checking at the moment... The storage vendor has already validated that the data on all storage mirrors (RAID + sync mirror + checksum for every block) is identical, so it must come from a higher layer.
Is anyone facing similar issues?
We would not have seen the issues if we had not checked the logs and set log backup to "every 5 minutes" - Veeam then found a log corruption fast enough while the backup was running.
If you search for "corr" in the SQL Server event log, you find the issues quite easily.
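The same search can be run over an exported copy of the log. This is only a sketch: the function name and the sample lines are hypothetical, and in practice you would iterate over the lines of your own exported log file.

```python
from typing import Iterable, List


def find_corruption_lines(lines: Iterable[str], needle: str = "corr") -> List[str]:
    """Return the log lines that contain the needle, case-insensitively."""
    needle = needle.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Hypothetical sample lines standing in for an exported log.
sample = [
    "Backup detected log corruption in database TESTDB. Context is Bad Middle Sector.",
    "Log was backed up. Database: TESTDB",
    "SQL Server detected a logical consistency-based I/O error: incorrect checksum",
]

for hit in find_corruption_lines(sample):
    print(hit)
```

Note that "corr" matches both "corruption" and "incorrect checksum", which is why the short search term catches both error types.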
Markus
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello,
I would also collect and examine vsstrace and probably try to reproduce the problem by creating shadow copies manually. Maybe it's somehow related to VSS and not to VMware but it's just a hypothesis, of course.
Thanks!
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hi,
no, it cannot be connected to VSS. As I said, Veeam is only indirectly connected.
For testing we created a Veeam job without quiescing or any OS integrations. So basically a "dirty" snapshot.
Still, immediately the transaction log showed corruption in production - just because the VM ran on a snapshot!
Markus
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Veeam can leverage the Volume Shadow Copy Service to bring file system and application data into a consistent state. It only makes a request to create a volume shadow copy, so an issue with VSS would not indicate that Veeam is directly involved. I was not aware of the test without quiescence, but I agree that the result allows us to exclude VSS from the list of potential root causes. Just out of curiosity: do you notice the same issue if you create a snapshot manually in the vSphere client?
Thanks!
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@PetrM I think you still misunderstood: our backups are fine. Production data shows as corrupted several minutes after the VM starts to run on a snapshot, and this continues until the snapshot is removed.
The snapshot should be transparent/not visible when not using VSS or VMware tools. The creation of the snapshot just stuns the VM for a second or so, which should never corrupt any data.
I did not try with normal snapshots because there should be no difference without any integration...
-
- Veteran
- Posts: 643
- Liked: 312 times
- Joined: Aug 04, 2019 2:57 pm
- Full Name: Harvey
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hrm, this came up a _long_ time ago, but maybe you're experiencing this?
https://kb.vmware.com/s/article/59216
It's strange it would only appear now, but the conditions seem to match if it's truly just when the snapshot is present. Maybe VMware has a regression?
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@mkretzer
In fact, I was talking exactly about production data and not about backups. The idea was to check that shadow copy creation itself does not produce errors at the application level, but I agree that it's not relevant, as your test without quiescence shows that the issue is not related to VSS. Anyway, I suggest trying "manual" snapshots without involving Veeam at all. Most probably you will get the same problem, but any assumption (even an obvious one) should be proven by experiment.
Thanks!
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@soncscy I thought so as well.
The interesting thing is that it never happens for database data files other than tempdb. It must be heavily dependent on the I/O pattern.
@PetrM will do tests...
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We are also experiencing tempdb corruption every now and then (once every 2 weeks). I never related it to backups. We are using backup from storage snapshots, so the snapshot is only present for a short amount of time. I did find it strange, though, that when I ran DBCC after getting the corruption notification mail from SQL Server itself, no inconsistencies were found. Now I understand why: it wasn't running on a snapshot anymore.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@FrancWest interesting! Is the corruption also happening at the time of backup (when the snapshot is active)?
Are you on vSphere 7U2?
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Yes, more specifically during snapshot removal.
This is our complete error message:
DATE/TIME: 12-2-2022 02:21:23
DESCRIPTION: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xe01a2f41; actual: 0xe01a2f41). It occurred during a read of page (6:1039271) in database ID 2 at offset 0x000001fb74e000 in file 'E:\Data\tempdb_4.mdf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
We are on vSphere 7 U3, the re-released version.
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We've had the issue 3 times now: November 14th at 2:20:18, December 7th at 2:22:28, and February 12th at 2:21:23. All around the same time, during snapshot removal. That's no coincidence.
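To put a number on "around the same time", here is a small sketch that compares only the time of day of the three occurrences listed above (the dates are ignored; the helper name is made up for illustration).

```python
from datetime import time


def seconds_since_midnight(t: time) -> int:
    """Convert a time of day to seconds elapsed since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


# Times of day of the three occurrences (Nov 14th, Dec 7th, Feb 12th).
occurrences = [time(2, 20, 18), time(2, 22, 28), time(2, 21, 23)]

seconds = [seconds_since_midnight(t) for t in occurrences]
spread = max(seconds) - min(seconds)

# Difference between the earliest and latest time of day, in seconds.
print(f"spread: {spread} seconds")
```

All three events fall within a window of about two minutes, months apart.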
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Also, we used this to let SQL Server send notifications, so you don't need to look into the logs for the error:
https://www.sqlshack.com/sql-server-set ... se-alerts/
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Did you already open a VMware case? We are currently discussing it with them. If you open a case, you can reference our case number 22307569902.
Are you using SQL 2019?
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
No, we didn't open a case with VMware, since before today I didn't realize it might be related to vSphere. I assumed it had something to do with SQL and/or our storage. But neither Microsoft Support nor Dell/EMC found any issues.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Would you be willing to open a case? I believe it would really help solve this issue!
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I'll do that, but I'm not sure if they are going to do something about it, since the last occurrence was on Feb 12th. We'll see.
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Oh, I forgot. We are using SQL 2017.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
They will - just remember to reference our case number.
If there are enough cases they will find something.
In our system I can reproduce it every few days, but we have a lot of high-load SQL servers...
-
- Influencer
- Posts: 19
- Liked: 1 time
- Joined: Apr 10, 2020 6:02 pm
- Full Name: Evan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I noticed you said you had quiescence enabled. Are you using VM Tools quiescence in your Veeam backup jobs? That is not recommended. Veeam Guest Processing (application-aware processing) is the better option. Since you already tried disabling it, and it didn’t help, it doesn’t sound like that is your issue, but Veeam Guest Processing should nevertheless be used instead.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
No, that's irrelevant to the problem. It's a real snapshot data corruption bug, as it happens with guest processing, with quiescence, and with nothing at all (just a snapshot).
Again, the production data, not the backup data, is corrupted while running on a snapshot. "Just snapshot" should not impact operation of the VM at all!
-
- Influencer
- Posts: 10
- Liked: 3 times
- Joined: Mar 23, 2017 9:38 am
- Full Name: Björn Sandell
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We've been seeing this since mid-October on various versions of MSSQL/Windows (2012-2019). I did open a case with VMware (21283732112), which basically boiled down to "snapshots are not supported on MSSQL, we can't help you".
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
What?
"snapshots are not supported on MSSQL, we can't help you"
Did you escalate?
-
- Influencer
- Posts: 15
- Liked: 1 time
- Joined: Jun 12, 2017 11:21 am
- Full Name: Marius Neumann
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Which version of 7.0 U2 are you running: a, c, d, or e?
As there were many different problems with 7.0 U2, I would personally recommend upgrading to U3, which was released in early February 2022.
https://docs.vmware.com/en/VMware-vSphe ... notes.html
Maybe this will help you:
https://communities.vmware.com/t5/ESXi- ... -p/2851019
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We are at U3c and have the same issue.
U3c came out on 2022-01-27
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I'm subscribing to the thread, as I'm interested. So far I've not been impacted by this issue, but I have many customers running SQL Server VMs on vSphere 7.
Marco
-
- Enthusiast
- Posts: 59
- Liked: 3 times
- Joined: Mar 24, 2020 6:36 pm
- Full Name: M.S.
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We have the same issue here! I've opened cases w/ Veeam B&R, but we haven't been able to figure it out. Our SQL folks keep having to delete tempdb and recreate it.
-
- Enthusiast
- Posts: 59
- Liked: 3 times
- Joined: Mar 24, 2020 6:36 pm
- Full Name: M.S.
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Ours are not running on snapshots, and the issue occurs very intermittently. 7U3.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@BackItUp2020 Does this mean the issue does not occur while Veeam is creating snapshots for backup? Then this might be a different issue!
-
- Influencer
- Posts: 19
- Liked: 1 time
- Joined: Apr 10, 2020 6:02 pm
- Full Name: Evan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I only mentioned the quiescence thing since it is best practice not to use VMware Tools for that. I didn’t mean it was possibly the cause of this problem.
As for the issue itself, we have a ton of SQL servers (2014, 2016, 2017, and 2019, all on various Windows OS versions). We have the latest versions of Veeam and vCenter/ESXi installed, and have not experienced any of these issues. We use storage snapshots in one datacenter, and VMware snapshots in the other two (since they use vVol storage). No issues with any of the SQL VMs during backups, whether storage snapshots are used or not. Again, we use Veeam guest processing rather than VMware Tools quiescence, but that doesn’t appear to be related to your issue, since changing the setting didn’t help. It’s just best practice not to use VMware Tools quiescence, which is why I recommend disabling it.