-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@Grime121 Did you check the logs for such events (IDs 824 or 3041)?
We also never saw any real issue; finding this was pure coincidence. It happens on all ESX 7 SQL VMs, but for some only every 3 months or so. It strongly depends on I/O load; lightly loaded VMs are not affected.
-
- Influencer
- Posts: 19
- Liked: 1 time
- Joined: Apr 10, 2020 6:02 pm
- Full Name: Evan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We have some very high I/O SQL servers, and monitor them all with SCOM. I can’t imagine us not receiving alerts via SCOM if there was an issue such as this….
The high I/O ones are all stored on Pure SANs, utilizing vVols. Maybe that’s why we don’t see any issues.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Does SCOM monitor for these events specifically?
Did you check for these events in your application log?
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Interesting thread. I've had a similar problem for a while with Linux/MySQL VMs. Occasionally, a few minutes after backup (snap removal), MySQL crashes with a data corruption error. Example below.
It's one of two recurring disk corruption cases. One is this MySQL-based problem (various MySQL servers in different countries); the other is suspected to be related to old kernels and vNVMe driver bugs, but you may never know...
No interesting logs nor case numbers as it's quite rare and there's very little to go on...
Code:
2022-02-23T10:53:52.154057Z 1 [ERROR] [MY-011906] [InnoDB] Database page corruption on disk or a failed file read of page [page id: space=4294967279, page number=218]. You may have to recover from a backup.
2022-02-23T10:53:52.180703Z 1 [ERROR] [MY-011825] [InnoDB] [FATAL] Unable to read page [page id: space=4294967279, page number=218] into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. Or, the table was compressed with with an algorithm that is not supported by this instance. If it is not a decompress failure, you can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/8.0/en/ for more details. Aborting...
2022-02-23T10:53:52.180718Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:634 thread 140041579603712
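Since these crashes are rare and easy to miss, a small script that scans the MySQL error log for the page references above may help correlate them with backup windows. This is only a minimal sketch: the function name `corrupt_pages` is made up for illustration, and the regular expression assumes the `[page id: space=..., page number=...]` format seen in this excerpt.

```python
import re

# Matches the InnoDB page reference format seen in the log excerpt above.
PAGE_RE = re.compile(r"\[page id: space=(\d+), page number=(\d+)\]")


def corrupt_pages(log_text):
    """Return the set of (space, page_number) pairs named in InnoDB [ERROR] lines."""
    pages = set()
    for line in log_text.splitlines():
        if "[ERROR]" not in line or "[InnoDB]" not in line:
            continue
        for space, page in PAGE_RE.findall(line):
            pages.add((int(space), int(page)))
    return pages


# Two lines taken from the excerpt above (the first one shortened).
sample = (
    "2022-02-23T10:53:52.154057Z 1 [ERROR] [MY-011906] [InnoDB] Database page "
    "corruption on disk or a failed file read of page "
    "[page id: space=4294967279, page number=218]. You may have to recover from a backup.\n"
    "2022-02-23T10:53:52.180718Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: "
    "ut0ut.cc:634 thread 140041579603712\n"
)
print(corrupt_pages(sample))
```

Comparing the timestamps of matching lines with the snapshot removal times of the backup job would show whether the two really line up.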
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@DonZoomik Did you open a VMware case? You can reference our case
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
No cases so far as it could have been blamed on faulty kernel or something...
I'll create one if it happens the next time.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Just trying to keep the pressure on VMware so their Dev can fix this.
This morning we had another case of log corruption. This time we verified that
a) the issue is not a one-time read issue (multiple log backups failed)
b) the faulty blocks are 100 % non-faulty after the snapshot has been deleted, and the log files could then be read by Veeam without any corruption
That most likely means the SEsparse itself is not corrupted and it might be some VMFS caching issue (?!) - I am not sure if VMFS does read caching at all....
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
AFAIK VMware by default does zero IO caching.
Send me the VMware case in PM if you don't want to publish it publicly. Although I'd not expect much from VMware with such a hard to catch bug with their support going constantly downhill in quality...
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Case 22307569902
I have a good feeling, our support engineer is closely following this thread and is reporting to Dev...
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Mar 23, 2022 5:24 pm
- Full Name: dgalmo
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We also have this issue. If another case is needed I can also submit one to VMware. So far we've only had one occurrence. VMware 7.3uc, Veeam 11.0.1.1261.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@dgalmarini@ulmer.com Please do so and reference my case (22307569902), thank you!
-
- Influencer
- Posts: 19
- Liked: 1 time
- Joined: Apr 10, 2020 6:02 pm
- Full Name: Evan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
BTW, our logs do not go back far enough for me to determine if we have seen these errors over the past month. They only go back a few days, due to some log flooding that we are still trying to get sorted out. I would like to think that SCOM would alert on these errors, though…. While we do not have an explicitly defined rule for alerting on these errors, we do have the SQL Management Packs installed, and configured. We receive alerts from SCOM for just about every issue with SQL that you could imagine. I might see if our Ops Team can look into if these particular events are being monitored for, though. Just to be sure.
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Mar 28, 2022 9:58 pm
- Full Name: Todd Bernhardt
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Had the same issue today on a SQL 2016 OS 2016 instance (vSphere Client version 7.0.3.00300). Timing was consistent with a Veeam app aware backup. Subscribed to this topic.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@MT_Todd please open a VMware case as well and reference our case (22307569902)
-
- Lurker
- Posts: 2
- Liked: 1 time
- Joined: Mar 30, 2022 3:11 pm
- Full Name: Yvonne Murphy
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I work in VMware support. We have a problem report open for this. Are all these snapshots taken without memory and without quiescing? Are you sure your VMware Tools are up to date? I'm hoping to collect as many cases as possible on this, so please open a case if you think you might have this issue. Yvonne Murphy VMware support Cork.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello Yvonne,
Without memory, yes. Quiescing is irrelevant; I can reproduce it with or without. Tools are also not relevant: the inconsistency also happens without any quiescing at all, which means the OS does not even know it is running on a snapshot...
I am currently testing a setting which our engineer suggested; so far we have had no new occurrence. But we are not allowed to disclose the setting here as it is still all a theory...
I'll keep you all updated...
Markus
-
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
In my Linux case, no memory snap, no quiescing (Veeam VADP default, with integration scripts). Running open-vm-tools.
It hasn't happened in a while, as backups were disabled on primary DB cluster nodes and it does not happen on passive nodes (backup only, so effectively write-only, no client reads).
-
- Lurker
- Posts: 2
- Liked: 1 time
- Joined: Mar 30, 2022 3:11 pm
- Full Name: Yvonne Murphy
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We're currently suggesting disabling bloom filters to stop this happening.
Bloom filters add an enhancement to the SEsparse read I/O path. This enhancement is enabled only for first-level snapshots. It gets disabled when multi-level snapshots are created for a given vdisk. When the bloom filter is disabled, read I/O performance is equivalent to what it was before the enhancement.
Bloom filters can be disabled by this command:
vsish -e set /config/SE/intOpts/BFEnabled 0
So far with the cases attached to this bug report, this issue hasn't been reproduced with bloom filters disabled.
Please try this and let us know if the issue still occurs.
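For readers unfamiliar with the data structure, here is a generic illustration of how a Bloom filter can speed up reads on a disk with a snapshot delta. This is not VMware's SEsparse implementation: the classes `BloomFilter` and `SnapshotDisk`, the sizes and the hashing scheme are all hypothetical, and this thread does not state what the actual defect is.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SnapshotDisk:
    """Hypothetical model: a base disk plus a delta of blocks written since the snapshot."""

    def __init__(self, base):
        self.base = base        # block number -> data at snapshot time
        self.delta = {}         # blocks overwritten after the snapshot
        self.filter = BloomFilter()
        self.delta_lookups = 0  # how often the (expensive) delta was consulted

    def write(self, block, data):
        self.delta[block] = data
        self.filter.add(block)

    def read(self, block):
        # "Definitely not in the delta" lets the read skip the delta lookup.
        if self.filter.might_contain(block):
            self.delta_lookups += 1
            if block in self.delta:
                return self.delta[block]
        return self.base[block]


disk = SnapshotDisk({n: f"old-{n}" for n in range(100)})
disk.write(7, "new-7")
print(disk.read(7), disk.read(8))
```

The shortcut is only safe as long as the filter never answers "not present" for a block that was in fact written to the delta; with the filter disabled, every read simply takes the slower path again.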
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Do the hosts need to be rebooted after issuing the command? Can this also be set from vCenter using an advanced configuration option? It's quite a hassle to enable SSH or ESXi shell on each host and go through them one by one.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello,
We did not have to do reboots or enter maintenance mode; we just did it in production directly via SSH.
I do not know about vCenter as we only have 3 hosts on vSphere 7.
Markus
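For environments with more than a handful of hosts, the SSH approach can be scripted. The following is a minimal sketch, assuming SSH is already enabled on each host and key-based login works; the host names and the helper names `build_ssh_commands` and `apply` are hypothetical. Only the `vsish` command itself comes from this thread. Whether the setting survives a host reboot is not stated here, so check that separately.

```python
import shlex
import subprocess

# Command quoted from the VMware support post in this thread.
DISABLE_BLOOM_FILTER = "vsish -e set /config/SE/intOpts/BFEnabled 0"


def build_ssh_commands(hosts, user="root"):
    """Return one ssh argv list per host; nothing is executed here."""
    return [["ssh", f"{user}@{host}", DISABLE_BLOOM_FILTER] for host in hosts]


def apply(hosts, dry_run=True):
    """Print the commands (dry run) or run them, stopping at the first failure."""
    for argv in build_ssh_commands(hosts):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


# Hypothetical host names; replace with your own.
hosts = ["esx01.example.com", "esx02.example.com"]
commands = build_ssh_commands(hosts)
apply(hosts)  # dry run: only prints what would be executed
```

The dry run is the default on purpose, so the list of hosts and the exact command can be reviewed before anything touches production.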
-
- Enthusiast
- Posts: 59
- Liked: 3 times
- Joined: Mar 24, 2020 6:36 pm
- Full Name: M.S.
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
I am going to open a VMware case on this as well and hopefully get the same direction. Our SQL admin has to stop and restart the SQL services to recreate the TempDB due to corruption.
Errors 823/824
DESCRIPTION: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x930f76cd; actual: 0x930f76cd). It occurred during a read of page (4:522757) in database ID 2 at offset 0x000000ff40a000 in file 'E:\TempDB\temp3.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
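To see at a glance which database and file such a message refers to, the fields can be pulled out with a regular expression. This is a minimal sketch based only on the wording of the message above; the function name `parse_824` is made up for illustration, and other SQL Server versions may word the message differently.

```python
import re

# Pattern built from the wording of the error text quoted above.
ERR_824_RE = re.compile(
    r"read of page \((?P<file_id>\d+):(?P<page>\d+)\) in database ID (?P<db_id>\d+) "
    r"at offset (?P<offset>0x[0-9a-fA-F]+) in file '(?P<path>[^']+)'"
)


def parse_824(message):
    """Extract file id, page, database id, byte offset and path, or None if no match."""
    m = ERR_824_RE.search(message)
    if m is None:
        return None
    return {
        "file_id": int(m["file_id"]),
        "page": int(m["page"]),
        "db_id": int(m["db_id"]),
        "offset": int(m["offset"], 16),
        "path": m["path"],
    }


sample = (
    r"It occurred during a read of page (4:522757) in database ID 2 "
    r"at offset 0x000000ff40a000 in file 'E:\TempDB\temp3.ndf'."
)
info = parse_824(sample)
print(info)
```

Database ID 2 is tempdb, and the offset equals the page number times the 8 KB page size (522757 * 8192), so the message is internally consistent about where the bad read happened.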
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Apr 06, 2022 5:11 pm
- Full Name: Ryan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
My team is seeing the same issue.
We're on VMware ESXi, 7.0.3, 19193900 / Veeam 11.0.1.1261 P220220302.
Seems to happen when the SQL server is under load. We've been banging our heads against the wall. We even built a fresh Server 2019 / SQL 2019 and migrated to ensure it was not something in the OS or an issue with the SQL install.
We also involved Microsoft, but they are useless.
These errors are correlated with database locks that are totally unexplained and only happen when we see these errors. We are having to clear them manually and it's becoming a nightmare.
Subscribed here to see if anyone comes up with a fix.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hi,
Recreate? Why?
If it's the same issue, the inconsistencies are gone after snapshot removal!
DBCC CHECKDB never once showed an inconsistency after the snapshot was gone.
You might have a different issue if that is not the case!
Markus
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We were also advised by VMware support to disable the bloom filters. What I still don’t understand though is why the issue in our case only seems to occur on tempdb and not the other databases.
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
@FrancWest Because it seems to occur only on one type of I/O, which for us (on all servers) is tempdb data files and user database transaction log files. For us the tempdb issue is also much more frequent than the log issue.
This seems to be universal for all users having this issue, from what I heard.
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello!
@FrancWest Could you please clarify about these bloom filters, where exactly did you disable it and does the issue persist?
Thanks!
-
- Veteran
- Posts: 528
- Liked: 104 times
- Joined: Sep 17, 2017 3:20 am
- Full Name: Franc
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hi,
See the post by @yvonnevm, 9 posts above this one. We don't know yet whether the issue is resolved, since it is very intermittent: only 3 times in the last 4 months.
post449277.html#p449277
-
- Veeam Legend
- Posts: 1203
- Liked: 417 times
- Joined: Dec 17, 2015 7:17 am
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We have not had a single issue since we set it! Since we used to have it 1-2 times a week across the whole environment, it feels like it's fixed...
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Apr 06, 2022 5:11 pm
- Full Name: Ryan
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
We disabled bloom filters and the problem went away.
-
- Enthusiast
- Posts: 32
- Liked: 2 times
- Joined: Dec 14, 2017 1:49 pm
- Full Name: Ioannis Tsitsiklis
- Contact:
Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches
Hello,
@yvonnevm @FrancWest
Although we are not currently affected by this, as far as we could check, according to this article https://kb.vmware.com/s/article/83550 the Bloom filter is supposedly there precisely to help improve read speed/IO.
Quoted from the article:
The enhancement described here aims at improving read performance when VM is on SE sparse snapshot. This is done by using a probabilistic data structure like Bloom Filter and is targeted to optimize read work flow especially for first level snapshot.
So the question that needs to be answered is: should we, the end customers using VMware ESXi, go ahead and disable the Bloom filter or leave it as is?
This is kind of conflicting: why does the Bloom filter exist if we should disable it? Or in which cases should we disable it?
Thanks