Host-based backup of VMware vSphere VMs.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@Grime121 Did you check the logs for such events (IDs 824 or 3041)?
We also never saw any real issue; finding this was pure coincidence. It happens on all ESX 7 SQL VMs, but on some only every 3 months or so. It strongly depends on I/O load: lightly loaded VMs are not affected.
Grime121
Influencer
Posts: 19
Liked: 1 time
Joined: Apr 10, 2020 6:02 pm
Full Name: Evan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Grime121 »

We have some very high I/O SQL servers, and monitor them all with SCOM. I can’t imagine us not receiving alerts via SCOM if there was an issue such as this….

The high I/O ones are all stored on Pure SANs, utilizing vVols. Maybe that’s why we don’t see any issues.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Does SCOM monitor for these events specifically?

Did you check for these events in your application log?
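
For anyone who wants to check exported logs for these events, here is a minimal Python sketch. It assumes the SQL Server error log has been exported to plain text and that error entries begin with the usual `Error: N, Severity: N, State: N.` header. The function name `find_watched_errors` and the sample lines are made up for illustration; they are not taken from a real server.

```python
import re

# Error numbers mentioned in this thread: 823/824 (I/O errors) and 3041 (backup failure).
WATCHED_ERRORS = {823, 824, 3041}

# Header line that precedes an error message in a text export of the error log.
ERROR_HEADER = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")


def find_watched_errors(lines, watched=WATCHED_ERRORS):
    """Return (line_number, error_number) for every watched error header."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        match = ERROR_HEADER.search(line)
        if match and int(match.group(1)) in watched:
            hits.append((line_no, int(match.group(1))))
    return hits


# Synthetic sample lines (assumption: shaped like exported error log entries).
sample_log = [
    "2022-03-22 02:14:07.45 spid61      Error: 824, Severity: 24, State: 2.",
    "2022-03-22 02:14:07.45 spid61      (message text follows on this line)",
    "2022-03-22 02:15:00.10 Backup      Error: 3041, Severity: 16, State: 1.",
    "2022-03-22 02:16:00.00 Logon       Error: 18456, Severity: 14, State: 8.",
]

for line_no, error_number in find_watched_errors(sample_log):
    print(f"line {line_no}: error {error_number}")
```

Running the same function over a real export only tells you whether the header lines are present; it does not replace checking whether the monitoring rules actually alert on them.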
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by DonZoomik »

Interesting thread. I've had a similar problem for a while with Linux/MySQL VMs. Occasionally, a few minutes after backup (snapshot removal), MySQL crashes with a data corruption error. Example:

Code: Select all

2022-02-23T10:53:52.154057Z 1 [ERROR] [MY-011906] [InnoDB] Database page corruption on disk or a failed file read of page [page id: space=4294967279, page number=218]. You may have to recover from a backup.
2022-02-23T10:53:52.180703Z 1 [ERROR] [MY-011825] [InnoDB] [FATAL] Unable to read page [page id: space=4294967279, page number=218] into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. Or, the table was compressed with with an algorithm that is not supported by this instance. If it is not a decompress failure, you can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/8.0/en/ for more details. Aborting...
2022-02-23T10:53:52.180718Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:634 thread 140041579603712
It's one of two recurring disk corruption cases. One is this MySQL-based problem (various MySQL servers in different countries); the other is suspected to be related to old kernels and vNVMe driver bugs, but you may never know...
No interesting logs or case numbers, as it's quite rare and there's very little to go on...
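
To correlate such crashes with backup windows, the affected pages can be pulled out of the MySQL error log. The following is a rough Python sketch that assumes the log line layout shown in the excerpt above (timestamp, thread id, `[ERROR]`, error code, `[InnoDB]`); the name `corrupted_pages` is hypothetical, and the second sample line is cut short for readability.

```python
import re

# Matches InnoDB error lines that carry a "[page id: space=..., page number=...]" tag.
PAGE_ERROR = re.compile(
    r"^(?P<ts>\S+) \d+ \[ERROR\] \[(?P<code>MY-\d+)\] \[InnoDB\] .*?"
    r"\[page id: space=(?P<space>\d+), page number=(?P<page>\d+)\]"
)


def corrupted_pages(lines):
    """Return sorted unique (timestamp, space, page) tuples, ignoring the error code."""
    pages = set()
    for line in lines:
        match = PAGE_ERROR.match(line)
        if match:
            # Keep only whole seconds so repeated reports of one event collapse.
            second = match["ts"].split(".")[0]
            pages.add((second, int(match["space"]), int(match["page"])))
    return sorted(pages)


log_lines = [
    "2022-02-23T10:53:52.154057Z 1 [ERROR] [MY-011906] [InnoDB] Database page corruption on disk "
    "or a failed file read of page [page id: space=4294967279, page number=218]. "
    "You may have to recover from a backup.",
    # Shortened copy of the second line from the excerpt.
    "2022-02-23T10:53:52.180703Z 1 [ERROR] [MY-011825] [InnoDB] [FATAL] Unable to read page "
    "[page id: space=4294967279, page number=218] into the buffer pool after 100 attempts.",
    "2022-02-23T10:53:52.180718Z 1 [ERROR] [MY-013183] [InnoDB] Assertion failure: "
    "ut0ut.cc:634 thread 140041579603712",
]

for second, space, page in corrupted_pages(log_lines):
    print(second, space, page)
```

The timestamps returned this way can then be compared by hand against snapshot creation and removal times from the backup job.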
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@DonZoomik Did you open a VMware case? You can reference our case :-)
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by DonZoomik »

No cases so far, as it could have been blamed on a faulty kernel or something...
I'll create one if it happens the next time.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Just trying to keep the pressure on VMware so their Dev can fix this.

This morning we had another case of log corruption. This time we verified that

a) the issue is not a one-time read issue (multiple log backups failed)
b) the faulty blocks are 100 % non-faulty after the snapshot has been deleted, and the log files could then be read by Veeam without any corruption

That most likely means the SEsparse itself is not corrupted and it might be some VMFS caching issue (?!) - I am not sure if VMFS does read caching at all....
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by DonZoomik »

AFAIK VMware by default does zero IO caching.
Send me the VMware case in PM if you don't want to publish it publicly. Although I'd not expect much from VMware with such a hard to catch bug with their support going constantly downhill in quality...
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Case 22307569902
I have a good feeling, our support engineer is closely following this thread and is reporting to Dev...
dgalmarini@ulmer.com
Lurker
Posts: 1
Liked: never
Joined: Mar 23, 2022 5:24 pm
Full Name: dgalmo
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by dgalmarini@ulmer.com »

We also have this issue. If another case is needed I can also submit one to VMware. So far we've only had one occurrence. VMware 7.3uc, Veeam 11.0.1.1261.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer » 1 person likes this post

@dgalmarini@ulmer.com Please do so and reference my case (22307569902), thank you!
Grime121
Influencer
Posts: 19
Liked: 1 time
Joined: Apr 10, 2020 6:02 pm
Full Name: Evan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Grime121 »

BTW, our logs do not go back far enough for me to determine if we have seen these errors over the past month. They only go back a few days, due to some log flooding that we are still trying to get sorted out. I would like to think that SCOM would alert on these errors, though…. While we do not have an explicitly defined rule for alerting on these errors, we do have the SQL Management Packs installed, and configured. We receive alerts from SCOM for just about every issue with SQL that you could imagine. I might see if our Ops Team can look into if these particular events are being monitored for, though. Just to be sure.
MT_Todd
Lurker
Posts: 1
Liked: never
Joined: Mar 28, 2022 9:58 pm
Full Name: Todd Bernhardt
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by MT_Todd »

Had the same issue today on a SQL 2016 OS 2016 instance (vSphere Client version 7.0.3.00300). Timing was consistent with a Veeam app aware backup. Subscribed to this topic.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@MT_Todd please open a VMware case as well and reference our case (22307569902)
yvonnevm
Lurker
Posts: 2
Liked: 1 time
Joined: Mar 30, 2022 3:11 pm
Full Name: Yvonne Murphy
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by yvonnevm »

I work in VMware support. We have a problem report open for this. Are all these snapshots taken without memory and without quiescing? Are you sure your VMware Tools are up to date? I'm hoping to collect as many cases as possible on this, so please open a case if you think you might have this issue. Yvonne Murphy VMware support Cork.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Hello Yvonne,

Without memory, yes. Quiescing is irrelevant; I can reproduce it with or without. Tools are also not relevant: the inconsistency also happens without any quiescing at all, which means the OS does not know it is running on a snapshot at all...

I am currently testing a setting which our engineer suggested; so far we have had no new occurrence. But we are not allowed to disclose the setting here as it is still all a theory...

I'll keep you all updated...

Markus
DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by DonZoomik »

In my Linux case: no memory snapshot, no quiescing (Veeam VADP default, with integration scripts). Running open-vm-tools.
It hasn't happened in a while, as backups were disabled on the primary DB cluster nodes and it does not happen on passive nodes (backup only, so effectively write-only, no client reads).
yvonnevm
Lurker
Posts: 2
Liked: 1 time
Joined: Mar 30, 2022 3:11 pm
Full Name: Yvonne Murphy
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by yvonnevm » 1 person likes this post

We're currently suggesting disabling bloom filters to stop this happening.

The Bloom filter is an enhancement in the SEsparse read I/O path. It is enabled only for a first-level snapshot and gets disabled when multi-level snapshots are created for a given vdisk. With the Bloom filter disabled, read I/O performance is equivalent to what it was before the enhancement.

Bloom filters can be disabled by this command:

vsish -e set /config/SE/intOpts/BFEnabled 0

So far with the cases attached to this bug report, this issue hasn't been reproduced with bloom filters disabled.
Please try this and let us know if the issue still occurs.
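
For readers unfamiliar with the data structure, here is a toy Python sketch of how a Bloom filter can shortcut a snapshot read path. It is an illustration only and not VMware's implementation: the class, the `read_block` function, the block layout and the hash choice are all assumptions. The thread does not state the actual root cause, so the "stale filter" case at the end only shows why a read path that trusts such a filter depends on it being kept in sync.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives
    as long as every inserted key was passed to add()."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read_block(block, snapshot_blocks, base_blocks, bloom):
    """Hypothetical read path for a first-level snapshot."""
    if not bloom.might_contain(block):
        # Filter says "definitely not in the snapshot": skip the snapshot lookup.
        return base_blocks[block]
    # Filter says "maybe": do the real lookup and fall back to the base disk.
    return snapshot_blocks.get(block, base_blocks[block])


base = {n: f"base-{n}" for n in range(8)}
snapshot = {}
bloom = BloomFilter()

# A write while the snapshot exists lands in the snapshot and is recorded in the filter.
snapshot[3] = "new-3"
bloom.add(3)

print(read_block(3, snapshot, base, bloom))  # new data from the snapshot
print(read_block(5, snapshot, base, bloom))  # untouched block, served from base

# A filter that was never updated would send the same read to the base disk.
stale_bloom = BloomFilter()
print(read_block(3, snapshot, base, stale_bloom))
```

In this toy model the data in the snapshot itself is intact in both cases; only the lookup decision differs.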
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

Do the hosts need to be rebooted after issuing the command? Can this setting also be applied from vCenter using an advanced configuration option? It's quite a hassle to enable SSH or the ESXi shell on each host and go through them one by one.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Hello,

We did not have to do reboots or enter maintenance mode; we just did it in production directly via SSH.

I do not know about vCenter, as we only have 3 hosts on vSphere 7.

Markus
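
If going host by host over SSH is the only option, the loop can at least be scripted. The sketch below is a hedged Python example: the host names are hypothetical, it assumes SSH is already enabled on each host and that a plain `ssh user@host command` invocation works in your environment, and it uses only the `vsish` command quoted above. Nothing in this thread says whether the setting survives a host reboot, so treat that as an open question.

```python
import shlex
import subprocess

# Command quoted from the VMware support post in this thread.
DISABLE_BLOOM_FILTER = "vsish -e set /config/SE/intOpts/BFEnabled 0"


def build_ssh_commands(hosts, user="root", remote_command=DISABLE_BLOOM_FILTER):
    """Return one argv list per host, ready for subprocess.run()."""
    return [["ssh", f"{user}@{host}", remote_command] for host in hosts]


def apply(hosts, dry_run=True):
    """Print the commands by default; only run them when dry_run is False."""
    for argv in build_ssh_commands(hosts):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


# Hypothetical host names; replace with your own.
apply(["esx01.example.com", "esx02.example.com", "esx03.example.com"])
```

The dry-run default is deliberate: review the printed commands first, then call `apply(hosts, dry_run=False)` once you are sure about the host list.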
BackItUp2020
Enthusiast
Posts: 54
Liked: 3 times
Joined: Mar 24, 2020 6:36 pm
Full Name: M.S.
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by BackItUp2020 »

I am going to open a VMware case on this as well and hopefully get the same direction. Our SQL admin has to stop and restart the SQL services to recreate the TempDB due to corruption.

Errors 823/824

DESCRIPTION: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x930f76cd; actual: 0x930f76cd). It occurred during a read of page (4:522757) in database ID 2 at offset 0x000000ff40a000 in file 'E:\TempDB\temp3.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
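
The fields in an 824 message like the one above can be extracted mechanically, which helps when comparing occurrences across servers. This is a minimal Python sketch; the function name `parse_824` is made up, and the regular expression only covers the "incorrect checksum" wording shown here. It also checks that the reported offset equals the page number times the 8 KB SQL Server page size, which holds for this message (522757 × 8192 = 0xff40a000).

```python
import re

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB

MSG_824 = re.compile(
    r"incorrect checksum \(expected: (?P<expected>0x[0-9a-fA-F]+); "
    r"actual: (?P<actual>0x[0-9a-fA-F]+)\)\. "
    r"It occurred during a read of page \((?P<file_id>\d+):(?P<page>\d+)\) "
    r"in database ID (?P<db_id>\d+) at offset (?P<offset>0x[0-9a-fA-F]+) "
    r"in file '(?P<path>[^']+)'"
)


def parse_824(message):
    """Return the key fields of an 824 checksum message, or None if it does not match."""
    match = MSG_824.search(message)
    if match is None:
        return None
    page = int(match["page"])
    offset = int(match["offset"], 16)
    return {
        "expected": match["expected"],
        "actual": match["actual"],
        "file_id": int(match["file_id"]),
        "page": page,
        "database_id": int(match["db_id"]),
        "offset": offset,
        "path": match["path"],
        # Sanity check: the byte offset should be page number * page size.
        "offset_matches_page": offset == page * PAGE_SIZE,
    }


# First part of the message as posted above.
message = (
    r"SQL Server detected a logical consistency-based I/O error: incorrect checksum "
    r"(expected: 0x930f76cd; actual: 0x930f76cd). It occurred during a read of page "
    r"(4:522757) in database ID 2 at offset 0x000000ff40a000 in file 'E:\TempDB\temp3.ndf'."
)

info = parse_824(message)
print(info["database_id"], info["file_id"], info["page"], info["offset_matches_page"])
```

Database ID 2 is tempdb, which matches the file path in the message. Note that the expected and actual checksum values are identical in the message as posted; the sketch simply reports both without interpreting them.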
RyanIT
Lurker
Posts: 2
Liked: never
Joined: Apr 06, 2022 5:11 pm
Full Name: Ryan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by RyanIT »

My team is seeing the same issue.

We're on VMware ESXi, 7.0.3, 19193900 / Veeam 11.0.1.1261 P220220302.

Seems to happen when the SQL server is under load. We've been banging our heads against the wall. We even built a fresh Server 2019 / SQL 2019 and migrated to ensure it was not something in the OS or an issue with the SQL install.

We also involved Microsoft, but they are useless.

These errors are correlated with database locks that are totally unexplained and only happen when we see these errors. We are having to clear them manually and it's becoming a nightmare.

Subscribed here to see if anyone comes up with a fix.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

Hi,

Recreate? Why?
If it's the same issue, the inconsistencies are gone after snapshot removal!

DBCC CHECKDB never once showed an inconsistency after the snapshot was gone.

You might have a different issue if that is not the case!

Markus
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest »

We were also advised by VMware support to disable the bloom filters. What I still don’t understand though is why the issue in our case only seems to occur on tempdb and not the other databases.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

@FrancWest Because it seems to occur only on one type of I/O, which for us (on all servers) is tempdb data files and user database transaction log files. For us the tempdb issue is also much more frequent than the log issue.

From what I have heard, this seems to be universal for all users having this issue.
PetrM
Veeam Software
Posts: 3229
Liked: 519 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by PetrM »

Hello!

@FrancWest Could you please clarify about these bloom filters: where exactly did you disable them, and does the issue persist?

Thanks!
FrancWest
Veteran
Posts: 488
Liked: 93 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by FrancWest » 1 person likes this post

Hi,

See the post by @yvonnevm, 9 posts above this one. We don't know yet whether the issue is resolved, since it is very intermittent: only 3 times in the last 4 months.

post449277.html#p449277
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by mkretzer »

We have not had a single issue since we set it! Since we previously had it 1-2 times a week across the whole environment, it feels like it's fixed...
RyanIT
Lurker
Posts: 2
Liked: never
Joined: Apr 06, 2022 5:11 pm
Full Name: Ryan
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by RyanIT »

We disabled bloom filters and the problem went away.
Ioannis.T
Enthusiast
Posts: 32
Liked: 2 times
Joined: Dec 14, 2017 1:49 pm
Full Name: Ioannis Tsitsiklis
Contact:

Re: SQL Server data inconsistencies with snapshot - ESX 7U2, latest patches

Post by Ioannis.T »

Hello,

@yvonnevm @FrancWest
As far as we could check, we are not currently affected by this. However, according to this article, https://kb.vmware.com/s/article/83550, the Bloom filter is supposed to do exactly this: help improve read speed/IO.
Quoted from the article:
The enhancement described here aims at improving read performance when VM is on SE sparse snapshot. This is done by using a probabilistic data structure like Bloom Filter and is targeted to optimize read work flow especially for first level snapshot.
So the question that needs to be answered is: should we, the end customers using VMware ESXi, go ahead and disable the Bloom filter, or leave it as is?
The reason the Bloom filter exists seems to conflict with the advice to disable it. In which cases should we disable it?

Thanks