Monitoring and reporting for Veeam Backup & Replication, VMware vSphere and Microsoft Hyper-V
Nail.Mukhametshin
Enthusiast
Posts: 50
Liked: 3 times
Joined: May 05, 2016 1:15 pm
Full Name: Наиль Мухаметшин
Location: Russia, Moscow
Contact:

Bug - Memory leak in LSAISO and LSASS

Post by Nail.Mukhametshin » Apr 08, 2019 10:26 am

Hi all,

We had a Microsoft case (118112219374027) where we tried to troubleshoot a memory leak in LSAISO and LSASS that occurs when Credential Guard is enabled. The MS team reported that the issue is in VeeamDCS.exe: incorrect use (and reuse) of credential handles. Here is the answer from the MS team:
We’ve been analysing the data provided, based on our prior analysis where we identified a high number of calls to AcquireCredentialsHandle being triggered by VeeamDCS.exe.

The time travel debug trace from the VeeamDCS.exe process shows 639 calls to AcquireCredentialsHandle (ACH) in 251 seconds.

These calls occur in stacks where VeeamDataProviderHyperV!HvMonitor.InventoryBuilder.buildHostInventory is, for instance, iterating through hosts and their VMs and appears to be retrieving VHDs/snapshots, which leads to WMI calls that finally call into ACH.

Code:

0:045> !clrstack2
DbgId ThreadId Apartment Kind CLR          GC Mode    GC Suspending?
   45     191c MTA            v4.7.3260.00 Preemptive no

SP               IP               Function
0000009eba3fe758 0000000000000000 InlinedCallFrame 
0000009eba3fe758 0000000000000000 InlinedCallFrame 
0000009eba3fe700 00007ff94574df0d DomainBoundILStubClass.IL_STUB_PInvoke(System.Management.IEnumWbemClassObject ByRef, Int32, Int32, System.Management.IEnumWbemClassObject, System.String, IntPtr, System.String) 
0000009eba3fe840 00007ff99da7a12e System.Management.SecuredIEnumWbemClassObjectHandler.Clone_(System.Management.IEnumWbemClassObject ByRef)
0000009eba3fe8d0 00007ff99da4077a System.Management.ManagementObjectCollection.GetEnumerator()
0000009eba3fe930 00007ff9463750e6 HvMonitor.VirtualHardDisk.RetrieveSnapshotFiles(System.String, LoadHelper.WMIInvoker) 
0000009eba3fe9b0 00007ff945763327 HvMonitor.HvVirtualMachine.updateVhdInfo(System.Management.ManagementObject, HvMonitor.Service, LoadHelper.WmiCache) 
0000009eba3fea60 00007ff94575d36e HvMonitor.HvVirtualMachine.UpdateProperties(System.Management.ManagementObject, HvMonitor.Service, LoadHelper.WmiCache) 
0000009eba3feab0 00007ff94575d1d0 HvMonitor.InventoryHelper.UpdateVmProperties(HvMonitor.HvVirtualMachine, System.Management.ManagementObject, HvMonitor.Service, LoadHelper.WmiCache) 
0000009eba3feb00 00007ff94575aa35 HvMonitor.InventoryBuilder.buildHostInventory(System.String, LoadHelper.WmiCache, VeeamTypesLib.IMoRef, HvMonitor.HvHost) 
0000009eba3fec00 00007ff945759150 HvMonitor.InventoryBuilder+<>c__DisplayClass10_1.<buildClusterInventory>b__1() 
0000009eba3fecd0 00007ff9a3843a63 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000009eba3feda0 00007ff9a38438f4 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
0000009eba3fedd0 00007ff9a38438c2 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0000009eba3fee20 00007ff9a383d072 System.Threading.ThreadHelper.ThreadStart() 
0000009eba3ff078 0000000000000000 GCFrame 
0000009eba3ff3c8 0000000000000000 DebuggerU2MCatchHandlerFrame 
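The pattern the MS team describes, one credential acquisition per WMI enumeration, can be contrasted with a cached-handle approach. A minimal sketch in Python (the affected code is .NET), where `acquire_handle` is a hypothetical stand-in for an expensive call like AcquireCredentialsHandle, not the real SSPI API:

```python
import functools

acquire_calls = 0  # counts simulated AcquireCredentialsHandle-style calls

def acquire_handle(principal):
    """Hypothetical stand-in for an expensive credential acquisition."""
    global acquire_calls
    acquire_calls += 1
    return f"handle-for-{principal}"

@functools.lru_cache(maxsize=None)
def cached_handle(principal):
    # Acquire once per principal and reuse, instead of acquiring per call.
    return acquire_handle(principal)

# 639 enumerations against the same host (the count from the TTD trace)
# now reuse a single handle.
for _ in range(639):
    handle = cached_handle("hyperv-host-01")

print(acquire_calls)  # 1
```

Whether such reuse is feasible in VeeamDCS.exe depends on its impersonation requirements, but it illustrates why the acquire count alone was a red flag.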
Dumps from the same process show more than 26,000 objects of type System.Management.ManagementObject, so the iteration will take time and lead to increased memory consumption:

Code:

              MT    Count    TotalSize Class Name
00007ff99d9c7088    26527      3183240 System.Management.ManagementObject
Total 26527 objects
Checking references:

Code:

0:000> !gcroot 000001cca7780808
Thread e60:
    0000009dc02fecd0 00007ff945755f8b HvMonitor.InventoryBuilder.Build(Int32)
        rbp+10: 0000009dc02fed60
            ->  000001cc9dc80760 HvMonitor.InventoryBuilder
            ->  000001cc9dc72e10 HvMonitor.Service
            ->  000001cc9dc80228 HvMonitor.EntityManager
            ->  000001cc9dc80838 HvMonitor.PropertyRetriever
            ->  000001cc9dc800e8 LoadHelper.VmGuestInfoLoader
            ->  000001cc9dc80138 System.Collections.Generic.Dictionary`2[[LoadHelper.VmGuestInfoLoader+VmPropertyId, VeeamDataLoader],[LoadHelper.VmGuestInfoLoader+LoadInfo, VeeamDataLoader]]
            ->  000001cc9febfa48 System.Collections.Generic.Dictionary`2+Entry[[LoadHelper.VmGuestInfoLoader+VmPropertyId, VeeamDataLoader],[LoadHelper.VmGuestInfoLoader+LoadInfo, VeeamDataLoader]][]
            ->  000001cc9dd84068 LoadHelper.VmGuestInfoLoader+LoadInfo
            ->  000001cc9f985978 System.Threading.Tasks.Task`1[[System.Object, mscorlib]]
            ->  000001cc9f985938 System.Func`1[[System.Object, mscorlib]]
            ->  000001cc9f9858e0 HvMonitor.VhdPathFinder+<>c__DisplayClass23_0
            ->  000001cca2edb870 LoadHelper.WmiCache
            ->  000001cca2edb8b8 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]], mscorlib]], mscorlib]]
            ->  000001cca2ee3930 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]], mscorlib]], mscorlib]][]
            ->  000001cca2ee35f8 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]], mscorlib]]
            ->  000001cca2ee36c0 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]], mscorlib]][]
            ->  000001cca75d66d0 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]]
            ->  000001cca77cb910 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Collections.Generic.KeyValuePair`2[[System.String, mscorlib],[System.Collections.Generic.IEnumerable`1[[System.Management.ManagementObject, System.Management]], mscorlib]], mscorlib]][]
            ->  000001cca7782e00 System.Collections.ObjectModel.ReadOnlyCollection`1[[System.Management.ManagementObject, System.Management]]
            ->  000001cca76a1018 System.Collections.Generic.List`1[[System.Management.ManagementObject, System.Management]]
            ->  000001cca7781318 System.Management.ManagementObject[]
            ->  000001cca7780808 System.Management.ManagementObject

Found 1 unique roots (run '!GCRoot -all' to see all roots).
Call stack for this thread e60:

Code:

0:018> !clrstack2
DbgId ThreadId Apartment Kind       CLR          GC Mode    GC Suspending?
   18      e60           Background v4.7.3260.00 Preemptive no

WaitHandle.WaitAny
------------------
WaitHandle Array 0x000001cca2edb830

SP               IP               Function                                                                                                                  Source
0000009dc02feb38 0000000000000000 HelperMethodFrame_1OBJ [System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[], Int32, Boolean, Boolean)] 
0000009dc02fec70 00007ff9a388f26c System.Threading.WaitHandle.WaitAny(System.Threading.WaitHandle[], Int32, Boolean)
0000009dc02fecd0 00007ff945755f8b HvMonitor.InventoryBuilder.Build(Int32)
0000009dc02fed60 00007ff945755c72 HvMonitor.EntityManager.LoadEntities(System.Object) 
0000009dc02fedb0 00007ff945755ab1 DomainBoundILStubClass.IL_STUB_COMtoCLR(System.StubHelpers.NativeVariant, IntPtr) 
0000009dc02fefb0 0000000000000000 ComMethodFrame 
By further looking at the code of the methods HvMonitor.InventoryBuilder.Build and HvMonitor.InventoryBuilder.buildHostInventory, we have noticed:
• Build creates a new thread to run buildImpl, which in turn calls into buildHostInventory.
• The new thread then iterates through the VMs and performs several tasks.
• Build waits for the newly created thread, and in some situations the code calls Thread.Abort on that thread, for instance when the wait exceeds a specific timeout.

This could happen if the process in this environment has a very high number of hosts and VMs to iterate through, along with the corresponding VHDs/snapshots, and the timeout is reached. Reducing the number of hosts/VMs monitored by the process and/or increasing the timeout, if that’s possible, could help mitigate the situation, as could reducing the frequency of the inventory build tasks.

The reference for the Thread.Abort method makes the following remark:

Important
The Thread.Abort method should be used with caution. Particularly when you call it to abort a thread other than the current thread, you do not know what code has executed or failed to execute when the ThreadAbortException is thrown, nor can you be certain of the state of your application or any application and user state that it is responsible for preserving. For example, calling Thread.Abort may prevent static constructors from executing or prevent the release of unmanaged resources.
Note that the Thread.Abort method is not supported on .NET Core.

At this point, the investigation points to this as the cause of the problem. Calling the Abort method on a thread should be done with caution: the thread terminates, but references to unmanaged resources can be left behind in the process, so those resources are never freed and memory usage grows. This would explain the behaviour you’re observing.

We recommend contacting the vendor of the application to investigate the situation further.
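The Thread.Abort risk described above has a standard alternative: cooperative cancellation, where the worker checks a flag and releases its resources in a finally block. A minimal sketch, in Python since the affected code is .NET, with a plain counter standing in for unmanaged credential handles:

```python
import threading

open_handles = 0  # stand-in for unmanaged credential handles

def worker(cancel, iterations):
    global open_handles
    open_handles += 1  # acquire a simulated handle
    try:
        for _ in range(iterations):
            if cancel.is_set():  # cooperative check instead of Thread.Abort
                return
    finally:
        open_handles -= 1  # released even when cancellation is requested

cancel = threading.Event()
t = threading.Thread(target=worker, args=(cancel, 10_000_000))
t.start()
cancel.set()          # the waiter requests cancellation on timeout
t.join()
print(open_handles)   # 0: no handle is leaked
```

With an abrupt abort, by contrast, the thread can be killed between acquiring the handle and reaching the cleanup code, which matches the leak pattern the MS team describes.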
And here is some more troubleshooting info:
NTLM ETL shows 9664 calls to SpAcquireCredentialsHandle while there are only 2 calls to SpFreeCredentialsHandle in a 5-minute timeframe. That’s only about 0.02 % of the handles being closed.

Kerberos ETL during the same timeframe shows a much closer ratio between Acquire and Free, though the counts still differ significantly:
• Credential handles:
  o 19321 calls to SpAcquireCredentialsHandle, all by PID 0x870 (2160 -> VeeamDCS.exe)
  o 12548 calls to SpFreeCredentialsHandle
• User-mode context:
  o 7719 calls to SpInitUserModeContext, 7686 of them by PID 0x870 too
  o 4287 calls to SpDeleteUserModeContext
• ISC calls and deletions:
  o 15442 SpInitLsaModeContext calls by PID 0x870
  o 8574 calls to SpDeleteContext

As a summary, the application is quite intensively acquiring credential handles and initializing security contexts.
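The imbalance is easy to quantify from the ETL counts above; a quick Python check of the free-to-acquire ratios:

```python
# ETL counts from the 5-minute NTLM and Kerberos traces quoted above
ntlm_acquired, ntlm_freed = 9664, 2
kerb_acquired, kerb_freed = 19321, 12548

ntlm_rate = ntlm_freed / ntlm_acquired * 100
kerb_rate = kerb_freed / kerb_acquired * 100

print(f"NTLM handles freed: {ntlm_rate:.2f}%")                        # 0.02%
print(f"Kerberos handles freed: {kerb_rate:.1f}%")                    # 64.9%
print(f"Kerberos handles outstanding: {kerb_acquired - kerb_freed}")  # 6773
```

Even on the Kerberos side, thousands of handles per 5-minute window remain unfreed, which is consistent with steadily growing LSASS/LSAISO memory.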


Based on the above, everything points to a possible issue with the VeeamDCS application not freeing all the credential handles it creates, thus leaking them.
This leads to a situation where both LSASS and LSAISO consume a large amount of memory.

We hope this info is enough to find the root cause of this issue and fix it.

We can't open a case because we are using a Community License.
Nail Mukhametshin | System Engineer | R&D Services Infrastructure Development | Kaspersky Lab

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Apr 08, 2019 10:52 am

Hi Nail,

Thanks for reporting it. We (R&D) will take a deeper look at this information.

By the way, I'd like to mention that we offer best-effort support for customers using Free Veeam products.

Regards,
Fedor

Nail.Mukhametshin
Enthusiast
Posts: 50
Liked: 3 times
Joined: May 05, 2016 1:15 pm
Full Name: Наиль Мухаметшин
Location: Russia, Moscow
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by Nail.Mukhametshin » Apr 08, 2019 11:47 am

Yes, I understand this. But anyway, we are open to communication and will provide logs on request if you need them.
Nail Mukhametshin | System Engineer | R&D Services Infrastructure Development | Kaspersky Lab

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Apr 08, 2019 12:24 pm

Hi Nail,

Thanks. I'll let you know if anything else is required or we have any news in this regard.

Regards,
Fedor

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Apr 10, 2019 12:27 pm

Hi Nail,

Just a quick update: our tech support engineers have opened case 03506388 and will reach out to you for additional information such as logs.

Regards,
Fedor

Nail.Mukhametshin
Enthusiast
Posts: 50
Liked: 3 times
Joined: May 05, 2016 1:15 pm
Full Name: Наиль Мухаметшин
Location: Russia, Moscow
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by Nail.Mukhametshin » Apr 12, 2019 12:40 pm

Thank you very much. We will work together to solve this issue now.

I will post an update when we find something.
Nail Mukhametshin | System Engineer | R&D Services Infrastructure Development | Kaspersky Lab

Nail.Mukhametshin
Enthusiast
Posts: 50
Liked: 3 times
Joined: May 05, 2016 1:15 pm
Full Name: Наиль Мухаметшин
Location: Russia, Moscow
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by Nail.Mukhametshin » Apr 12, 2019 12:46 pm

And just a quick update: we see the same behavior with Veeam B&R 9.5 U4 too, but another team is responsible for those servers. They are physical servers with a lot of memory, so the effect is not so catastrophic.

If you are interested, I will tell them to contact support and create tickets. I think the perfect time to do this is once we find the cause of this issue in Veeam ONE.
Nail Mukhametshin | System Engineer | R&D Services Infrastructure Development | Kaspersky Lab

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Apr 12, 2019 2:22 pm

Hi Nail,

Thank you for your efforts.

If the situation with that server is absolutely similar, we should be fine having just the info you reported about your server. If it's a different issue, related to B&R for example, then a separate case should be opened.

Regards,
Fedor

dsortpud
Lurker
Posts: 2
Liked: 1 time
Joined: Feb 15, 2017 6:33 pm
Full Name: DS
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by dsortpud » Jun 14, 2019 3:16 pm

Following because we are having what appears to be the same issue.

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Jun 14, 2019 3:18 pm 1 person likes this post

Hi DS,

We are working on it in cooperation with Microsoft, since the root cause appears to be on the MS side. Stay tuned; we will keep sharing updates here.

Thanks

wishr
Veeam Software
Posts: 1334
Liked: 133 times
Joined: Aug 07, 2018 3:11 pm
Full Name: Fedor Maslov
Contact:

Re: Bug - Memory leak in LSAISO and LSASS

Post by wishr » Aug 07, 2019 11:10 am

Hi guys,

Good news: after a lot of conversations, Microsoft has been able to reproduce the issue and confirmed it's a bug on their end. No ETA for the fix yet, but the ice has been broken!

Regards,
Fedor
