Which version of Duende IdentityServer are you using?
6.2.3
Which version of .NET are you using?
.NET 7.0
Describe the bug
A seemingly “hanging” database query takes down our IdentityServer instance: all subsequent requests for Resources fail with Failed to obtain cache lock for: 'Duende.IdentityServer.Services.DefaultCache`1[Duende.IdentityServer.Models.Resources]'
To Reproduce
We don't know yet. It occurred twice in our dev environments during automated testing, two months apart. We appear to have had just one IdentityServer node running during the error.
Expected behavior / Question
We don’t want our IDP to be down if this occurs in production. Are there any suspected risks to disabling the locking logic for reads and writes to the cache in DefaultCache<T>-like implementations of ICache<T>?
In #659, Joe DeCock mentions:
…cached items are updated very infrequently, so locking while they are updated seems like a sensible defensive strategy at face value…
By “defensive strategy”, are you just referring to how you don’t know how thread-safe the customer’s IMemoryCache implementation is, or how expensive the Func<Task<T>> get parameter is?
We’re using stores from Duende.IdentityServer.EntityFramework, and Microsoft’s default IMemoryCache implementation. We’re planning on bypassing the locking logic if the error occurs in the future, buying us time to debug. We have limited time to test our solution, so I'm asking here in hopes of discovering potential issues before we run into them.
Log output/exception with stacktrace
Failed to obtain cache lock for: 'Duende.IdentityServer.Services.DefaultCache`1[Duende.IdentityServer.Models.Resources]'
at Duende.IdentityServer.Services.DefaultCache`1.GetOrAddAsync(String key, TimeSpan duration, Func`1 get) in /_/src/IdentityServer/Services/Default/DefaultCache.cs:line 149
at Duende.IdentityServer.Stores.CachingResourceStore`1.FindItemsAsync[TItem](IEnumerable`1 names, ICache`1 cache, Func`2 getResourcesFunc, Func`2 getFromResourcesFunc, Func`2 getNameFunc, String allCachePrefix) in /_/src/IdentityServer/Stores/Caching/CachingResourceStore.cs:line 255
at Duende.IdentityServer.Stores.CachingResourceStore`1.FindIdentityResourcesByScopeNameAsync(IEnumerable`1 scopeNames) in /_/src/IdentityServer/Stores/Caching/CachingResourceStore.cs:line 187
at Duende.IdentityServer.Stores.IResourceStoreExtensions.FindResourcesByScopeAsync(IResourceStore store, IEnumerable`1 scopeNames) in /_/src/IdentityServer/Extensions/IResourceStoreExtensions.cs:line 38
at Duende.IdentityServer.Stores.IResourceStoreExtensions.FindEnabledResourcesByScopeAsync(IResourceStore store, IEnumerable`1
...
at Duende.IdentityServer.Endpoints.AuthorizeEndpointBase.ProcessAuthorizeRequestAsync(NameValueCollection parameters, ClaimsPrincipal user, Boolean checkConsentResponse) in /_/src/IdentityServer/Endpoints/AuthorizeEndpointBase.cs:line 153
...
Additional context
Some evidence that the lock is being held onto for hours:
The last “Executing… SELECT * From IdentityResources…” is logged at 9:04:14. We found no evidence of this query ever returning or failing. Our first “Failed to obtain cache lock” is logged at 9:05:18.
Our PostgreSQL database that stores our configuration remained up. Token and session cleanup continued to execute fine all morning. The IdentityResources table was available for queries from my local machine.
Other related posts:
Consider relaxing the cache lock behavior
Failed to obtain cache lock for Exception
*Side note: would you have expected CancellationTokenProvider.CancellationToken to rescue EntityFramework.Storage.ResourceStore’s GetAllResourcesAsync()? We're still looking into our database timeout settings; perhaps the responsibility lies there.
The intention in the design of the lock is to prevent duplicate calls to the get parameter.
If you remove the locking around that call, the risk is that if the problem occurs again, you'll now have multiple concurrent attempts to perform the get, which might exacerbate the problem. On the other hand, customizing the lock to be tuned for your environment might give you the level of control you need to address this problem.
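To make the trade-off concrete, here is a minimal sketch of a "single-flight" cache with a lock timeout. It is not Duende's DefaultCache<T>: the class name SingleFlightCache, the getTimeout parameter, and the exception message are all hypothetical and only illustrate the idea. The lock ensures only one caller runs get at a time, and the added loader timeout is one way to keep a hung get from holding the lock indefinitely, assuming the get delegate actually honors the cancellation token it is given.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for a locking cache (not Duende's implementation).
public sealed class SingleFlightCache<T>
{
    private readonly ConcurrentDictionary<string, T> _items = new();
    private readonly SemaphoreSlim _lock = new(1, 1);
    private readonly TimeSpan _lockTimeout;

    public SingleFlightCache(TimeSpan lockTimeout) => _lockTimeout = lockTimeout;

    public async Task<T> GetOrAddAsync(
        string key,
        Func<CancellationToken, Task<T>> get,
        TimeSpan getTimeout)
    {
        // Fast path: no locking needed when the item is already cached.
        if (_items.TryGetValue(key, out var cached)) return cached;

        // Wait for the lock, but give up after the configured timeout.
        if (!await _lock.WaitAsync(_lockTimeout))
            throw new TimeoutException($"Failed to obtain cache lock for key '{key}'");

        try
        {
            // Another caller may have populated the cache while we waited.
            if (_items.TryGetValue(key, out cached)) return cached;

            // Bound the loader so a hung call is cancelled and the lock is released.
            // This only helps if the loader observes the token.
            using var cts = new CancellationTokenSource(getTimeout);
            var item = await get(cts.Token);
            _items[key] = item;
            return item;
        }
        finally
        {
            _lock.Release();
        }
    }
}
```

In this sketch, a loader that never returns and ignores its token would still hold the lock until the process restarts, and every later caller would time out waiting for it, which matches the symptom described above. Removing the lock avoids that failure mode but lets every concurrent cache miss issue its own query.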
The lock does have a configurable timeout; perhaps shortening it will help. You can also change the timeout (and perhaps the retry behavior) of the database connection that the get function ultimately relies on.
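As a rough illustration of both knobs, the startup configuration might look something like the sketch below. Treat it as an assumption-laden example rather than a verified setup: the connection string is a placeholder, the timeout values are arbitrary, and you should confirm that options.Caching.CacheLockTimeout is available in the IdentityServer version you run. The command timeout and retry settings come from EF Core and the Npgsql provider.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// Placeholder connection string; "Command Timeout" is Npgsql's per-command timeout in seconds.
var connectionString =
    "Host=localhost;Database=identity;Username=app;Password=secret;Command Timeout=15";

builder.Services
    .AddIdentityServer(options =>
    {
        // Assumed option name; shortens how long callers wait for the cache lock.
        options.Caching.CacheLockTimeout = TimeSpan.FromSeconds(10);
    })
    .AddConfigurationStore(options =>
    {
        options.ConfigureDbContext = db => db.UseNpgsql(connectionString, npgsql =>
        {
            // Fail a hung query after 15 seconds instead of waiting indefinitely.
            npgsql.CommandTimeout(15);
            // Retry transient failures using the provider's default strategy.
            npgsql.EnableRetryOnFailure();
        });
    })
    .AddConfigurationStoreCache();

var app = builder.Build();
app.UseIdentityServer();
app.Run();
```

A shorter lock timeout only makes waiting callers fail faster; it is the database command timeout that would end the hung query and so release the lock.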