DeadLock new Pattern, Critical Deadlock Discovery Regarding EclipseLink Semaphore #2100

Open
99sono opened this issue Mar 17, 2024 · 3 comments

99sono commented Mar 17, 2024

Hi,

On March 15, 2024, we made a significant discovery related to a deadlock pattern. Specifically, we found that the eclipselink.concurrency.manager.object.building.semaphore property in the persistence.xml file is not entirely safe.

The purpose of this property is to mitigate the probability of deadlocks by imposing a strict limit on the number of threads allowed to access the concurrency manager for object building.

However, our recent investigation revealed an unexpected issue: Threads requesting access to the semaphore for object building might already hold cache keys. This scenario contradicts our initial expectation. Ideally, a thread engaged in object building and requesting the semaphore should not possess any read/write locks.
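The expectation above can be written down as a check performed just before the semaphore is requested. What follows is only a minimal sketch under stated assumptions, not EclipseLink code: `ReentrantReadWriteLock` stands in for a cache key, and the class and method names (`SemaphoreInvariantSketch`, `locksHeldByCurrentThread`, `tryEnterObjectBuilding`) are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in, not EclipseLink code:
// a ReentrantReadWriteLock plays the role of a cache key.
class SemaphoreInvariantSketch {

    /** Counts the given locks that the calling thread holds for reading or writing. */
    static long locksHeldByCurrentThread(List<ReentrantReadWriteLock> cacheKeys) {
        return cacheKeys.stream()
                .filter(k -> k.isWriteLockedByCurrentThread() || k.getReadHoldCount() > 0)
                .count();
    }

    /** Tries to take a permit, warning first if the caller already holds cache keys. */
    static boolean tryEnterObjectBuilding(Semaphore semaphore,
                                          List<ReentrantReadWriteLock> cacheKeys,
                                          long timeoutMillis) throws InterruptedException {
        long held = locksHeldByCurrentThread(cacheKeys);
        if (held > 0) {
            // The expectation is violated: report it, then proceed as before.
            System.err.println("Thread " + Thread.currentThread().getName()
                    + " requests the object-building semaphore while holding "
                    + held + " cache key(s)");
        }
        return semaphore.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

A check of this kind only reports the situation; it does not prevent it. In our case the warning would have fired for the thread that was later denied the semaphore.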

The data from the massive dump indicates that a specific thread, shown in Stack Trace Pattern 02 below, was denied access to the semaphore because ten other threads were already engaged in object building. Unfortunately, the thread that was denied the semaphore holds write-locked cache keys that the other threads require.

We intend to report this new deadlock pattern to Oracle via a service request promptly.

Stack Trace Pattern 01: ten threads doing object building, stuck because the cache keys they want to acquire for reading are already acquired for writing:
"[ACTIVE] ExecuteThread: '525' for queue: 'weblogic.kernel.Default (self-tuning)'"
    java.lang.Thread.State: RUNNABLE
        at java.management@11.0.16/sun.management.ThreadImpl.getThreadInfo1(Native Method)
        at java.management@11.0.16/sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:197)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.enrichGenerateThreadDump(ConcurrencyUtil.java:939)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.createInformationThreadDump(ConcurrencyUtil.java:969)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep02(ConcurrencyUtil.java:570)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationStep01(ConcurrencyUtil.java:554)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.dumpConcurrencyManagerInformationIfAppropriate(ConcurrencyUtil.java:477)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyUtil.java:170)
        at org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireReadLock(ConcurrencyManager.java:333)
        at org.eclipse.persistence.internal.identitymaps.CacheKey.acquireReadLock(CacheKey.java:284)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1059)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:952

Stack Trace Pattern 02: the thread holding active write-locked cache keys is denied the semaphore because 10 threads are already doing object building:

"[ACTIVE] ExecuteThread: '207' for queue: 'weblogic.kernel.Default (self-tuning)'"
    java.lang.Thread.State: TIMED_WAITING
        at java.base@11.0.16/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base@11.0.16/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
        at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1079)
        at java.base@11.0.16/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1369)
        at java.base@11.0.16/java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:415)
        at org.eclipse.persistence.internal.helper.ConcurrencySemaphore.acquireSemaphoreIfAppropriate(ConcurrencySemaphore.java:108)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:726)
        at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:705)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:861)
        at org.eclipse.persistence.queries.ReadObjectQuery.registerResultInUnitOfWork(ReadObjectQuery.java:901)
        at org.eclipse.persistence.queries.ReadObjectQuery.executeObjectLevelReadQuery(ReadObjectQuery.java:568)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1232)
        at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:911)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1191)
        at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:447)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1279)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:3004)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1898)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1880)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1830)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:1012)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:954)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:830)
        at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:696)
        at jdk.internal.reflect.GeneratedMethodAccessor471.invoke(Unknown Source)
        at java.base@11.0.16/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.16/java.lang.reflect.Method.invoke(Method.java:566)
        at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:97)
        at weblogic.persistence.TransactionalEntityManagerProxyImpl.invoke(TransactionalEntityManagerProxyImpl.java:164)
        at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:86)
        at com.sun.proxy.$Proxy603.find(Unknown Source)
        at jdk.internal.reflect.GeneratedMethodAccessor465.invoke(Unknown Source)
        at java.base@11.0.16/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.16/java.lang.reflect.Method.invoke(Method.java:566)
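To make the interaction between the two patterns easier to follow, here is a small self-contained simulation. It is a sketch under stated assumptions, not a reproduction of EclipseLink internals: a plain `java.util.concurrent.Semaphore` with 10 permits stands in for the object-building semaphore, a single `ReentrantReadWriteLock` stands in for a cache key, and the class name `SemaphoreDeadlockSketch` and method `simulate` are hypothetical. Timeouts replace indefinite waiting so the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical simulation, not EclipseLink code.
class SemaphoreDeadlockSketch {
    static final int PERMITS = 10; // ten object-building threads, as in the dump

    /** Returns {permits granted to the write-lock holder, read locks granted to builders}. */
    static int[] simulate(long waitMillis) throws InterruptedException {
        Semaphore objectBuilding = new Semaphore(PERMITS);
        ReentrantReadWriteLock cacheKey = new ReentrantReadWriteLock();
        CountDownLatch writeLockHeld = new CountDownLatch(1);
        CountDownLatch permitsTaken = new CountDownLatch(PERMITS);
        CountDownLatch buildersGaveUp = new CountDownLatch(PERMITS);
        CountDownLatch holderGaveUp = new CountDownLatch(1);
        AtomicInteger holderPermits = new AtomicInteger();
        AtomicInteger readLocksGranted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(PERMITS + 1);

        // Pattern 02: holds the write lock, then asks for the semaphore.
        pool.submit(() -> {
            cacheKey.writeLock().lock();
            try {
                writeLockHeld.countDown();
                permitsTaken.await(); // all permits are gone by now
                if (objectBuilding.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
                    holderPermits.incrementAndGet();
                    objectBuilding.release();
                }
                holderGaveUp.countDown();
                buildersGaveUp.await(); // keep the write lock while builders wait
            } finally {
                cacheKey.writeLock().unlock();
            }
            return null;
        });

        // Pattern 01: each builder holds a permit, then asks for the read lock.
        for (int i = 0; i < PERMITS; i++) {
            pool.submit(() -> {
                objectBuilding.acquire();
                try {
                    permitsTaken.countDown();
                    writeLockHeld.await();
                    if (cacheKey.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                        readLocksGranted.incrementAndGet();
                        cacheKey.readLock().unlock();
                    }
                    buildersGaveUp.countDown();
                    holderGaveUp.await(); // keep the permit while the holder waits
                } finally {
                    objectBuilding.release();
                }
                return null;
            });
        }

        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return new int[] {holderPermits.get(), readLocksGranted.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = simulate(200);
        System.out.println("holder permits=" + result[0] + ", read locks granted=" + result[1]);
    }
}
```

In this setup neither side makes progress within the timeout: the write-lock holder never receives a permit and no builder ever receives the read lock, so `simulate` returns `{0, 0}`. Without the timeouts, the same arrangement would wait forever, which is the cycle the two stack traces describe.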

Thank you for your attention.

Best regards.

99sono commented Mar 17, 2024

To safeguard confidential information, we will refrain from sharing specific data such as cache keys owned by thread 207 and the read lock cache keys sought by the ten threads engaged in object building. Instead, we will provide Oracle with the comprehensive massive dump report generated by the EclipseLink library.

The description outlined here should serve as a solid foundation for addressing the defect and devising an appropriate solution.

Thank you for your attention.

Best regards.

99sono commented Mar 18, 2024

One piece of information is missing here: the deadlock was experienced using EclipseLink 2.7.6 as shipped with WebLogic 14, with additional Oracle patches. In short, the EclipseLink version is very similar to that of the 2.7.9 release tag.

Thanks.

99sono commented Apr 3, 2024

Additional Insight: We acknowledge that the deadlock pattern in question is exceedingly rare. To date, it has only manifested in one production instance. Typically, other production instances operate with the object-building semaphore limit enabled and have not exhibited this pattern.

Nonetheless, the evidence at hand conclusively demonstrates that semaphores can indeed contribute to deadlock scenarios. This occurs when a thread, denied entry by the semaphore, retains ownership of cache key resources. Despite the infrequency of such occurrences, the existence of concrete evidence cannot be ignored. It reveals that acquiring the object-building semaphore is akin to wielding a double-edged sword: while it offers benefits, it also harbors the potential for deadlocks. Ironically, the semaphore's original intent was to diminish the likelihood of deadlocks.
