Deadlock during classloading of org.xbill.DNS.NioTcpClient #315

Open
ganzm opened this issue Mar 4, 2024 · 1 comment


ganzm commented Mar 4, 2024

In version 3.5.3, the class org.xbill.DNS.NioTcpClient can deadlock during class initialization, as seen in the stack trace below.

This only happens if the timing of class loading is very unfortunate:

  • Assume a fully initialized org.xbill.DNS.Resolver only issuing DNS queries via UDP
  • At some point condition out.length > udpSize evaluates to true in method org.xbill.DNS.SimpleResolver#sendAsync(org.xbill.DNS.Message, boolean, java.util.concurrent.Executor)
    The next DNS request will invoke NioTcpClient.sendrecv in SimpleResolver
  • The first ever TCP-based request will trigger class loading of org.xbill.DNS.NioTcpClient
  • The static initializer of NioTcpClient expects to be able to acquire a lock on class org.xbill.DNS.NioClient. However, in the case of a deadlock it will not get that lock.
  • If, right before that, another thread (the "dnsjava NIO selector" thread in the stack trace) has acquired the lock on class org.xbill.DNS.NioClient and runs a NioTcpClient task, it has to wait for the class initialization of NioTcpClient to finish, and that initialization is itself waiting for the lock mentioned above
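
The lock ordering described in the steps above can be reproduced in isolation. The following is a minimal sketch, not dnsjava code: `LOCK` stands in for the monitor of `NioClient.class`, the nested class `Lazy` stands in for `NioTcpClient`, and all names are made up for illustration. Latches force the unlucky timing so that the effect is deterministic, and daemon threads let the JVM exit even though both threads stay stuck.

```java
import java.util.concurrent.CountDownLatch;

final class ClassInitDeadlockDemo {
    // Hypothetical stand-in for the monitor of NioClient.class.
    static final Object LOCK = new Object();
    static final CountDownLatch lockHeld = new CountDownLatch(1);
    static final CountDownLatch initStarted = new CountDownLatch(1);

    // Hypothetical stand-in for NioTcpClient: its static initializer needs LOCK.
    static final class Lazy {
        static {
            initStarted.countDown();
            synchronized (LOCK) {
                // Comparable to calling a synchronized static method of NioClient.
            }
        }

        static void touch() {
            // Calling this forces Lazy to be initialized first.
        }
    }

    /** Returns true if both threads are still stuck after the join timeouts. */
    static boolean deadlocks() {
        Thread selector = new Thread(() -> {
            synchronized (LOCK) {
                lockHeld.countDown();
                await(initStarted);
                Lazy.touch(); // waits for the initialization running on "query"
            }
        }, "selector");
        Thread query = new Thread(() -> {
            await(lockHeld);
            Lazy.touch(); // starts Lazy's static initializer, which blocks on LOCK
        }, "query");
        selector.setDaemon(true);
        query.setDaemon(true);
        selector.start();
        query.start();
        try {
            selector.join(1000);
            query.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return selector.isAlive() && query.isAlive();
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks());
    }
}
```

In this sketch the "query" thread owns the class initialization and waits for `LOCK`, while the "selector" thread owns `LOCK` and waits for the class initialization, which mirrors the two threads in the stack trace. The JVM's built-in monitor deadlock detection may not flag this situation, because one side is waiting on class initialization rather than on an ordinary monitor.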

Stacktrace


"dns-query-0" #83 daemon prio=5 os_prio=0 cpu=3161.04ms elapsed=148678.94s tid=0x00007f4204732310 nid=0xf7 waiting for monitor entry  [0x00007f421dfe9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.xbill.DNS.NioClient.setTimeoutTask(NioClient.java:141)
	- waiting to lock <0x00000000887809b8> (a java.lang.Class for org.xbill.DNS.NioClient)
	at org.xbill.DNS.NioTcpClient.<clinit>(NioTcpClient.java:32)
	at org.xbill.DNS.SimpleResolver.sendAsync(SimpleResolver.java:371)
	at org.xbill.DNS.SimpleResolver.lambda$sendAsync$1(SimpleResolver.java:446)
	at org.xbill.DNS.SimpleResolver$$Lambda$2084/0x00007f4220b80a98.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(java.base@17.0.10/CompletableFuture.java:1150)
	at java.util.concurrent.CompletableFuture$Completion.run(java.base@17.0.10/CompletableFuture.java:482)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.10/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.10/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(java.base@17.0.10/Thread.java:840)

"dnsjava NIO selector" #81 daemon prio=5 os_prio=0 cpu=13969.60ms elapsed=148678.96s tid=0x0000557c0c33cc60 nid=0xf6 in Object.wait()  [0x00007f421e0ef000]
   java.lang.Thread.State: RUNNABLE
	at org.xbill.DNS.NioTcpClient$$Lambda$3208/0x00007f4220d82520.run(Unknown Source)
	- waiting on the Class initialization monitor for org.xbill.DNS.NioTcpClient
	at org.xbill.DNS.NioClient.runTasks(NioClient.java:163)
	- locked <0x00000000887809b8> (a java.lang.Class for org.xbill.DNS.NioClient)
	at org.xbill.DNS.NioClient.runSelector(NioClient.java:128)
	at org.xbill.DNS.NioClient$$Lambda$2081/0x00007f4220b80220.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.10/Thread.java:840)


TLDR

There is a super rare deadlock while class-loading org.xbill.DNS.NioTcpClient. It can be avoided by manually pre-loading said class.
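
One way to pre-load the class is `Class.forName` with `initialize` set to `true`, which also runs the static initializer. The helper below is only a sketch: the name `ClassPreloader` is made up, and it assumes the call happens at application startup, before any resolver is in use, so that no other thread can hold the NioClient lock at that moment.

```java
final class ClassPreloader {
    /**
     * Loads and initializes the named class.
     * Returns false if the class is missing or fails to link or initialize.
     */
    static boolean preload(String className) {
        try {
            // initialize = true forces the static initializer to run now.
            Class.forName(className, true, ClassPreloader.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: dnsjava 3.5.3 is on the classpath; otherwise this prints false.
        System.out.println(preload("org.xbill.DNS.NioTcpClient"));
    }
}
```

`Class.forName` works here even if the class is not public, because it only loads and initializes the class and does not access any of its members.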

As far as I can see, the following commit on master may prevent that from happening. The offending static initializer has been removed:

5da2770

demonti (Contributor) commented May 7, 2024

I can confirm the issue with the 3.5.3 release. Actually, for my test case with tens of thousands of queries in total and many hundreds in parallel, the deadlock appears quite often, so it is not that "super rare". I have not yet checked my app against the current master code, but for the 3.5.3 release, simply removing the "synchronized" keyword from the NioClient methods setTimeoutTask, setRegistrationsTask and setCloseTask would IMHO remove the problem as well.
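
As a rough illustration of that idea, task registration can be made thread-safe without a class-level monitor by using a concurrent collection. This is a hypothetical sketch, not the actual NioClient code: the class name, the field and the assumption that tasks are plain `Runnable`s kept in a list are all made up here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class LockFreeTaskRegistry {
    // Thread-safe for concurrent add and iteration without synchronized.
    private static final List<Runnable> TIMEOUT_TASKS = new CopyOnWriteArrayList<>();

    // No synchronized keyword: a static initializer calling this cannot block
    // on a monitor held by the thread that is currently running the tasks.
    static void setTimeoutTask(Runnable task) {
        TIMEOUT_TASKS.add(task);
    }

    // Iterates over a snapshot of the list; returns how many tasks were run.
    static int runTasks() {
        int ran = 0;
        for (Runnable task : TIMEOUT_TASKS) {
            task.run();
            ran++;
        }
        return ran;
    }
}
```

Whether dropping "synchronized" alone is safe in 3.5.3 depends on how the task lists are stored there, which this sketch does not attempt to reproduce.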

Edit: I tried out the current master and all test runs completed successfully so far.
