tf.function deadlock with multiple multiprocess/threading #66115

Open
jonas-eschle opened this issue Apr 19, 2024 · 1 comment
Labels: comp:tf.function, TF 2.16, type:bug

jonas-eschle commented Apr 19, 2024

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.16

Custom code

Yes

Current behavior?

TensorFlow hangs when a tf.function is run through multiprocessing/threading more than once.

I've also seen it in more complicated setups that use multithreading only once, but the following is a reproducible, standalone example that illustrates the problem.

The code works correctly if either the tf.function decorator is removed or XLA compilation is enabled with jit_compile=True (!).

Standalone code to reproduce the issue

import multiprocessing as mp  # can also be another module for multiprocess/threading
import random

import tensorflow as tf


# if we use jit_compile=True, it will work, magically
# it also works if there is no decorator at all, i.e. only eager mode
@tf.function(jit_compile=False)
def testjit(x):
    return tf.math.reduce_sum(x)


def make_zdata(_=None):
    print('Making data')
    # just to make sure that we recompile the function (different shapes)
    rnd = tf.random.uniform([random.randint(100, 10000)], -1, 1)
    zdata = testjit(rnd)
    print('Made data')
    return zdata

with mp.Pool(1) as executor:
    executor.map(make_zdata, [1])
executor.terminate()

# if we run this, it will fail (if jit_compile=False)
with mp.Pool(1) as executor:
    executor.map(make_zdata, [1])  # here, the code will be stuck
executor.terminate()

Relevant log output

The output will be approximately:

Making data
Made data
Making data

and then it hangs. With jit_compile=True, or without the tf.function decorator at all, a second Made data is printed.
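
For anyone who wants to check this from a script or CI job without blocking forever, here is a minimal sketch of a timeout wrapper around the pool call. It is not part of the original reproducer: the helper name run_once_with_timeout, the timeout_s default, and the pool_factory parameter are all made up for illustration. It only relies on the standard library's Pool.map_async and AsyncResult.get(timeout=...), and it reports a hang rather than fixing one.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool  # thread-backed pool with the same API


def run_once_with_timeout(func, args, timeout_s=60.0, pool_factory=mp.Pool):
    """Map func over args in a fresh one-worker pool (hypothetical helper).

    Returns (True, results) if the map finished within timeout_s seconds,
    or (False, None) if it did not, i.e. the worker appears to be stuck.
    Pass pool_factory=ThreadPool to try the threading variant.
    """
    with pool_factory(1) as pool:
        pending = pool.map_async(func, args)
        try:
            return True, pending.get(timeout=timeout_s)
        except mp.TimeoutError:
            # Leaving the with-block calls pool.terminate(), so a stuck
            # worker process is stopped instead of being waited on.
            return False, None


# Assumed usage with the reproducer's make_zdata (not run here):
#   finished, _ = run_once_with_timeout(make_zdata, [1])
#   finished, _ = run_once_with_timeout(make_zdata, [1])  # reported to hang
```

With the behavior described in this issue, the second call would be expected to come back as (False, None) after the timeout, while the jit_compile=True and eager variants should come back as finished. The timeout value is arbitrary and may need to be raised on slow machines.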

@Venkat6871

Hi @jonas-eschle ,
Sorry for the delay. I ran your code on Colab with TF v2.16.1 and with nightly, and hit the same issue in both. Please find the gist here for reference.

Thank you!
