
[Bug] TensorFlow - CUDA: multiprocessing does not work as expected - Dataloader and inference pipeline #1440

Open
Lubhawan opened this issue Jan 25, 2024 · 4 comments
Labels
framework: tensorflow (Related to TensorFlow backend), module: transforms (Related to doctr.transforms), type: bug (Something isn't working)
Milestone
1.0.0
Comments

@Lubhawan

Bug description

Expected the model to run successfully, but it throws a "JIT compilation failed" error when running on GPU.

Code snippet to reproduce the bug

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch="linknet_resnet18", reco_arch="crnn_vgg16_bn", pretrained=True)
img_path = "/home/lubhawan/Downloads/iloveimg-converted/Hospital-Bill-4.jpg"  # Specify your image path here
img = DocumentFile.from_images(img_path)
result = model(img)

Error traceback

UnknownError Traceback (most recent call last)
Cell In[5], line 3
1 img_path = "/home/lubhawan/Downloads/iloveimg-converted/Hospital-Bill-4.jpg" #Specify your image path here
2 img = DocumentFile.from_images(img_path)
----> 3 result = model(img)
4 output = result.export()

File ~/.local/lib/python3.11/site-packages/doctr/models/predictor/tensorflow.py:89, in OCRPredictor.call(self, pages, **kwargs)
86 pages = [rotate_image(page, -angle, expand=True) for page, angle in zip(pages, origin_page_orientations)]
88 # Localize text elements
---> 89 loc_preds_dict = self.det_predictor(pages, **kwargs)
90 assert all(
91 len(loc_pred) == 1 for loc_pred in loc_preds_dict
92 ), "Detection Model in ocr_predictor should output only one class"
94 loc_preds: List[np.ndarray] = [list(loc_pred.values())[0] for loc_pred in loc_preds_dict]

File ~/.local/lib/python3.11/site-packages/doctr/models/detection/predictor/tensorflow.py:45, in DetectionPredictor.call(self, pages, **kwargs)
42 if any(page.ndim != 3 for page in pages):
43 raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")
---> 45 processed_batches = self.pre_processor(pages)
46 predicted_batches = [
47 self.model(batch, return_preds=True, training=False, **kwargs)["preds"] for batch in processed_batches
48 ]
49 return [pred for batch in predicted_batches for pred in batch]

File ~/.local/lib/python3.11/site-packages/doctr/models/preprocessor/tensorflow.py:111, in PreProcessor.call(self, x)
107 batches = [x]
109 elif isinstance(x, list) and all(isinstance(sample, (np.ndarray, tf.Tensor)) for sample in x):
110 # Sample transform (to tensor, resize)
--> 111 samples = list(multithread_exec(self.sample_transforms, x))
112 # Batching
113 batches = self.batch_inputs(samples)

File ~/.local/lib/python3.11/site-packages/doctr/utils/multithreading.py:47, in multithread_exec(func, seq, threads)
42 # Multi-threading
43 else:
44 with ThreadPool(threads) as tp:
45 # ThreadPool's map function returns a list, but seq could be of a different type
46 # That's why wrapping result in map to return iterator
---> 47 results = map(lambda x: x, tp.map(func, seq))
48 return results

File ~/anconda3/lib/python3.11/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
362 def map(self, func, iterable, chunksize=None):
363 '''
364 Apply func to each element in iterable, collecting the results
365 in a list that is returned.
366 '''
--> 367 return self._map_async(func, iterable, mapstar, chunksize).get()

File ~/anconda3/lib/python3.11/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
772 return self._value
773 else:
--> 774 raise self._value

File ~/anconda3/lib/python3.11/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:

File ~/anconda3/lib/python3.11/multiprocessing/pool.py:48, in mapstar(args)
47 def mapstar(args):
---> 48 return list(map(*args))

File ~/.local/lib/python3.11/site-packages/doctr/models/preprocessor/tensorflow.py:76, in PreProcessor.sample_transforms(self, x)
74 x = tf.image.convert_image_dtype(x, dtype=tf.float32)
75 # Resizing
---> 76 x = self.resize(x)
78 return x

File ~/.local/lib/python3.11/site-packages/doctr/transforms/modules/tensorflow.py:107, in Resize.call(self, img, target)
100 def call(
101 self,
102 img: tf.Tensor,
103 target: Optional[np.ndarray] = None,
104 ) -> Union[tf.Tensor, Tuple[tf.Tensor, np.ndarray]]:
105 input_dtype = img.dtype
--> 107 img = tf.image.resize(img, self.wanted_size, self.method, self.preserve_aspect_ratio)
108 # It will produce an un-padded resized image, with a side shorter than wanted if we preserve aspect ratio
109 raw_shape = img.shape[:2]

File ~/.local/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback..error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.traceback)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb

File ~/.local/lib/python3.11/site-packages/tensorflow/python/framework/ops.py:5883, in raise_from_not_ok_status(e, name)
5881 def raise_from_not_ok_status(e, name) -> NoReturn:
5882 e.message += (" name: " + str(name if name is not None else ""))
-> 5883 raise core._status_to_exception(e) from None

UnknownError: {{function_node _wrapped__Round_device/job:localhost/replica:0/task:0/device:GPU:0}} JIT compilation failed. [Op:Round] name:

Environment

DocTR version: v0.7.0
TensorFlow version: 2.15.0
PyTorch version: 2.1.2+cu121 (torchvision 0.16.2+cu121)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.3 LTS
Python version: 3.11.5
Is CUDA available (TensorFlow): Yes
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070
Nvidia driver version: 535.154.05
cuDNN version: Could not collect

Deep Learning backend

is_tf_available: True
is_torch_available: True

@Lubhawan added the type: bug (Something isn't working) label on Jan 25, 2024
@felixdittrich92
Contributor

Hi @Lubhawan 👋

Thanks for reporting this.
We have already faced this issue; it comes from the transformations (only on CUDA, on CPU everything works as expected).

I will update your report a bit:

TensorFlow (only on CUDA)
Affected transformations: Resize, Shadow, Blur
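
For reference, a minimal sketch of the failing pattern, reconstructed from the traceback (not taken from doctr's source; shapes and sizes are arbitrary): tf.image.resize executed from a multiprocessing.pool.ThreadPool worker on a CUDA device, the way doctr's preprocessor does it.

from multiprocessing.pool import ThreadPool

import numpy as np
import tensorflow as tf

def resize_sample(img: np.ndarray) -> tf.Tensor:
    # Mirror doctr's preprocessing: convert to float32, then resize with
    # preserve_aspect_ratio=True (the code path that reaches the Round op in the traceback).
    x = tf.image.convert_image_dtype(tf.convert_to_tensor(img), dtype=tf.float32)
    return tf.image.resize(x, (1024, 1024), "bilinear", preserve_aspect_ratio=True)

pages = [np.random.randint(0, 255, (768, 512, 3), dtype=np.uint8) for _ in range(4)]
with ThreadPool(2) as tp:
    # Runs fine on CPU; on the reported CUDA setup the workers raise
    # UnknownError: JIT compilation failed [Op:Round]
    out = tp.map(resize_sample, pages)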

@felixdittrich92
Contributor

Could you please try it again with multiprocessing disabled?

DOCTR_MULTIPROCESSING_DISABLE=TRUE
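
A sketch of how to apply this (the image path and script name are placeholders), either from the shell:

DOCTR_MULTIPROCESSING_DISABLE=TRUE python your_script.py

or from Python, before the predictor is run:

import os

os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"  # set before doctr builds its thread pool

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch="linknet_resnet18", reco_arch="crnn_vgg16_bn", pretrained=True)
result = model(DocumentFile.from_images("/path/to/your/image.jpg"))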

@felixdittrich92 changed the title from "JIT compilation failed while running on gpu" to "[Bug] TensorFlow: Resize, Blur, Shadow transformations raises exception only on CUDA" on Jan 26, 2024
@felixdittrich92 added the module: transforms and framework: tensorflow labels on Jan 26, 2024
@felixdittrich92 added this to the 0.9.0 milestone on Jan 26, 2024
@Lubhawan
Author

Yeah, it is working fine on CPU.

@felixdittrich92 modified the milestone from 0.9.0 to 1.0.0 on Feb 9, 2024
@felixdittrich92
Contributor

Related to the multiprocessing used in the dataloader and inference pipeline.
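
For context, a hedged sketch (not doctr's actual source) of what that flag changes in doctr.utils.multithreading.multithread_exec: with the flag set, samples are mapped sequentially on the calling thread instead of through the ThreadPool branch shown in the traceback, so every TensorFlow op stays in the main thread's CUDA context.

import os
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Iterable, Iterator

def multithread_exec(func: Callable[[Any], Any], seq: Iterable[Any], threads: int = 4) -> Iterator[Any]:
    if os.environ.get("DOCTR_MULTIPROCESSING_DISABLE", "").upper() == "TRUE":
        # Sequential fallback: no worker threads touch the GPU.
        return map(func, seq)
    with ThreadPool(threads) as tp:
        # ThreadPool.map returns a list; wrap it in map to return an iterator (as in the traceback).
        return map(lambda x: x, tp.map(func, seq))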

@felixdittrich92 changed the title from "[Bug] TensorFlow: Resize, Blur, Shadow transformations raises exception only on CUDA" to "[Bug] TensorFlow - CUDA: multiprocessing does not work as expected - Dataloader and inference pipeline" on May 22, 2024