Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colab RuntimeError: DataLoader worker (pid 17693) is killed by signal: Killed. #17

Open
Gad1001 opened this issue Apr 26, 2023 · 3 comments

Comments

@Gad1001
Copy link

Gad1001 commented Apr 26, 2023

Hi i treid to run the colab on custom dataset but i get the following error:

/usr/local/lib/python3.9/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
loading model from ./model_zoo/rvrt/model_zoo/rvrt/001_RVRT_videosr_bi_REDS_30frames.pth
using dataset from testsets/uploaded
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 113, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 17693) is killed by signal: Killed. 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/RVRT/main_test_rvrt.py", line 335, in <module>
    main()
  File "/content/RVRT/main_test_rvrt.py", line 68, in main
    for idx, batch in enumerate(test_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1295, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 17693) exited unexpectedly

Any idea?
Thanks :)

@Anshu1882000
Copy link

Did you able to solve this ?? I'm also trying to run the model on my own dataset but for some reason it getting killed in between.

@chhkang
Copy link

chhkang commented Sep 11, 2023

Log the file name, and check whether the data is polluted.

@MelihDarcanxyz
Copy link

Same error, trying to run it on local with custom data. Checked the VRAM usage but it doesn't use much. So it's unlikely to be a VRAM problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants