ERROR (nni.runtime.command_channel.websocket.channel/MainThread) Failed to receive command. Retry in 0s #5747

Open
CCJing14 opened this issue Feb 26, 2024 · 0 comments

Describe the issue:
I get the following error in the log:
[2024-02-26 16:53:12] ERROR (nni.runtime.command_channel.websocket.channel/MainThread) Failed to receive command. Retry in 0s
Traceback (most recent call last):
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 963, in transfer_data
    message = await self.read_message()
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 1033, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 1108, in read_data_frame
    frame = await self.read_frame(max_size)
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 1165, in read_frame
    frame = await Frame.read(
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/framing.py", line 68, in read
    data = await reader(2)
  File "/miniconda3/envs/yolo/lib/python3.8/asyncio/streams.py", line 723, in readexactly
    await self._wait_for_data('readexactly')
  File "miniconda3/envs/yolo/lib/python3.8/asyncio/streams.py", line 517, in _wait_for_data
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/nni/runtime/command_channel/websocket/channel.py", line 99, in _receive_command
    command = conn.receive()
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/nni/runtime/command_channel/websocket/connection.py", line 103, in receive
    msg = _wait(self._ws.recv())
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/nni/runtime/command_channel/websocket/connection.py", line 121, in _wait
    return future.result()
  File "miniconda3/envs/yolo/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "miniconda3/envs/yolo/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 568, in recv
    await self.ensure_open()
  File "miniconda3/envs/yolo/lib/python3.8/site-packages/websockets/legacy/protocol.py", line 948, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1011 (internal error) keepalive ping timeout; no close frame received

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): remote
  • Client OS: mac
  • Server OS (for remote mode only): linux
  • Python version: 3.8
  • PyTorch/TensorFlow version: 1.13
  • Is conda/virtualenv/venv used?: conda

Configuration:

  • Experiment config (remember to remove secrets!):
    experimentName: sgd_yolov7
    searchSpaceFile: search_space_sgd_yolov7.json
    trialGpuNumber: 1
    trialConcurrency: 8
    max_trial_number: 10000
    tuner:
      name: TPE
      classArgs:
        optimize_mode: maximize
    trainingService:
      platform: local
      useActiveGpu: True

  • Search space:
    {
      "lr": {"_type": "uniform", "_value": [0.0001, 1.0]},
      "batch_size": {"_type": "choice", "_value": [8, 16, 32, 64, 128]}
    }
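
For anyone reading the search space above without NNI at hand, here is a minimal sketch of what the two entries describe, using only the Python standard library. It assumes `uniform` means a uniform draw between the two bounds and `choice` means picking one of the listed options. The `sample` helper is hypothetical and is not part of NNI, and plain random sampling stands in for the TPE tuner named in the config.

```python
import json
import random

# The search space from this issue, embedded as a string so the sketch is self-contained.
SEARCH_SPACE = json.loads("""
{
  "lr": {"_type": "uniform", "_value": [0.0001, 1.0]},
  "batch_size": {"_type": "choice", "_value": [8, 16, 32, 64, 128]}
}
""")


def sample(space, rng):
    """Draw one trial configuration (hypothetical helper, not NNI's API).

    Handles only the two `_type` values that appear in this issue.
    """
    params = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "uniform":
            low, high = value
            params[name] = rng.uniform(low, high)
        elif kind == "choice":
            params[name] = rng.choice(value)
        else:
            raise ValueError(f"unsupported _type: {kind}")
    return params


if __name__ == "__main__":
    # Seeded generator so repeated runs draw the same configuration.
    print(sample(SEARCH_SPACE, random.Random(0)))
```

Each call returns a dict with one `lr` value in [0.0001, 1.0] and one `batch_size` from the list, which is the shape of the parameters a trial would receive.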