[Bug]: torch._dynamo.exc.BackendCompilerFailed with command-r-plus #472
Comments
Also getting this error for turboderp/command-r-plus-103B-exl2 on 2x4090s on Runpod (EDIT: and also
I wonder if these are related?
But the latest official Docker image should have that change, so maybe they're not related. I tried setting
@AlpinDale Please ignore if this issue is a wontfix (and please forgive this ping in that case 🙏) -- just in case this slipped through the cracks: I can reproduce OP's issue. See my above comment for reproduction details and logs. The TL;DR is that
Edit: I can also reproduce with
I'll get to investigating this soon; I've been busy with other projects, so I haven't had much time to work on aphrodite lately. I have an inkling that this is related to torch.compile().
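If the torch.compile() hunch is right, the failure is in Inductor's cache-path lookup rather than in the model code: the traceback below bottoms out in `getpass.getuser()`, which only falls back to a `pwd` database lookup when none of its environment variables are set. A minimal sketch of that lookup order (the env-var names come from CPython's `getpass` module; the user name here is hypothetical):

```python
import getpass
import os

# getpass.getuser() tries these env vars in order and only falls back to
# pwd.getpwuid(os.getuid()) when none of them are set. In a container
# whose uid (e.g. 1000) has no /etc/passwd entry, that fallback raises
# KeyError: 'getpwuid(): uid not found: 1000'.
for var in ("LOGNAME", "USER", "LNAME", "USERNAME"):
    os.environ.pop(var, None)

os.environ["USER"] = "aphrodite"  # hypothetical name; any non-empty value works
print(getpass.getuser())  # -> aphrodite; the pwd database is never consulted
```

So a missing passwd entry only becomes fatal when the container also starts without any of those variables set.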
Your current environment
aphrodite docker container
Setting 1
GPUs: RTX8000 * 2
model: alpindale/c4ai-command-r-plus-GPTQ
Quantization: gptq
Setting 2
GPUs: A6000 ada * 4
model: CohereForAI/c4ai-command-r-plus
Quantization: load-in-smooth
🐛 Describe the bug
Starting Aphrodite Engine API server...
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
WARNING: gptq quantization is not fully optimized yet. The speed can be slower
than non-quantized models.
2024-05-17 02:21:49,653 INFO worker.py:1749 -- Started a local Ray instance.
INFO: Initializing the Aphrodite Engine (v0.5.3) with the following config:
INFO: Model = 'alpindale/c4ai-command-r-plus-GPTQ'
INFO: Speculative Config = None
INFO: DataType = torch.float16
INFO: Model Load Format = auto
INFO: Number of GPUs = 2
INFO: Disable Custom All-Reduce = False
INFO: Quantization Format = gptq
INFO: Context Length = 29000
INFO: Enforce Eager Mode = True
INFO: KV Cache Data Type = auto
INFO: KV Cache Params Path = None
INFO: Device = cuda
INFO: Guided Decoding Backend =
DecodingConfig(guided_decoding_backend='outlines')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING: The tokenizer's vocabulary size 255029 does not match the model's
vocabulary size 256000.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO: Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO: Using XFormers backend.
(RayWorkerAphrodite pid=1127) INFO: Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerAphrodite pid=1127) INFO: Using XFormers backend.
INFO: Aphrodite is using nccl==2.20.5
(RayWorkerAphrodite pid=1127) INFO: Aphrodite is using nccl==2.20.5
INFO: generating GPU P2P access cache for in
/app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
INFO: reading GPU P2P access cache from
/app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
(RayWorkerAphrodite pid=1127) INFO: reading GPU P2P access cache from
(RayWorkerAphrodite pid=1127) /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
(RayWorkerAphrodite pid=1127) INFO: Using model weights format ['.safetensors']
INFO: Using model weights format ['.safetensors']
INFO: Model weights loaded. Memory usage: 27.78 GiB x 2 = 55.55 GiB
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/app/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 562, in <module>
[rank0]: run_server(args)
[rank0]: File "/app/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 519, in run_server
[rank0]: engine = AsyncAphrodite.from_engine_args(engine_args)
[rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 358, in from_engine_args
[rank0]: engine = cls(engine_config.parallel_config.worker_use_ray,
[rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 323, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 429, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/engine/aphrodite_engine.py", line 142, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/app/aphrodite-engine/aphrodite/engine/aphrodite_engine.py", line 182, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/app/aphrodite-engine/aphrodite/executor/ray_gpu_executor.py", line 208, in determine_num_available_blocks
[rank0]: num_blocks = self._run_workers("determine_num_available_blocks", )
[rank0]: File "/app/aphrodite-engine/aphrodite/executor/ray_gpu_executor.py", line 309, in _run_workers
[rank0]: driver_worker_output = getattr(self.driver_worker,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/worker.py", line 144, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/model_runner.py", line 948, in profile_run
[rank0]: self.execute_model(seqs, kv_caches)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/model_runner.py", line 868, in execute_model
[rank0]: hidden_states = model_executable(**execute_model_kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 390, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 349, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 305, in forward
[rank0]: hidden_states, residual = self.input_layernorm(hidden_states, residual)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 82, in forward
[rank0]: hidden_states = layer_norm_func(hidden_states, self.weight,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
[rank0]: return callback(frame, cache_entry, hooks, frame_state, skip=1)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 786, in _convert_frame
[rank0]: result = inner_convert(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
[rank0]: return _compile(
[rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank0]: return func(*args, **kwds)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
[rank0]: guarded_code = compile_inner(code, one_graph, hooks, transform)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
[rank0]: out_code = transform_code_object(code, transform)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
[rank0]: transformations(instructions, code_options)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 500, in transform
[rank0]: tracer.run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
[rank0]: super().run()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
[rank0]: and self.step()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
[rank0]: getattr(self, inst.opname)(inst)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
[rank0]: self.output.compile_subgraph(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 971, in compile_subgraph
[rank0]: self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
[rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank0]: return func(*args, **kwds)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1168, in compile_and_call_fx_graph
[rank0]: compiled_fn = self.call_user_compiler(gm)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1241, in call_user_compiler
[rank0]: raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1222, in call_user_compiler
[rank0]: compiled_fn = compiler_fn(gm, self.example_inputs())
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
[rank0]: compiled_gm = compiler_fn(gm, example_inputs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1729, in __call__
[rank0]: return compile_fx(model, inputs, config_patches=self.config)
[rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank0]: return func(*args, **kwds)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
[rank0]: return aot_autograd(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
[rank0]: cg = aot_module_simplified(gm, example_inputs, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
[rank0]: compiled_fn = create_aot_dispatcher_function(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
[rank0]: compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
[rank0]: return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
[rank0]: return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
[rank0]: compiled_fw = compiler(fw_module, updated_flat_args)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
[rank0]: return inner_compile(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
[rank0]: inner_compiled_fn = compiler_fn(gm, example_inputs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/debug.py", line 304, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank0]: return func(*args, **kwds)
[rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner
[rank0]: return func(*args, **kwds)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
[rank0]: compiled_graph = fx_codegen_and_compile(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
[rank0]: compiled_fn = graph.compile_to_fn()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
[rank0]: return self.compile_to_module().call
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1250, in compile_to_module
[rank0]: self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1208, in codegen
[rank0]: self.scheduler.codegen()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
[rank0]: r = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/scheduler.py", line 2339, in codegen
[rank0]: self.get_backend(device).codegen_nodes(node.get_nodes()) # type: ignore[possibly-undefined]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 63, in codegen_nodes
[rank0]: return self._triton_scheduling.codegen_nodes(nodes)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3255, in codegen_nodes
[rank0]: return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3427, in codegen_node_schedule
[rank0]: kernel_name = self.define_kernel(src_code, node_schedule)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3537, in define_kernel
[rank0]: basename, _, kernel_path = get_path(code_hash(src_code.strip()), "py")
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codecache.py", line 349, in get_path
[rank0]: subdir = os.path.join(cache_dir(), basename[1:3])
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/utils.py", line 739, in cache_dir
[rank0]: sanitized_username = re.sub(r'[\\/:*?"<>|]', "", getpass.getuser())
[rank0]: File "/usr/lib/python3.10/getpass.py", line 169, in getuser
[rank0]: return pwd.getpwuid(os.getuid())[0]
[rank0]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[rank0]: KeyError: 'getpwuid(): uid not found: 1000'
[rank0]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[rank0]: You can suppress this exception and fall back to eager by setting:
[rank0]: import torch._dynamo
[rank0]: torch._dynamo.config.suppress_errors = True
(RayWorkerAphrodite pid=1127) INFO: Model weights loaded. Memory usage: 27.78 GiB x 2 = 55.55 GiB
(RayWorkerAphrodite pid=1127) ERROR: Error executing method determine_num_available_blocks. This might
(RayWorkerAphrodite pid=1127) cause deadlock in distributed execution.
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
This is the log generated with the GPTQ version. The same errors are raised when running the non-quantized version of the model. The GPTQ version works fine on vLLM.
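The final KeyError points at torch._inductor's cache-directory resolution rather than at aphrodite itself: `cache_dir()` calls `getpass.getuser()` to build a per-user path, and that call fails when the container uid has no passwd entry. Below is a sketch of that logic (approximated from the `torch/_inductor/utils.py` frame in the traceback, not the verbatim torch source) together with a possible container-side workaround: setting `TORCHINDUCTOR_CACHE_DIR` so `getuser()` is never called.

```python
import getpass
import os
import re
import tempfile

def inductor_cache_dir() -> str:
    """Approximation of cache_dir() from torch/_inductor/utils.py."""
    cache_dir = os.environ.get("TORCHINDUCTOR_CACHE_DIR")
    if cache_dir is None:
        # This getuser() call is what raises
        # KeyError: 'getpwuid(): uid not found: 1000'
        # when the container uid has no /etc/passwd entry.
        sanitized_username = re.sub(r'[\\/:*?"<>|]', "", getpass.getuser())
        cache_dir = os.path.join(
            tempfile.gettempdir(), "torchinductor_" + sanitized_username
        )
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir

# Workaround: point the cache at a writable path so the username lookup
# is skipped entirely (e.g. set this in the Docker run environment).
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/torchinductor_cache"
print(inductor_cache_dir())  # -> /tmp/torchinductor_cache
```

Setting `USER` (or `LOGNAME`) to any non-empty value in the container environment should have the same effect, since `getpass.getuser()` consults those variables before falling back to the passwd database.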