BrokenPipeError: [Errno 32] Broken pipe. The full traceback is below; what is causing this? #78

Open
pandazzh2020 opened this issue Jul 22, 2023 · 1 comment

Comments

@pandazzh2020

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/huatuo did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/extras/CUPTI/lib64'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/cuda/compat/lib'), PosixPath('/usr/local/nvidia/lib64')}
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 114
CUDA SETUP: Loading binary /opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: ./data/Format_data_sheet_mini.json
output_dir: ./lora-llama-med-e1
batch_size: 1
micro_batch_size: 1
num_epochs: 5
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 500
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: False
group_by_length: False
wandb_project: llama_med
wandb_run_name: e1
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: med_template

The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:22<00:00, 1.44it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 376.91it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-9f4de28c7bc88c4b.arrow and /root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-b165ac8522c98057.arrow
wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /zheng_zhong_hua/Huatuo-llama-med-chinese/wandb/run-20230722_105219-wnnb9l4i
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run e1
wandb: ⭐️ View project at https://wandb.ai/chat2023/llama_med
wandb: 🚀 View run at https://wandb.ai/chat2023/llama_med/runs/wnnb9l4i
0%| | 0/15535 [00:00<?, ?it/s]/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
0%| | 2/15535 [00:03<7:23:53, 1.71s/it]wandb: Network error (TransientError), entering retry loop.
{'loss': 2.2941, 'learning_rate': 9.65250965250965e-07, 'epoch': 0.0}
{'loss': 2.1901, 'learning_rate': 2.5096525096525096e-06, 'epoch': 0.01}
{'loss': 2.6166, 'learning_rate': 4.054054054054054e-06, 'epoch': 0.01}
0%|▏ | 29/15535 [00:29<4:12:00, 1.03it/s]Exception in thread ChkStopThr:
Traceback (most recent call last):
File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 273, in check_stop_status
Exception in thread NetStatThr:
Traceback (most recent call last):
self._loop_check_status(
File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 980, in _bootstrap_inner
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status
self.run()
local_handle = request()
File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 917, in run
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 787, in deliver_stop_status
return self._deliver_stop_status(status)
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 585, in _deliver_stop_status
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 255, in check_network_status
return self._deliver_record(record)
self._loop_check_status(
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status
handle = mailbox._deliver_record(record, interface=self)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
local_handle = request()
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 795, in deliver_network_status
interface._publish(record)
return self._deliver_network_status(status)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 601, in _deliver_network_status
return self._deliver_record(record)
self._sock_client.send_record_publish(record)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
handle = mailbox._deliver_record(record, interface=self)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
self._send_message(msg)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
interface._publish(record)
self._sendall_with_error_handle(header + data)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
self._sock_client.send_record_publish(record)
sent = self._sock.send(data)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
BrokenPipeError: [Errno 32] Broken pipe
self.send_server_request(server_req)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
0%|▎ | 32/15535 [00:32<4:04:48, 1.06it/s]Traceback (most recent call last):
File "/zheng_zhong_hua/Huatuo-llama-med-chinese/finetune.py", line 289, in <module>
fire.Fire(train)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/zheng_zhong_hua/Huatuo-llama-med-chinese/finetune.py", line 279, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2020, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2307, in _maybe_log_save_evaluate
self.log(logs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2672, in log
self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer_callback.py", line 390, in on_log
return self.call_event("on_log", args, state, control, logs=logs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer_callback.py", line 397, in call_event
result = getattr(callback, event)(
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/integrations.py", line 814, in on_log
self._wandb.log({**logs, "train/global_step": state.global_step})
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 389, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 340, in wrapper_fn
return func(self, *args, **kwargs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 330, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1745, in log
self._log(data=data, step=step, commit=commit)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1526, in _log
self._partial_history_callback(data, step, commit)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1396, in _partial_history_callback
self._backend.interface.publish_partial_history(
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 584, in publish_partial_history
self._publish_partial_history(partial_history)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 89, in _publish_partial_history
self._publish(rec)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
wandb: While tearing down the service manager. The following error has occurred: [Errno 32] Broken pipe

@pandazzh2020
Author

batch_size: 1 (reducing it did not resolve the problem)
micro_batch_size: 1
num_epochs: 5
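
For readers hitting the same trace: every BrokenPipeError frame above ends in wandb's `sock_client.py`, shortly after `wandb: Network error (TransientError)`, and the log itself suggests running wandb offline to turn off syncing. The following is a minimal sketch of that suggestion, assuming the goal is only to let training continue without live syncing. It sidesteps the failure rather than diagnosing it. `configure_wandb_offline` is a hypothetical helper name, not part of finetune.py or of wandb; `WANDB_MODE` is the environment variable wandb reads for this setting.

```python
import os


def configure_wandb_offline(env=None):
    """Set WANDB_MODE=offline in the given mapping (os.environ by default).

    Hypothetical helper for illustration; it must run before wandb is
    initialised, i.e. before the trainer is created.
    """
    env = os.environ if env is None else env
    env["WANDB_MODE"] = "offline"
    return env


# Assumption: this is placed at the top of the training script,
# before any code that starts a wandb run.
configure_wandb_offline()
```

With offline mode, run data is still written to the local `wandb/` directory shown in the log, and it can be uploaded afterwards with `wandb sync` once the network is reachable.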
