You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
我在训练时输出以下内容后,程序就停止了,请问这种情况该如何解决?
`2024-05-15 09:29:44.939294: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-15 09:29:44.939347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-15 09:29:44.940554: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-05-15 09:29:49,373] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-05-15 09:30:12.273661: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-15 09:30:12.273709: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-15 09:30:12.274819: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-05-15 09:30:16,168] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
05/15 09:30:19 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1102040617
GPU 0: B1.gpu.medium
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0a0+32f93b1
PyTorch compiling details: PyTorch built with:
GCC 11.2
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
05/15 09:30:19 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized. low_cpu_mem_usage was None, now set to True since model is quantized.`
The text was updated successfully, but these errors were encountered:
我在训练时输出以下内容后,程序就停止了,请问这种情况该如何解决?
`2024-05-15 09:29:44.939294: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-15 09:29:44.939347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-15 09:29:44.940554: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-05-15 09:29:49,373] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-05-15 09:30:12.273661: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-15 09:30:12.273709: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-15 09:30:12.274819: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2024-05-15 09:30:16,168] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
05/15 09:30:19 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1102040617
GPU 0: B1.gpu.medium
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0a0+32f93b1
PyTorch compiling details: PyTorch built with:
GCC 11.2
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash N/A)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 12.2
NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_87,code=sm_87;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
CuDNN 8.9.5
Magma 2.6.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.2, CUDNN_VERSION=8.9.5, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.16.0a0
OpenCV: 4.7.0
MMEngine: 0.10.4
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1102040617
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1
05/15 09:30:19 - mmengine - INFO - Config:
SYSTEM = 'xtuner.utils.SYSTEM_TEMPLATE.alpaca'
accumulative_counts = 16
alpaca_en = dict(
dataset=dict(path='./alpaca', type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.alpaca_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.chatglm3',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
encode_special_tokens=True,
padding_side='left',
pretrained_model_name_or_path='/gemini/pretrain',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset',
use_varlen_attn=False)
alpaca_en_path = './alpaca'
batch_size = 1
betas = (
0.9,
0.999,
)
custom_hooks = [
dict(
tokenizer=dict(
encode_special_tokens=True,
padding_side='left',
pretrained_model_name_or_path='/gemini/pretrain',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.hooks.DatasetInfoHook'),
dict(
evaluation_inputs=[
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
],
every_n_iters=500,
prompt_template='xtuner.utils.PROMPT_TEMPLATE.chatglm3',
system='xtuner.utils.SYSTEM_TEMPLATE.alpaca',
tokenizer=dict(
encode_special_tokens=True,
padding_side='left',
pretrained_model_name_or_path='/gemini/pretrain',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.hooks.EvaluateChatHook'),
]
dataloader_num_workers = 0
default_hooks = dict(
checkpoint=dict(
by_epoch=False,
interval=500,
max_keep_ckpts=2,
type='mmengine.hooks.CheckpointHook'),
logger=dict(
interval=10,
log_metric_by_epoch=False,
type='mmengine.hooks.LoggerHook'),
param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
lr = 0.0002
max_epochs = 3
max_length = 2048
max_norm = 1
model = dict(
llm=dict(
pretrained_model_name_or_path='/gemini/pretrain',
quantization_config=dict(
bnb_4bit_compute_dtype='torch.float16',
bnb_4bit_quant_type='nf4',
bnb_4bit_use_double_quant=True,
llm_int8_has_fp16_weight=False,
llm_int8_threshold=6.0,
load_in_4bit=True,
load_in_8bit=False,
type='transformers.BitsAndBytesConfig'),
torch_dtype='torch.float16',
trust_remote_code=True,
type='transformers.AutoModelForCausalLM.from_pretrained'),
lora=dict(
bias='none',
lora_alpha=16,
lora_dropout=0.1,
r=64,
task_type='CAUSAL_LM',
type='peft.LoraConfig'),
type='xtuner.model.SupervisedFinetune',
use_varlen_attn=False)
optim_type = 'torch.optim.AdamW'
optim_wrapper = dict(
accumulative_counts=16,
clip_grad=dict(error_if_nonfinite=False, max_norm=1),
dtype='float16',
loss_scale='dynamic',
optimizer=dict(
betas=(
0.9,
0.999,
),
lr=0.0002,
type='torch.optim.AdamW',
weight_decay=0),
type='mmengine.optim.AmpOptimWrapper')
pack_to_max_length = True
param_scheduler = [
dict(
begin=0,
by_epoch=True,
convert_to_iter_based=True,
end=0.09,
start_factor=1e-05,
type='mmengine.optim.LinearLR'),
dict(
begin=0.09,
by_epoch=True,
convert_to_iter_based=True,
end=3,
eta_min=0.0,
type='mmengine.optim.CosineAnnealingLR'),
]
pretrained_model_name_or_path = '/gemini/pretrain'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.chatglm3'
randomness = dict(deterministic=False, seed=None)
resume = False
save_steps = 500
save_total_limit = 2
tokenizer = dict(
encode_special_tokens=True,
padding_side='left',
pretrained_model_name_or_path='/gemini/pretrain',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(max_epochs=3, type='xtuner.engine.runner.TrainLoop')
train_dataloader = dict(
batch_size=1,
collate_fn=dict(
type='xtuner.dataset.collate_fns.default_collate_fn',
use_varlen_attn=False),
dataset=dict(
dataset=dict(path='./alpaca', type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.alpaca_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.chatglm3',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
encode_special_tokens=True,
padding_side='left',
pretrained_model_name_or_path='/gemini/pretrain',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset',
use_varlen_attn=False),
num_workers=0,
sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
use_varlen_attn = False
visualizer = None
warmup_ratio = 0.03
weight_decay = 0
work_dir = './work_dirs/chatglm3_6b_base_qlora_alpaca_e3_copy'
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
05/15 09:30:19 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
low_cpu_mem_usage
was None, now set to True since model is quantized.`The text was updated successfully, but these errors were encountered: