run error about "InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12" #290

Open
ccutyear opened this issue Aug 18, 2022 · 2 comments


ccutyear commented Aug 18, 2022

This is my run script, new_gpu_squad_bash.sh (modified from gpu_squad_base.sh):

```bash
# local path
SQUAD_DIR=../SQUAD
INIT_CKPT_DIR=../xlnet_cased_L-12_H-768_A-12
PROC_DATA_DIR=proc_data/squad
MODEL_DIR=experiment/squad_new_gpu

# Use 3 GPUs, each with 8 seqlen-512 samples
python ../run_squad.py \
  --use_tpu=False \
  --num_hosts=1 \
  --num_core_per_host=1 \
  --model_config_path=${INIT_CKPT_DIR}/xlnet_config.json \
  --spiece_model_file=${INIT_CKPT_DIR}/spiece.model \
  --output_dir=${PROC_DATA_DIR} \
  --init_checkpoint=${INIT_CKPT_DIR}/xlnet_model.ckpt \
  --model_dir=${MODEL_DIR} \
  --train_file=${SQUAD_DIR}/small_train-v2.0.json \
  --predict_file=${SQUAD_DIR}/dev-v2.0.json \
  --uncased=False \
  --max_seq_length=512 \
  --do_train=True \
  --train_batch_size=1 \
  --do_predict=True \
  --predict_batch_size=1 \
  --learning_rate=2e-5 \
  --adam_epsilon=1e-6 \
  --iterations=1000 \
  --save_steps=1000 \
  --train_steps=12000 \
  --warmup_steps=1000 \
  $@
```
Run command: `CUDA_VISIBLE_DEVICES=0 bash new_gpu_squad_bash.sh`. GPU memory should be enough:

![2022-08-18 15-03-03 screenshot](https://user-images.githubusercontent.com/59367257/185326924-a7572b68-c36f-4096-aaa6-3f2be35fbb26.png)

However, the program reports an error:

```
(tensorflow_gpu_1_13) zaisen_ye@ubuntu-DeepLearning-2602056:/data/zaisen_ye/xlnet-master/scripts$ CUDA_VISIBLE_DEVICES=0 bash new_gpu_squad_bash.sh
2022-08-18 15:00:46.353987: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-08-18 15:00:47.049036: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557600f5b2d0 executing computations on platform CUDA. Devices:
2022-08-18 15:00:47.049082: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA GeForce RTX 3080 Ti, Compute Capability 8.6
2022-08-18 15:00:47.051712: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2022-08-18 15:00:47.054733: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557600fd8c20 executing computations on platform Host. Devices:
2022-08-18 15:00:47.054752: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2022-08-18 15:00:47.054866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 Ti major: 8 minor: 6 memoryClockRate(GHz): 1.665
pciBusID: 0000:4f:00.0
totalMemory: 11.77GiB freeMemory: 11.53GiB
2022-08-18 15:00:47.054879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-08-18 15:00:47.057695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-18 15:00:47.057725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-08-18 15:00:47.057734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-08-18 15:00:47.057839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11215 MB memory) -> physical GPU (device:
0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:4f:00.0, compute capability: 8.6)
INFO:tensorflow:Single device mode.

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa1087c9
210>, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=1, num_cores_per_replica=None, per_host_input_for_tra
ining=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_tf_random_seed': None, '_device_fn': None, '_cluster': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step
_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_session_config': allow_soft_placement: true
, '_global_id_in_cluster': 0, '_is_chief': True, '_protocol': None, '_save_checkpoints_steps': 1000, '_experimental_distribute': None, '_save_summary_steps': 100, '_model_dir': 'experiment/squad_new_gpu'
, '_master': ''}
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7f9fa03be3d0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Input tfrecord file glob proc_data/squad/spiece.model..slen-512.qlen-64.train.tf_record
INFO:tensorflow:Find 1 input paths ['proc_data/squad/spiece.model.0.slen-512.qlen-64.train.tf_record']
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops)
is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From ../run_squad.py:1019: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.experimental.map_and_batch(...).
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
WARNING:tensorflow:From /data/zaisen_ye/xlnet-master/modeling.py:534: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with ke
ep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING:tensorflow:From /data/zaisen_ye/xlnet-master/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:#params: 119082242
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is
deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated an
d will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0422 to layer-0 grad of model/transformer/layer_0/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0563 to layer-1 grad of model/transformer/layer_1/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.0751 to layer-2 grad of model/transformer/layer_2/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1001 to layer-3 grad of model/transformer/layer_3/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1335 to layer-4 grad of model/transformer/layer_4/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.1780 to layer-5 grad of model/transformer/layer_5/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.2373 to layer-6 grad of model/transformer/layer_6/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.3164 to layer-7 grad of model/transformer/layer_7/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.4219 to layer-8 grad of model/transformer/layer_8/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.5625 to layer-9 grad of model/transformer/layer_9/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 0.7500 to layer-10 grad of model/transformer/layer_10/ff/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/q/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/k/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/v/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/r/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/o/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/LayerNorm/beta:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/rel_attn/LayerNorm/gamma:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_1/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_1/bias:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_2/kernel:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/layer_2/bias:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/LayerNorm/beta:0
INFO:tensorflow:Apply mult 1.0000 to layer-11 grad of model/transformer/layer_11/ff/LayerNorm/gamma:0
INFO:tensorflow:Initialize from the ckpt ../xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt
INFO:tensorflow:
*** Global Variables ****
INFO:tensorflow: name = model/transformer/r_w_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/r_r_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/r_s_bias:0, shape = (12, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/seg_embed:0, shape = (12, 2, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (768, 12, 64), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = start_logits/dense/kernel:0, shape = (768, 1)
INFO:tensorflow: name = start_logits/dense/bias:0, shape = (1,)
INFO:tensorflow: name = end_logits/dense_0/kernel:0, shape = (1536, 768)
INFO:tensorflow: name = end_logits/dense_0/bias:0, shape = (768,)
INFO:tensorflow: name = end_logits/LayerNorm/beta:0, shape = (768,)
INFO:tensorflow: name = end_logits/LayerNorm/gamma:0, shape = (768,)
INFO:tensorflow: name = end_logits/dense_1/kernel:0, shape = (768, 1)
INFO:tensorflow: name = end_logits/dense_1/bias:0, shape = (1,)
INFO:tensorflow: name = answer_class/dense_0/kernel:0, shape = (1536, 768)
INFO:tensorflow: name = answer_class/dense_0/bias:0, shape = (768,)
INFO:tensorflow: name = answer_class/dense_1/kernel:0, shape = (768, 1)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2022-08-18 15:01:06.954873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-08-18 15:01:06.954938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-18 15:01:06.954945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-08-18 15:01:06.954951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-08-18 15:01:06.955044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11215 MB memory) -> physical GPU (device:
0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:4f:00.0, compute capability: 8.6)
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint
_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from experiment/squad_new_gpu/model.ckpt-0
WARNING:tensorflow:From /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkp
oint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into experiment/squad_new_gpu/model.ckpt.
2022-08-18 15:05:02.150154: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2022-08-18 15:05:17.835563: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2022-08-18 15:05:17.835625: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2022-08-18 15:05:17.835644: I tensorflow/stream_executor/stream.cc:5027] [stream=0x557603b483d0,impl=0x557603b391d0] did not memcpy host-to-device; source: 0x7f9cb801b970
2022-08-18 15:05:17.835677: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed to copy memory from host to device in CUDABlas::DoBlasGemmBatched
Traceback (most recent call last):
File "../run_squad.py", line 1317, in
tf.app.run()
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "../run_squad.py", line 1216, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1407, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1171, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12
[[node model/transformer/layer_0/rel_attn/einsum_4/MatMul (defined at /data/zaisen_ye/xlnet-master/modeling.py:133) ]]
[[node add_1 (defined at ../run_squad.py:1088) ]]
Caused by op u'model/transformer/layer_0/rel_attn/einsum_4/MatMul', defined at:
File "../run_squad.py", line 1317, in
tf.app.run()
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "../run_squad.py", line 1216, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "../run_squad.py", line 1033, in model_fn
outputs = function_builder.get_qa_outputs(FLAGS, features, is_training)
File "/data/zaisen_ye/xlnet-master/function_builder.py", line 230, in get_qa_outputs
input_mask=inp_mask)
File "/data/zaisen_ye/xlnet-master/xlnet.py", line 222, in init
) = modeling.transformer_xl(**tfm_args)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 628, in transformer_xl
reuse=reuse)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 309, in rel_multihead_attn
r_r_bias, r_s_bias, attn_mask, dropatt, is_training, scale)
File "/data/zaisen_ye/xlnet-master/modeling.py", line 133, in rel_attn_core
ac = tf.einsum('ibnd,jbnd->ijbn', q_head + r_w_bias, k_head_h)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/special_math_ops.py", line 262, in einsum
axes_to_sum)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/special_math_ops.py", line 394, in _einsum_reduction
product = math_ops.matmul(t0, t1)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 2417, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1423, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12
[[node model/transformer/layer_0/rel_attn/einsum_4/MatMul (defined at /data/zaisen_ye/xlnet-master/modeling.py:133) ]]
[[node add_1 (defined at ../run_squad.py:1088) ]]
```
The packages used in the program are as follows (tensorflow 1.13.1, sentencepiece 0.1.91, cudatoolkit 10.0.130, cudnn 7.3.1); a quick version-check sketch follows the list:

(tensorflow_gpu_1_13) zaisen_ye@ubuntu-DeepLearning-2602056:/data/zaisen_ye/xlnet-master/scripts$ conda list
# packages in environment at /home/zaisen_ye/.conda/envs/tensorflow_gpu_1_13:
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
_tflow_select 2.1.0 gpu
absl-py 0.15.0 pyhd3eb1b0_0
astor 0.8.1 pypi_0 pypi
backports 1.1 pyhd3eb1b0_0
backports-weakref 1.0.post1 pypi_0 pypi
backports.weakref 1.0.post1 py_1
blas 1.0 mkl
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2020.6.20 pyhd3eb1b0_3
cudatoolkit 10.0.130 0
cudnn 7.3.1 cuda10.0_0
cupti 10.0.130 0
enum34 1.1.10 pypi_0 pypi
funcsigs 1.0.2 pypi_0 pypi
futures 3.3.0 py27_0
gast 0.5.3 pyhd3eb1b0_0
grpcio 1.41.1 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
hdf5 1.10.4 hb1b8bf9_0
intel-openmp 2022.0.1 h06a4308_3633
keras-applications 1.0.8 py_1
keras-preprocessing 1.1.2 pypi_0 pypi
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 11.2.0 h1234567_1
libprotobuf 3.11.2 hd408876_0
libstdcxx-ng 11.2.0 h1234567_1
linecache2 1.0.0 py_1
markdown 3.1.1 py27_0
mkl 2020.2 256
mkl-service 2.3.0 py27he904b0f_0
mkl_fft 1.0.15 py27ha843d7b_0
mkl_random 1.1.0 py27hd6b4f25_0
mock 3.0.5 py27_0
ncurses 6.3 h5eee18b_3
numpy 1.16.6 pypi_0 pypi
numpy-base 1.16.6 py27hde5b4d6_0
openssl 1.1.1q h7f8727e_0
pip 19.3.1 py27_0
protobuf 3.17.3 pypi_0 pypi
python 2.7.18 ha1903f6_2
readline 8.1.2 h7f8727e_1
scipy 1.2.1 py27h7c811a0_0
sentencepiece 0.1.91 pypi_0 pypi
setuptools 44.0.0 py27_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.2 h5082296_0
tensorboard 1.13.1 py27hf484d3e_0
tensorflow 1.13.1 gpu_py27hcb41dfa_0
tensorflow-estimator 1.13.0 py_0
tensorflow-gpu 1.13.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
traceback2 1.4.0 py27_0
unittest2 1.1.0 py27_0
werkzeug 1.0.1 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
zlib 1.2.12 h7f8727e_2
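Not part of the original report, but as a quick sanity check of the environment above: a minimal sketch, assuming the standard TF 1.x Python API, to confirm which TensorFlow build is actually imported and whether it can use the GPU.

```python
import tensorflow as tf

# Report the TensorFlow build that is actually imported in this conda env.
print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
# True only if a CUDA-capable GPU is visible and usable by this build.
print("GPU available:", tf.test.is_gpu_available())
```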

PS: I know my problem is similar to the one in this question: https://stackoverflow.com/questions/43990046/tensorflow-blas-gemm-launch-failed, but it was not solved there, and I'm not sure that question is clear enough or describes exactly the same problem as mine, so I'm posting it here with my own error message. I think this problem is different from https://stackoverflow.com/questions/50911052/tensorflow-matmul-blas-xgemmbatched-launch-failed.
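As a diagnostic, not something taken from the linked threads, here is a minimal sketch that reproduces the failing batched matmul in isolation with the same shapes reported in the error. The answers in the first Stack Overflow link mostly point at GPU memory, so the sketch also sets `allow_growth` on the session config (an assumption about what may help, not a confirmed fix for this issue):

```python
import numpy as np
import tensorflow as tf

# Same shapes as in the error message:
# a.shape=[12,512,64], b.shape=[12,64,512] -> batched matmul on the GPU.
a = tf.constant(np.random.rand(12, 512, 64).astype(np.float32))
b = tf.constant(np.random.rand(12, 64, 512).astype(np.float32))
c = tf.matmul(a, b)

config = tf.ConfigProto()
# Grow GPU memory on demand instead of reserving it all up front,
# the usual suggestion for "Blas GEMM launch failed" reports.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(c).shape)  # expected (12, 512, 512) if cuBLAS works on this GPU
```

If this tiny case fails the same way, the problem is the GPU/driver/CUDA combination rather than anything in run_squad.py.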
@zihangdai

ccutyear (Author) commented:

This problem has been solved. The program should be run on a 1080 Ti GPU, not on a 3080 Ti.
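(Not from the original comment.) A minimal sketch, assuming TF 1.x's `device_lib`, to confirm which GPU and compute capability TensorFlow actually sees; the 3080 Ti reports compute capability 8.6, which the CUDA 10.0 / cuDNN 7.3 build listed above was not built to support:

```python
from tensorflow.python.client import device_lib

# List every device TensorFlow can see; physical_device_desc includes
# the GPU name and its compute capability.
for dev in device_lib.list_local_devices():
    if dev.device_type == "GPU":
        print(dev.name, dev.physical_device_desc)
```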

xavinatalia commented:

Excuse me, I am running train_gpu.py on an NVIDIA GeForce RTX 3090 and hit a similar error:
"InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[8,1024,21], b.shape=[8,21,128], m=1024, n=128, k=21, batch_size=8
[[node model/transformer/layer_0/rel_attn_1/einsum_1/MatMul (defined at /home/wxsc/yzy/xlnet-master/modeling.py:369) ]]
[[node model/transformer/StopGradient_4 (defined at /home/wxsc/yzy/xlnet-master/modeling.py:202) ]]"
Are there solutions other than changing the device?
