Built DeepRec and estimator following the documentation; starting PS training causes a core dump
INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'chief': ['127.0.0.1:2222'], 'ps': ['127.0.0.1:2223'], 'worker': ['127.0.0.1:2224']}, 'task': {'index': 0, 'type': 'ps'}}
INFO:tensorflow:Using config: {'_model_dir': 'easyrec_deepfm', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': device_filters: "/job:ps" gpu_options { } allow_soft_placement: true , '_keep_checkpoint_max': 10, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb0381fddd8>, '_task_type': 'ps', '_task_id': 0, '_evaluation_master': '', '_master': 'grpc://127.0.0.1:2223', '_num_ps_replicas': 1, '_num_worker_replicas': 2, '_global_id_in_cluster': 2, '_is_chief': False}
I0710 09:41:36.169108 140395967924032 input.py:45] check_mode: False
I0710 09:41:36.169946 140395967924032 input.py:45] check_mode: False
I0710 09:41:36.170253 140395967924032 main.py:173] will use BestExporter, metric is auc, the bigger the better: 1
I0710 09:41:36.171174 140395967924032 input.py:45] check_mode: False
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Start Tensorflow server.
terminate called after throwing an instance of 'std::logic_error'
  what(): basic_string::_M_construct null not valid
Fatal Python error: Aborted

Thread 0x00007fb07bcbc740 (most recent call first):
  File "/home/pai/lib/python3.6/site-packages/tensorflow_core/python/training/server_lib.py", line 184 in join
  File "/home/pai/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 688 in run_ps
  File "/home/pai/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 640 in run
  File "/home/pai/lib/python3.6/site-packages/easy_rec/python/compat/estimator_train.py", line 84 in train_and_evaluate
  File "/home/pai/lib/python3.6/site-packages/easy_rec/python/main.py", line 333 in _train_and_evaluate_impl
  File "/home/pai/pyml/src/main.py", line 180 in easyrec_main
  File "/home/pai/lib/python3.6/site-packages/absl/app.py", line 258 in _run_main
  File "/home/pai/lib/python3.6/site-packages/absl/app.py", line 312 in run
  File "/home/pai/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
  File "/home/pai/pyml/src/main.py", line 196 in <module>
  File "/home/pai/lib/python3.6/runpy.py", line 85 in _run_code
  File "/home/pai/lib/python3.6/runpy.py", line 193 in _run_module_as_main
./ps.sh: line 8: 308 Aborted (core dumped)
gdb output
gdb python core
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
warning: core file may not match specified executable file.
[New LWP 338] [New LWP 308] [New LWP 315] [New LWP 318] [New LWP 319] [New LWP 320] [New LWP 321] [New LWP 324]
[New LWP 326] [New LWP 341] [New LWP 342] [New LWP 346] [New LWP 347] [New LWP 348] [New LWP 349] [New LWP 350]
[New LWP 356] [New LWP 343] [New LWP 339] [New LWP 340] [New LWP 333] [New LWP 332] [New LWP 331] [New LWP 330]
[New LWP 329] [New LWP 328] [New LWP 327] [New LWP 325] [New LWP 323] [New LWP 322] [New LWP 317] [New LWP 316]
[New LWP 314] [New LWP 312] [New LWP 334] [New LWP 335] [New LWP 336] [New LWP 337] [New LWP 344] [New LWP 345]
[New LWP 353] [New LWP 358] [New LWP 361]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python -m main --algo_lib easyrec --continue_train True --pipeline_config_path'.
Program terminated with signal SIGABRT, Aborted.
#0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fafeeff5700 (LWP 338))]
(gdb) bt
#0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 <signal handler called>
#2 0x00007fb07b4d0fb7 in __GI___libc_sigaction (sig=2, act=0x7fafeeff3ff0, oact=0x0) at ../sysdeps/unix/sysv/linux/x86_64/sigaction.c:54
#3 0x0000000000000000 in ?? ()
(gdb)
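For anyone trying to reproduce this, the TF_CONFIG printed in the log describes a single-host cluster with one chief, one ps, and one worker. Below is a minimal, standard-library-only sketch of how that variable could be built for each role. The helper name `make_tf_config` is hypothetical and is not part of EasyRec, DeepRec, or TensorFlow; the addresses are copied from the log, and the actual contents of `ps.sh` are not shown in this report.

```python
import json
import os

# Cluster layout copied from the TF_CONFIG line in the log above.
CLUSTER = {
    "chief": ["127.0.0.1:2222"],
    "ps": ["127.0.0.1:2223"],
    "worker": ["127.0.0.1:2224"],
}


def make_tf_config(task_type, task_index=0):
    """Return a TF_CONFIG JSON string for one task (hypothetical helper)."""
    if task_type not in CLUSTER:
        raise ValueError("unknown task type: %r" % (task_type,))
    if not 0 <= task_index < len(CLUSTER[task_type]):
        raise ValueError("task index out of range: %r" % (task_index,))
    return json.dumps(
        {"cluster": CLUSTER, "task": {"type": task_type, "index": task_index}}
    )


if __name__ == "__main__":
    # Export the ps role, which is the one that aborts in this report,
    # before launching the training entry point in the same environment.
    os.environ["TF_CONFIG"] = make_tf_config("ps")
    print(os.environ["TF_CONFIG"])
```

Running the same training command once per role (chief, ps, worker), each with its own TF_CONFIG, should recreate the setup in which the ps process aborts.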
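I tested this: the core dump occurs whenever _train_distribute is None, whereas stock TF 1.15 does not crash in that case. Could this be fixed for compatibility?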