TracerWarning while translating but no error showing #1101

Open
chrkell opened this issue Oct 26, 2023 · 1 comment

Comments

@chrkell

chrkell commented Oct 26, 2023

After running python -m sockeye.translate --config small_model/args.yaml --input sentence_parallel_files/src_test.txt --output sentence_parallel_files/tgt_test.txt --models small_model --strip-unknown-words --prevent-unk, I get a TracerWarning and then the process is killed without producing any translation output. The sockeye_translations.log does not show any errors either.

The output in the terminal:
[INFO:sockeye.utils] Sockeye: 3.1.34, commit 4c30942ddb523533bccb4d2cbb3e894e45b1db93, path /Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/sockeye/__init__.py [INFO:sockeye.utils] PyTorch: 1.13.1 (/Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/torch/__init__.py) [INFO:sockeye.utils] Command: /Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/sockeye/translate.py --config small_model/args.yaml --input sentence_parallel_files/src_test.txt --output sentence_parallel_files/tgt_test.txt --models small_model --strip-unknown-words --prevent-unk [INFO:sockeye.utils] Arguments: Namespace(allow_missing_params=False, amp=False, apex_amp=False, batch_sentences_multiple_of=8, batch_size=4096, batch_type='word', beam_search_stop='all', beam_size=5, bow_task_pos_weight=10, bow_task_weight=1.0, brevity_penalty_constant_length_ratio=0.0, brevity_penalty_type='none', brevity_penalty_weight=1.0, bucket_scaling=False, bucket_width=8, cache_last_best_params=0, cache_metric='perplexity', cache_strategy='best', checkpoint_improvement_threshold=0.0, checkpoint_interval=10, checkpoints=None, chunk_size=None, clamp_to_dtype=False, config='small_model/args.yaml', decode_and_evaluate=500, decoder='transformer', deepspeed_bf16=False, deepspeed_fp16=False, device_id=0, dist=False, dry_run=False, dtype='float32', embed_dropout=[0.3, 0.3], encoder='transformer', end_of_prepending_tag=None, ensemble_mode='linear', env=None, fixed_param_names=[], fixed_param_strategy=None, gradient_clipping_threshold=1.0, gradient_clipping_type='none', greedy=False, ignore_extra_params=False, initial_learning_rate=0.0002, input='sentence_parallel_files/src_test.txt', input_factors=None, json_input=False, keep_initializations=False, keep_last_params=-1, knn_index=None, knn_lambda=0.8, label_smoothing=0.3, label_smoothing_impl='mxnet', learning_rate_reduce_factor=0.9, learning_rate_reduce_num_not_improved=8, learning_rate_scheduler_type='plateau-reduce', learning_rate_warmup=0, length_penalty_alpha=1.0, length_penalty_beta=0.0, length_task=None, length_task_layers=1, length_task_weight=1.0, lhuc=None, local_rank=None, loglevel='INFO', loglevel_secondary_workers='INFO', max_checkpoints=None, max_input_length=None, max_num_checkpoint_not_improved=None, max_num_epochs=None, max_output_length=None, max_output_length_num_stds=2, max_samples=10000000, max_seconds=None, max_seq_len=[95, 95], max_updates=None, min_num_epochs=None, min_samples=None, min_updates=None, models=['small_model'], momentum=0.0, nbest_size=1, neural_vocab_selection=None, neural_vocab_selection_block_loss=False, no_bucketing=False, no_logfile=False, no_reload_on_learning_rate_reduce=False, num_embed=[None, None], num_layers=[3, 3], num_words=[20000, 20000], nvs_thresh=0.5, optimized_metric='bleu', optimizer='adam', optimizer_betas=[0.9, 0.999], optimizer_eps=1e-08, output='sentence_parallel_files/tgt_test.txt', output_type='translation', overwrite_output=False, pad_vocab_to_multiple_of=8, params=None, prepared_data=None, prevent_unk=True, quiet=False, quiet_secondary_workers=False, restrict_lexicon=None, restrict_lexicon_topk=None, sample=None, seed=1, shared_vocab=True, skip_nvs=False, source='../sentence_parallel_files/src_train.txt', source_factor_vocabs=[], source_factors=[], source_factors_combine=[], source_factors_num_embed=[], source_factors_share_embedding=[], source_factors_use_source_vocab=[], source_vocab=None, stop_training_on_decoder_failure=False, strip_unknown_words=True, 
target='../sentence_parallel_files/tgt_train.txt', target_factor_vocabs=[], target_factors=[], target_factors_combine=[], target_factors_num_embed=[], target_factors_share_embedding=[], target_factors_use_target_vocab=[], target_factors_weight=[1.0], target_vocab=None, tf32=True, transformer_activation_type=['relu', 'relu'], transformer_attention_heads=[4, 4], transformer_block_prepended_cross_attention=False, transformer_dropout_act=[0.1, 0.1], transformer_dropout_attention=[0.1, 0.1], transformer_dropout_prepost=[0.1, 0.1], transformer_feed_forward_num_hidden=[512, 512], transformer_feed_forward_use_glu=False, transformer_model_size=[128, 128], transformer_positional_embedding_type='fixed', transformer_postprocess=['dr', 'dr'], transformer_preprocess=['n', 'n'], update_interval=1, use_cpu=False, validation_source='../sentence_parallel_files/src_validation.txt', validation_source_factors=[], validation_target='../sentence_parallel_files/tgt_validation.txt', validation_target_factors=[], weight_decay=0.0, weight_tying_type='src_trg_softmax', word_min_count=[1, 1]) [INFO:sockeye.utils] CUDA not available, defaulting to CPU device [INFO:__main__] Translate Device: cpu [INFO:sockeye.model] Loading 1 model(s) from ['small_model'] ... [INFO:sockeye.vocab] Vocabulary (20008 words) loaded from "small_model/vocab.src.0.json" [INFO:sockeye.vocab] Vocabulary (20008 words) loaded from "small_model/vocab.trg.0.json" [INFO:sockeye.model] Model version: 3.1.34 [INFO:sockeye.model] Loaded model config from "small_model/config" [INFO:sockeye.model] Disabling dropout layers for performance reasons [INFO:sockeye.model] ModelConfig(config_data=DataConfig(data_statistics=DataStatistics(num_sents=16245, num_discarded=1, num_tokens_source=317343, num_tokens_target=177556, num_unks_source=20594, num_unks_target=6918, max_observed_len_source=83, max_observed_len_target=80, size_vocab_source=20008, size_vocab_target=20008, length_ratio_mean=0.7195621962140187, length_ratio_std=0.5842629746461441, buckets=[(8, 8), (16, 16), (24, 24), (32, 32), (40, 40), (48, 48), (56, 56), (64, 64), (72, 72), (80, 80), (88, 88), (96, 96)], num_sents_per_bucket=[888, 5731, 5333, 2625, 1170, 381, 80, 13, 5, 3, 16, 0], average_len_target_per_bucket=[6.146396396396393, 9.93160006979582, 11.494468404275306, 12.341714285714282, 12.621367521367512, 13.341207349081374, 13.412499999999998, 16.846153846153847, 18.8, 35.666666666666664, 8.75, None], length_ratio_stats_per_bucket=[(1.3744302338052337, 0.8585728161087813), (0.949670248710557, 0.6720002114978064), (0.6097747334248025, 0.4043806126550977), (0.4536296476514696, 0.2288022427671748), (0.36131486260088264, 0.20559448455653748), (0.3143034722464711, 0.19507350713250618), (0.2827698400110712, 0.2829753677267691), (0.642658803608251, 1.160508093478276), (0.27178585119143306, 0.13837005297982366), (2.3391053391053394, 3.0600525538493457), (0.10542168674698794, 0.0822129160479288), (None, None)]), max_seq_len_source=96, max_seq_len_target=96, num_source_factors=1, num_target_factors=1, eop_id=-1), vocab_source_size=20008, vocab_target_size=20008, config_embed_source=EmbeddingConfig(vocab_size=20008, num_embed=128, dropout=0.0, num_factors=1, factor_configs=None, allow_sparse_grad=False), config_embed_target=EmbeddingConfig(vocab_size=20008, num_embed=128, dropout=0.0, num_factors=1, factor_configs=None, allow_sparse_grad=False), config_encoder=TransformerConfig(model_size=128, attention_heads=4, feed_forward_num_hidden=512, act_type='relu', num_layers=3, dropout_attention=0.0, 
dropout_act=0.0, dropout_prepost=0.0, positional_embedding_type='fixed', preprocess_sequence='n', postprocess_sequence='dr', max_seq_len_source=96, max_seq_len_target=96, decoder_type='transformer', block_prepended_cross_attention=False, use_lhuc=False, depth_key_value=128, use_glu=False), config_decoder=TransformerConfig(model_size=128, attention_heads=4, feed_forward_num_hidden=512, act_type='relu', num_layers=3, dropout_attention=0.0, dropout_act=0.0, dropout_prepost=0.0, positional_embedding_type='fixed', preprocess_sequence='n', postprocess_sequence='dr', max_seq_len_source=96, max_seq_len_target=96, decoder_type='transformer', block_prepended_cross_attention=False, use_lhuc=False, depth_key_value=128, use_glu=False), config_length_task=None, weight_tying_type='src_trg_softmax', lhuc=False, dtype='float32', neural_vocab_selection=None, neural_vocab_selection_block_loss=False) [INFO:sockeye.model] Loaded params from "small_model/params.best" to "cpu" [INFO:sockeye.model] Model dtype: overridden to float32 [INFO:sockeye.model] 1 model(s) loaded in 0.2285s [INFO:sockeye.inference] Translator (1 model(s) beam_size=5 algorithm=BeamSearch, beam_search_stop=all max_input_length=95 nbest_size=1 ensemble_mode=None max_batch_size=4096 dtype=torch.float32 skip_nvs=False nvs_thresh=0.5) [INFO:__main__] Translating... /Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/torch/jit/_trace.py:983: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for list, use a tupleinstead. fordict, use a NamedTupleinstead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior. argument_names, zsh: killed python -m sockeye.translate --config small_model/args.yaml --input --output

sockeye_translations.log:
[2023-10-25:20:06:15:INFO:sockeye.utils:log_sockeye_version] Sockeye: 3.1.34, commit 4c30942ddb523533bccb4d2cbb3e894e45b1db93, path /Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/sockeye/__init__.py [2023-10-25:20:06:15:INFO:sockeye.utils:log_torch_version] PyTorch: 1.13.1 (/Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/torch/__init__.py) [2023-10-25:20:06:15:INFO:sockeye.utils:log_basic_info] Command: /Users/christyman/miniconda3/envs/ats-program-37/lib/python3.7/site-packages/sockeye/translate.py --config ../small_model/args.yaml --input ../sentence_parallel_files/src_test.txt --output ../sockeye_translations --models ../small_model --strip-unknown-words --prevent-unk [2023-10-25:20:06:15:INFO:sockeye.utils:log_basic_info] Arguments: Namespace(allow_missing_params=False, amp=False, apex_amp=False, batch_sentences_multiple_of=8, batch_size=4096, batch_type='word', beam_search_stop='all', beam_size=5, bow_task_pos_weight=10, bow_task_weight=1.0, brevity_penalty_constant_length_ratio=0.0, brevity_penalty_type='none', brevity_penalty_weight=1.0, bucket_scaling=False, bucket_width=8, cache_last_best_params=0, cache_metric='perplexity', cache_strategy='best', checkpoint_improvement_threshold=0.0, checkpoint_interval=10, checkpoints=None, chunk_size=None, clamp_to_dtype=False, config='../small_model/args.yaml', decode_and_evaluate=500, decoder='transformer', deepspeed_bf16=False, deepspeed_fp16=False, device_id=0, dist=False, dry_run=False, dtype='float32', embed_dropout=[0.3, 0.3], encoder='transformer', end_of_prepending_tag=None, ensemble_mode='linear', env=None, fixed_param_names=[], fixed_param_strategy=None, gradient_clipping_threshold=1.0, gradient_clipping_type='none', greedy=False, ignore_extra_params=False, initial_learning_rate=0.0002, input='../sentence_parallel_files/src_test.txt', input_factors=None, json_input=False, keep_initializations=False, keep_last_params=-1, knn_index=None, knn_lambda=0.8, label_smoothing=0.3, label_smoothing_impl='mxnet', learning_rate_reduce_factor=0.9, learning_rate_reduce_num_not_improved=8, learning_rate_scheduler_type='plateau-reduce', learning_rate_warmup=0, length_penalty_alpha=1.0, length_penalty_beta=0.0, length_task=None, length_task_layers=1, length_task_weight=1.0, lhuc=None, local_rank=None, loglevel='INFO', loglevel_secondary_workers='INFO', max_checkpoints=None, max_input_length=None, max_num_checkpoint_not_improved=None, max_num_epochs=None, max_output_length=None, max_output_length_num_stds=2, max_samples=10000000, max_seconds=None, max_seq_len=[95, 95], max_updates=None, min_num_epochs=None, min_samples=None, min_updates=None, models=['../small_model'], momentum=0.0, nbest_size=1, neural_vocab_selection=None, neural_vocab_selection_block_loss=False, no_bucketing=False, no_logfile=False, no_reload_on_learning_rate_reduce=False, num_embed=[None, None], num_layers=[3, 3], num_words=[20000, 20000], nvs_thresh=0.5, optimized_metric='bleu', optimizer='adam', optimizer_betas=[0.9, 0.999], optimizer_eps=1e-08, output='../sockeye_translations', output_type='translation', overwrite_output=False, pad_vocab_to_multiple_of=8, params=None, prepared_data=None, prevent_unk=True, quiet=False, quiet_secondary_workers=False, restrict_lexicon=None, restrict_lexicon_topk=None, sample=None, seed=1, shared_vocab=True, skip_nvs=False, source='../sentence_parallel_files/src_train.txt', source_factor_vocabs=[], source_factors=[], source_factors_combine=[], source_factors_num_embed=[], 
source_factors_share_embedding=[], source_factors_use_source_vocab=[], source_vocab=None, stop_training_on_decoder_failure=False, strip_unknown_words=True, target='../sentence_parallel_files/tgt_train.txt', target_factor_vocabs=[], target_factors=[], target_factors_combine=[], target_factors_num_embed=[], target_factors_share_embedding=[], target_factors_use_target_vocab=[], target_factors_weight=[1.0], target_vocab=None, tf32=True, transformer_activation_type=['relu', 'relu'], transformer_attention_heads=[4, 4], transformer_block_prepended_cross_attention=False, transformer_dropout_act=[0.1, 0.1], transformer_dropout_attention=[0.1, 0.1], transformer_dropout_prepost=[0.1, 0.1], transformer_feed_forward_num_hidden=[512, 512], transformer_feed_forward_use_glu=False, transformer_model_size=[128, 128], transformer_positional_embedding_type='fixed', transformer_postprocess=['dr', 'dr'], transformer_preprocess=['n', 'n'], update_interval=1, use_cpu=False, validation_source='../sentence_parallel_files/src_validation.txt', validation_source_factors=[], validation_target='../sentence_parallel_files/tgt_validation.txt', validation_target_factors=[], weight_decay=0.0, weight_tying_type='src_trg_softmax', word_min_count=[1, 1]) [2023-10-25:20:06:15:INFO:sockeye.utils:init_device] CUDA not available, defaulting to CPU device [2023-10-25:20:06:15:INFO:__main__:run_translate] Translate Device: cpu [2023-10-25:20:06:15:INFO:sockeye.model:load_models] Loading 1 model(s) from ['../small_model'] ... [2023-10-25:20:06:15:INFO:sockeye.vocab:vocab_from_json] Vocabulary (20008 words) loaded from "../small_model/vocab.src.0.json" [2023-10-25:20:06:15:INFO:sockeye.vocab:vocab_from_json] Vocabulary (20008 words) loaded from "../small_model/vocab.trg.0.json" [2023-10-25:20:06:15:INFO:sockeye.model:load_model] Model version: 3.1.34 [2023-10-25:20:06:15:INFO:sockeye.model:load_config] Loaded model config from "../small_model/config" [2023-10-25:20:06:15:INFO:sockeye.model:load_model] Disabling dropout layers for performance reasons [2023-10-25:20:06:15:INFO:sockeye.model:__init__] ModelConfig(config_data=DataConfig(data_statistics=DataStatistics(num_sents=16245, num_discarded=1, num_tokens_source=317343, num_tokens_target=177556, num_unks_source=20594, num_unks_target=6918, max_observed_len_source=83, max_observed_len_target=80, size_vocab_source=20008, size_vocab_target=20008, length_ratio_mean=0.7195621962140187, length_ratio_std=0.5842629746461441, buckets=[(8, 8), (16, 16), (24, 24), (32, 32), (40, 40), (48, 48), (56, 56), (64, 64), (72, 72), (80, 80), (88, 88), (96, 96)], num_sents_per_bucket=[888, 5731, 5333, 2625, 1170, 381, 80, 13, 5, 3, 16, 0], average_len_target_per_bucket=[6.146396396396393, 9.93160006979582, 11.494468404275306, 12.341714285714282, 12.621367521367512, 13.341207349081374, 13.412499999999998, 16.846153846153847, 18.8, 35.666666666666664, 8.75, None], length_ratio_stats_per_bucket=[(1.3744302338052337, 0.8585728161087813), (0.949670248710557, 0.6720002114978064), (0.6097747334248025, 0.4043806126550977), (0.4536296476514696, 0.2288022427671748), (0.36131486260088264, 0.20559448455653748), (0.3143034722464711, 0.19507350713250618), (0.2827698400110712, 0.2829753677267691), (0.642658803608251, 1.160508093478276), (0.27178585119143306, 0.13837005297982366), (2.3391053391053394, 3.0600525538493457), (0.10542168674698794, 0.0822129160479288), (None, None)]), max_seq_len_source=96, max_seq_len_target=96, num_source_factors=1, num_target_factors=1, eop_id=-1), vocab_source_size=20008, 
vocab_target_size=20008, config_embed_source=EmbeddingConfig(vocab_size=20008, num_embed=128, dropout=0.0, num_factors=1, factor_configs=None, allow_sparse_grad=False), config_embed_target=EmbeddingConfig(vocab_size=20008, num_embed=128, dropout=0.0, num_factors=1, factor_configs=None, allow_sparse_grad=False), config_encoder=TransformerConfig(model_size=128, attention_heads=4, feed_forward_num_hidden=512, act_type='relu', num_layers=3, dropout_attention=0.0, dropout_act=0.0, dropout_prepost=0.0, positional_embedding_type='fixed', preprocess_sequence='n', postprocess_sequence='dr', max_seq_len_source=96, max_seq_len_target=96, decoder_type='transformer', block_prepended_cross_attention=False, use_lhuc=False, depth_key_value=128, use_glu=False), config_decoder=TransformerConfig(model_size=128, attention_heads=4, feed_forward_num_hidden=512, act_type='relu', num_layers=3, dropout_attention=0.0, dropout_act=0.0, dropout_prepost=0.0, positional_embedding_type='fixed', preprocess_sequence='n', postprocess_sequence='dr', max_seq_len_source=96, max_seq_len_target=96, decoder_type='transformer', block_prepended_cross_attention=False, use_lhuc=False, depth_key_value=128, use_glu=False), config_length_task=None, weight_tying_type='src_trg_softmax', lhuc=False, dtype='float32', neural_vocab_selection=None, neural_vocab_selection_block_loss=False) [2023-10-25:20:06:15:INFO:sockeye.model:load_parameters] Loaded params from "../small_model/params.best" to "cpu" [2023-10-25:20:06:15:INFO:sockeye.model:load_model] Model dtype: overridden to float32 [2023-10-25:20:06:15:INFO:sockeye.model:load_models] 1 model(s) loaded in 0.2356s [2023-10-25:20:06:15:INFO:sockeye.inference:__init__] Translator (1 model(s) beam_size=5 algorithm=BeamSearch, beam_search_stop=all max_input_length=95 nbest_size=1 ensemble_mode=None max_batch_size=4096 dtype=torch.float32 skip_nvs=False nvs_thresh=0.5) [2023-10-25:20:06:15:INFO:__main__:read_and_translate] Translating...

@mjdenkowski
Contributor

It looks like the process was automatically killed (zsh: killed). Can you try running without --config small_model/args.yaml? We typically don't specify the full set of training arguments during inference; the model information is pulled from the model's config file.
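For example (untested, and assuming the same paths as in your report), the invocation would look something like:

python -m sockeye.translate --models small_model --input sentence_parallel_files/src_test.txt --output sentence_parallel_files/tgt_test.txt --strip-unknown-words --prevent-unk

That way only the inference-related options are given explicitly, the model settings are read from small_model/config, and the remaining inference options fall back to their defaults.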

In this case, it looks like the training batch size (4096) is being used during inference. Inference batching is sequence-based, so this would mean batches of 4096 sequences (input lines), which would require much more memory than batch sizes of 1, 32, or 64. This may be why the process was killed.
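If memory is still an issue after dropping --config, a small batch size can also be set explicitly on the command line, e.g. --batch-size 32 (assuming I'm remembering the inference flag correctly), instead of inheriting the word-based training value of 4096:

python -m sockeye.translate --models small_model --input sentence_parallel_files/src_test.txt --output sentence_parallel_files/tgt_test.txt --strip-unknown-words --prevent-unk --batch-size 32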
