This repository was archived by the owner on Mar 20, 2026. It is now read-only.

--share-all-embeddings requires a joined dictionary #2579

@neel04

Description


@edunov @myleott @ngoyal2707 I am trying to train a seq2seq model for translation but am running into a problem when training on the GPU. This is the command used for training:

CUDA_VISIBLE_DEVICES=0 fairseq-train "/content/drive/My Drive/HashPro/New/" --fp16 --max-sentences 8 --lr 0.02 --clip-norm 0.1  \
  --optimizer sgd --dropout 0.2  \
  --arch bart_large --save-dir "/content/drive/My Drive/HashPro/Checkpoints"

And this is the error:

2020-09-05 14:11:00 | INFO | fairseq_cli.train | Namespace(activation_fn='gelu', adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.0, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', clip_norm=0.1, cpu=False, criterion='cross_entropy', cross_self_attention=False, curriculum=0, data='/content/drive/My Drive/HashPro/New/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.2, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.02], lr_scheduler='fixed', 
lr_shrink=0.1, max_epoch=0, max_sentences=8, max_sentences_valid=8, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, momentum=0.0, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_seed_provided=True, no_token_positional_embeddings=False, nprocs_per_node=1, num_batch_buckets=0, num_workers=1, optimizer='sgd', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, relu_dropout=0.0, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='/content/drive/My Drive/HashPro/Checkpoints', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, update_freq=[1], upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, weight_decay=0.0, zero_sharding='none')
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [input] dictionary: 21936 types
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [output] dictionary: 9216 types
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.input
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.output
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | /content/drive/My Drive/HashPro/New/ valid input-output 1 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/content/fairseq/fairseq_cli/train.py", line 343, in cli_main
    distributed_utils.call_main(args, main)
  File "/content/fairseq/fairseq/distributed_utils.py", line 187, in call_main
    main(args, **kwargs)
  File "/content/fairseq/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/content/fairseq/fairseq/tasks/translation.py", line 279, in build_model
    model = super().build_model(args)
  File "/content/fairseq/fairseq/tasks/fairseq_task.py", line 248, in build_model
    model = models.build_model(args, self)
  File "/content/fairseq/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/content/fairseq/fairseq/models/transformer.py", line 198, in build_model
    raise ValueError("--share-all-embeddings requires a joined dictionary")
ValueError: --share-all-embeddings requires a joined dictionary

From the docs, I can only glean that the "target_dictionary" and the "source_dictionary" are not the same; apart from that, I could find no help on the internet. Since the error seems to be related to joined dictionaries, maybe I missed a preprocessing step. However, I have checked all the arguments and they seem correct. For reference, here is the preprocessing command:


%%bash
fairseq-preprocess --source-lang input --target-lang output \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe --bpe characters --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --destdir /content/drive/'My Drive'/HashPro/New/

Does anybody have any idea on how to fix this?
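In case it helps others hitting this: `bart_large` enables `--share-all-embeddings` by default, which requires the source and target to use a single shared vocabulary. `fairseq-preprocess` has a `--joined-dictionary` flag that builds exactly that. A possible fix (a sketch, reusing the paths from the command above) is to re-run preprocessing with that flag added:

```shell
# Rebuild the binarized data with one shared source/target dictionary,
# so that --share-all-embeddings can tie encoder and decoder embeddings.
fairseq-preprocess --source-lang input --target-lang output \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --joined-dictionary \
  --destdir /content/drive/'My Drive'/HashPro/New/
```

Alternatively, if separate vocabularies are intended, training with an architecture (or overrides) that does not set `--share-all-embeddings` should avoid the check in `transformer.py`.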
