Command to reproduce results on MuST-C failing

Running the command provided in the README to reproduce the MuST-C results from the paper "Adapting Transformer to End-to-End Spoken Language Translation" fails with the following error:

```
| distributed init (rank 1): tcp://localhost:18735
| distributed init (rank 0): tcp://localhost:18735
| distributed init (rank 2): tcp://localhost:18735
| distributed init (rank 3): tcp://localhost:18735
Namespace(adam_betas='(0.9, 0.999)', adam_eps=1e-08, arch='speechconvtransformer_big', attention_dropout=0.1, attn_2d=True, audio_input=True, bucket_cap_mb=150, clip_norm=20.0, criterion='label_smoothed_cross_entropy', data=['bin/'], ddp_backend='no_c10d', decoder_attention_heads=8, decoder_embed_dim=512, decoder_ffn_embed_dim=1024, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, decoder_out_embed_dim=512, decoder_output_dim=512, device_id=0, distance_penalty='gauss', distributed_backend='nccl', distributed_init_host='localhost', distributed_init_method='tcp://localhost:18735', distributed_port=18736, distributed_rank=0, distributed_world_size=4, dropout=0.1, encoder_attention_heads=8, encoder_convolutions='[(64, 3, 3)] * 2', encoder_embed_dim=512, encoder_ffn_embed_dim=1024, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fix_batches_to_gpus=False, fp16=False, fp16_init_scale=128, fp16_scale_window=None, freeze_encoder=False, init_variance=1.0, keep_interval_updates=-1, label_smoothing=0.1, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=1000, lr=[0.005], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=100, max_sentences=8, max_sentences_valid=8, max_source_positions=1400, max_target_positions=300, max_tokens=12000, max_update=0, min_loss_scale=0.0001, min_lr=1e-08, momentum=0.99, no_attn_2d=False, no_cache_source=False, no_epoch_checkpoints=False, no_progress_bar=False, no_save=False, normalization_constant=1.0, optimizer='adam', optimizer_overrides='{}', raw_text=False, relu_dropout=0.1, reset_lr_scheduler=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='models', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=True, skip_invalid_size_inputs_valid_test=True, source_lang=None, target_lang=None, task='translation', train_subset='train', update_freq=[16], upsample_primary=1, valid_subset='valid', validate_interval=1, 
warmup_init_lr=0.0003, warmup_updates=4000, weight_decay=0.0)
| [h5] dictionary: 4 types
| [de] dictionary: 192 types
| bin/ train 229703 examples
| bin/ valid 1423 examples
Exception ignored in: <function IndexedDataset.__del__ at 0x7f0de0de5790>
Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
Traceback (most recent call last):
  File "../../train.py", line 365, in <module>
Exception ignored in: <function IndexedDataset.__del__ at 0x7f9f0b8f3790>
Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/indexed_dataset.py", line 85, in __del__
    def __del__(self):
KeyboardInterrupt: 
    multiprocessing_main(args)
    def __del__(self):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 42, in main
KeyboardInterrupt: 
    p.join()
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/amit/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 84, in signal_handler
    raise Exception(msg)
Exception: 

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/multiprocessing_train.py", line 48, in run
    single_process_main(args)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/train.py", line 53, in main
    dummy_batch = task.dataset('train').get_dummy_batch(args.max_tokens, max_positions)
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 221, in get_dummy_batch
    return self.collater([
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 224, in <listcomp>
    'source': self.src_dict.dummy_sentence(src_len) if self.src_dict is not None else None,
  File "/home/amit/amit/pruning/FBK-Fairseq-ST/fairseq/data/dictionary.py", line 302, in dummy_sentence
    t = torch.Tensor(length).new_empty((length, self.audio_features)).uniform_(self.nspecial + 1, len(self))
RuntimeError: Expected a_in <= b_in to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
```
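I've tried running the command with both Python 3.5 and Python 3.8, and I get the same error each time.
I believe the error is raised because the bounds passed to `Tensor.uniform_` in `dummy_sentence` are invalid. `uniform_(from, to)` requires `from <= to`, but here the lower bound `self.nspecial + 1` is greater than the upper bound `len(self)`. The log reports that the `[h5]` (source) dictionary has only 4 types, so `len(self)` is 4.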
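
To make the bound check concrete, here is a minimal stdlib-only sketch that mimics the failing call instead of importing PyTorch. `uniform_matrix` and `dummy_sentence_bounds` are hypothetical stand-ins, not code from the repository. `len(self) == 4` comes from the log; `nspecial == 4` is an assumption consistent with the error (the lower bound must exceed 4 for the check to fail).

```python
import random


def uniform_matrix(rows, cols, low, high, rng=random):
    """Stdlib stand-in for torch.empty(rows, cols).uniform_(low, high).

    PyTorch raises a RuntimeError when low > high; this mimics that check.
    """
    if low > high:
        raise RuntimeError(f"Expected low <= high, but got low={low}, high={high}")
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


def dummy_sentence_bounds(nspecial, dict_len, offset):
    """Bounds used by the failing line: uniform_(self.nspecial + offset, len(self))."""
    return nspecial + offset, dict_len


# len(self) == 4 is taken from the log ("[h5] dictionary: 4 types").
# nspecial == 4 is an assumption consistent with the error.
NSPECIAL, DICT_LEN = 4, 4

for offset in (1, 0):  # 1 = current code, 0 = proposed change
    low, high = dummy_sentence_bounds(NSPECIAL, DICT_LEN, offset)
    try:
        feats = uniform_matrix(2, 3, low, high)
        print(f"offset={offset}: ok, every value is {feats[0][0]}")
    except RuntimeError as err:
        print(f"offset={offset}: {err}")
```

Under these assumptions the current bounds `(5, 4)` are rejected. The proposed bounds `(4, 4)` pass the check but produce a constant matrix.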

I tried to fix the error myself by changing `self.nspecial + 1` to `self.nspecial` on this line: https://github.com/mattiadg/FBK-Fairseq-ST/blob/2d152404df1ffce944d6bc11f3fb8361fb4810f7/fairseq/data/dictionary.py#L302
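
An alternative to dropping the `+ 1` everywhere would be to clamp the lower bound only when the dictionary is too small. The sketch below shows this idea with a hypothetical helper, `dummy_bounds`, which is not part of the fork. It assumes `nspecial == 4` for illustration. For the 4-type `[h5]` dictionary it yields the same bounds as the change above, and larger dictionaries keep the original offset.

```python
def dummy_bounds(nspecial, dict_len, offset=1):
    """Hypothetical helper returning (low, high) for the dummy features.

    Keeps the original lower bound nspecial + offset when the dictionary is
    large enough; otherwise clamps it to dict_len so that low <= high, which
    is the precondition uniform_(low, high) checks.
    """
    high = dict_len
    low = min(nspecial + offset, high)
    return low, high


# Hypothetical use inside dummy_sentence (not the fork's actual code):
#     low, high = dummy_bounds(self.nspecial, len(self))
#     t = torch.empty(length, self.audio_features).uniform_(low, high)

print(dummy_bounds(4, 4))    # 4-type [h5] dictionary from the log
print(dummy_bounds(4, 192))  # a larger dictionary keeps the +1 offset
```

Because these values only fill a dummy batch, the exact range is probably unimportant. The clamp simply guarantees that the call cannot fail.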

Is this a valid fix?

Thanks in advance,
Chaitanya
