Unable to save S4 decoder with mode=nplr #105

Open
gregogiudici opened this issue May 28, 2023 · 3 comments

gregogiudici commented May 28, 2023

Hi, I'm implementing a decoder for audio generation (DDSP-style) using standalone S4 (V3).
I'd like to save checkpoints during training and, eventually, the final model.
When training the model with the S4D configuration (mode=diag), everything works well.

However, when training the model with the standard S4 configuration (mode=nplr), I get the following error:
RuntimeError: Cannot save multiple tensors or storages that view the same data as different types.

Using the CUDA extension for Cauchy and/or pykeops doesn't make a difference.

I'm searching for a solution. Thanks in advance.
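
As far as I can tell, the failure reduces to something like the sketch below (untested in isolation; the import path and constructor arguments are my guesses from the standalone module and may not match exactly):

    # Hypothetical minimal repro using the standalone S4 module from this repo;
    # the import path and argument names may differ between versions.
    import torch
    from s4 import S4

    layer = S4(d_model=64, mode="nplr")    # standard S4 (DPLR kernel)
    # layer = S4(d_model=64, mode="diag")  # S4D: saving works fine

    torch.save(layer.state_dict(), "checkpoint.pt")
    # -> RuntimeError: Cannot save multiple tensors or storages that
    #    view the same data as different types.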

I'm on Ubuntu 18.04.4 LTS and this is my environment:

python = 3.9.16
torch = 2.0.1
torchaudio = 0.13.1
pytorch-cuda = 11.6
pytorch-lightning = 1.9.3
lightning = 2.0.2
hydra-core = 1.3.2

And this is the train.log I obtained:
[screenshot of the train.log traceback]

@albertfgu
Contributor

Can you be more specific about the way you're saving the checkpoints? Do you have a custom train loop, or are you running our training script? If the latter, can you provide more details about the config you're using?

One thing that stands out is that your torch and torchaudio versions don't seem compatible. torchaudio=0.13.1 should be used with PyTorch 1.13 instead of 2.0: https://pytorch.org/get-started/previous-versions/
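
For a quick sanity check of what's actually installed (a generic snippet, nothing specific to this repo):

    # Per the previous-versions page: torchaudio 0.13.x pairs with torch 1.13.x,
    # and torch 2.0.1 pairs with torchaudio 2.0.x.
    import torch
    import torchaudio

    print(torch.__version__, torchaudio.__version__)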


gregogiudici commented Jun 2, 2023

I'm not running your training script. I'm new to PyTorch Lightning, so I'm using this template to learn it (modified for my model and the generative task).

I use the default Lightning ModelCheckpoint callback to save checkpoints during evaluation, with the following config:

    model_checkpoint:
      _target_: lightning.pytorch.callbacks.ModelCheckpoint
      dirpath: ${paths.output_dir}/checkpoints
      filename: "epoch_{epoch:03d}"
      monitor: "val/loss"
      save_last: True
      save_top_k: 1
      mode: "min"
      auto_insert_metric_name: False
      save_on_train_epoch_end: False
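
For reference, that config should be equivalent to instantiating the callback directly like this (the dirpath here is just a placeholder for the resolved ${paths.output_dir}/checkpoints):

    # Direct equivalent of the Hydra config above, for reference.
    from lightning.pytorch.callbacks import ModelCheckpoint

    checkpoint_callback = ModelCheckpoint(
        dirpath="outputs/checkpoints",  # placeholder for ${paths.output_dir}/checkpoints
        filename="epoch_{epoch:03d}",
        monitor="val/loss",
        save_last=True,
        save_top_k=1,
        mode="min",
        auto_insert_metric_name=False,
        save_on_train_epoch_end=False,
    )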

I've also tried different environment configurations, like the following:

python = 3.9.16
torch = 1.13.1
torchaudio = 0.13.1
pytorch-cuda = 11.6
pytorch-lightning = 1.5.10
lightning = 2.0.2
hydra-core = 1.3.2

obtaining the same RuntimeError every time.

@albertfgu
Contributor

Unfortunately I haven't seen this problem in a while, and it's hard for me to debug without more details. I do think I've seen related things before; IIRC there might be something going on in the DPLR kernel, because constructing it involves several linear-algebra conversions that can cause issues in edge cases (e.g., more advanced usages that need to convert the model to different forms and do something different at inference time). In vanilla training settings it should be fine, though.
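
If you want to try a workaround in the meantime, one generic option (an untested sketch, not something from this repo) is to break the storage aliasing before serialization by cloning every tensor in the checkpoint's state dict:

    # Untested generic workaround sketch: clone each tensor in the state dict
    # so no two saved tensors share storage viewed as different types.
    import torch
    import lightning.pytorch as pl

    class CloneOnSave(pl.LightningModule):
        def on_save_checkpoint(self, checkpoint):
            checkpoint["state_dict"] = {
                k: v.clone().contiguous() if isinstance(v, torch.Tensor) else v
                for k, v in checkpoint["state_dict"].items()
            }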

Beyond that, this is the best advice I can give for now:

  • Double-check that my train loop works; e.g., python -m train wandb=null should save checkpoints every epoch.
  • See if there are any discrepancies between your ModelCheckpoint and the way it's done in this repo.
  • More broadly, I would just recommend using S4D. The default version (mode=diag init=diag-legs) should generally be very close to the full S4-DPLR model, especially if you're using adequate learning rate warmup. The S4D-Lin version (mode=diag init=diag-lin) should also usually work well; see the sketch below.
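
As a sketch of that recommended configuration (again assuming the standalone module; the keyword names may differ across versions):

    # Hypothetical instantiation of the recommended S4D defaults; the import
    # path and kwargs are assumptions based on the standalone s4 module.
    from s4 import S4

    s4d_layer = S4(d_model=64, mode="diag", init="diag-legs")  # S4D default
    # s4d_lin = S4(d_model=64, mode="diag", init="diag-lin")   # S4D-Lin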
