Unable to save S4 decoder with mode=nplr #105
Can you be more specific about the way you're saving the checkpoints? Do you have a custom train loop, or are you running our training script? If the latter, can you provide more details about the config you're using? One thing that stands out is that your |
I'm not running your training script. I'm new to PyTorch Lightning, so I'm using this template to learn it (modified for my model and the generative task). I use the default Lightning `ModelCheckpoint` callback to save checkpoints during evaluation, with the following config (a programmatic equivalent is sketched below):

```yaml
model_checkpoint:
  _target_: lightning.pytorch.callbacks.ModelCheckpoint
  dirpath: ${paths.output_dir}/checkpoints
  filename: "epoch_{epoch:03d}"
  monitor: "val/loss"
  save_last: True
  save_top_k: 1
  mode: "min"
  auto_insert_metric_name: False
  save_on_train_epoch_end: False
```

I've also tried different environment configurations, like the following example:

obtaining the same |
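For reference, a minimal programmatic equivalent of the Hydra config above, assuming Lightning 2.x (the `lightning.pytorch` namespace) and with `output_dir` standing in for whatever `${paths.output_dir}` resolves to:

```python
from lightning.pytorch.callbacks import ModelCheckpoint

# Mirrors the model_checkpoint entry above; dirpath is a placeholder.
checkpoint_cb = ModelCheckpoint(
    dirpath="output_dir/checkpoints",
    filename="epoch_{epoch:03d}",
    monitor="val/loss",
    save_last=True,
    save_top_k=1,
    mode="min",
    auto_insert_metric_name=False,
    save_on_train_epoch_end=False,
)
# The callback is then passed to the trainer, e.g.
# L.Trainer(callbacks=[checkpoint_cb]).
```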
Unfortunately I haven't seen this problem in a while, and it's hard for me to debug without more details. I do think I've seen related things before; IIRC there might be something going on in the DPLR kernel because of the several linear algebra conversions involved in constructing it, which might cause issues in edge cases (e.g., more advanced usages that need to convert the model to different forms and do something different at inference time). In vanilla training settings it should be fine, though. This is the best advice I can give for now:
|
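For context, a minimal standalone sketch (not from the thread) of how this RuntimeError arises: `torch.save` refuses to serialize two tensors that view the same storage as different dtypes, which can happen when a complex parameter and a real view of it (e.g., via `torch.view_as_real`) both land in the checkpoint:

```python
import torch

w = torch.randn(4, dtype=torch.cfloat)  # complex parameter
w_real = torch.view_as_real(w)          # real-valued view of the same storage

try:
    # Two views of one storage with different dtypes trigger the error:
    torch.save({"w": w, "w_real": w_real}, "ckpt.pt")
except RuntimeError as e:
    print(e)  # Cannot save multiple tensors or storages that view ...

# Cloning gives each tensor its own storage and sidesteps the check:
torch.save({"w": w.clone(), "w_real": w_real.clone()}, "ckpt.pt")
```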
Hi, I'm implementing a decoder for audio generation (DDSP-style) using the standalone S4 (V3). I'd like to save checkpoints during training and, eventually, the final model. When training the model with the S4D configuration (`mode=diag`), everything works well. Instead, when training the model with the standard S4 configuration (`mode=nplr`), I get the following error: `RuntimeError: Cannot save multiple tensors or storages that view the same data as different types`. Using the CUDA extension for Cauchy and/or pykeops doesn't make a difference.
I'm searching for a solution. Thanks in advance.
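One possible workaround, sketched here as an editorial aside rather than a confirmed fix (the class name `AudioDecoder` is illustrative): use the LightningModule `on_save_checkpoint` hook to clone every `state_dict` entry, so that no two saved tensors view the same storage as different dtypes:

```python
import lightning.pytorch as L

class AudioDecoder(L.LightningModule):  # illustrative name
    def on_save_checkpoint(self, checkpoint):
        # Give every tensor its own storage before serialization, so
        # torch.save never sees two dtype-mismatched views of one buffer.
        checkpoint["state_dict"] = {
            k: v.detach().clone() for k, v in checkpoint["state_dict"].items()
        }
```

Whether this is safe for the NPLR kernel's internal views at inference time is untested; the clone only affects what is written to disk, not the live model.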
I'm on Ubuntu 18.04.4 LTS and this is my environment:
And this is the train.log I obtained: