
empty db, curious errors, empty output, long gen time, outputs noise #32

Open
baardev opened this issue Nov 4, 2023 · 0 comments

I am writing here because the Discord invite in the README.md is invalid.

I am not sure I am doing this "right". Using the dataset provided on Google Drive and the prompt "violins playing Tchaikovsky", it takes 10 minutes on an RTX 4070 Ti to generate tokens and produce a 4-second clip of chaotic humming. When I generate a 30-second clip, which takes over an hour of token generation, it produces a 3 MB file that sounds like car horns under water :/

Is there a preferred prompt to use with the test data? What sounds were sampled to make the test data?

When I tried to process my own sounds, the semantic encoding was less than 10% finished after 24 hours. Is it "normal" that a clip should take 10 days to process?

Also, using the Google Drive data with `--model_config ./model/musiclm_large_small_context.json`, I get the following errors:

```
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

You are using a model of type mert_model to instantiate a model of type hubert. This is not supported for all configurations of models and can yield errors.
```

What are the correct settings for using the Google Drive data?

My current command is:

```shell
python scripts/infer_top_match.py \
    "violins playing Tchaikovsky" \
    --num_samples 4 \
    --num_top_matches 1 \
    --semantic_path   ./model/semantic.transformer.14000.pt \
    --coarse_path     ./model/coarse.transformer.18000.pt \
    --fine_path       ./model/fine.transformer.24000.pt \
    --rvq_path        ./model/clap.rvq.950_no_fusion.pt \
    --kmeans_path     ./model/kmeans_10s_no_fusion.joblib \
    --model_config    ./model/musiclm_large_small_context.json \
    --duration 4
```

I had to use the Google Drive data because preprocessing my own, while not raising any errors, produced a 0-byte preprocessed.db file in the semantic stage, which then caused errors in the generation stage.
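As a side note, a simple pre-flight check would catch the zero-byte preprocessed.db (and any missing checkpoint) before a long run starts. This is a hypothetical helper, not part of the repo; the paths are the ones from my command above, so adjust as needed:

```python
import os

# Artifact paths from the inference command above (adjust to your layout).
ARTIFACTS = [
    "./model/semantic.transformer.14000.pt",
    "./model/coarse.transformer.18000.pt",
    "./model/fine.transformer.24000.pt",
    "./model/clap.rvq.950_no_fusion.pt",
    "./model/kmeans_10s_no_fusion.joblib",
    "./model/musiclm_large_small_context.json",
]

def check_artifact(path: str) -> bool:
    """Return True if the file exists and is non-empty (catches 0-byte files)."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

if __name__ == "__main__":
    for p in ARTIFACTS:
        status = "ok" if check_artifact(p) else "MISSING OR EMPTY"
        print(f"{p}: {status}")
```

A check like this would have flagged the empty preprocessed.db before I wasted a generation run on it.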

Is there a working example of this code somewhere with proper checkpoints?

Thanks
