
going forward + question #28

Open
tig3rmast3r opened this issue Mar 26, 2024 · 7 comments

@tig3rmast3r

Hi Hugo, this project rocks. I'm having a lot of fun; this is the best AI-based audio tool available right now, at least for what I'm looking for.

I'd like to share where I'm going with this awesome project.
I've made a C# app that takes care of keeping loops over time and sends generation presets to Gradio, like you did with unloop. I'm losing hours just listening to generations, and I'm making tons of brand-new audio loops too!
I'm also working on a VST plug-in that sends generated WAVs into a DAW, combined with Demucs for separation. I also put together a very primitive liveset using 3 C# apps simultaneously in real time, sending the Demucs-separated streams to VST -> Reaktor and mixing them. What a blast!
Not sure if there's interest in what I'm doing. I may share the projects, but I'm nothing special at coding; I just know how to use ChatGPT properly :)

This is the C# app:
[screenshot]

This is the liveset setup using VST + Bidule + Reaktor (+ iPad and MIDI controller):
[photo of the setup]

930707408-VID-20231230-WA0001.mp4


Now for the question: is there a way to start a new training run and make a bigger model?
Is it an easy task? I have no idea what I'd have to change in the code to make the model 2x or 3x bigger for training; I've only done (huge) fine-tunings so far. I'd like to try training from scratch and making a bigger model too, to see what happens :)

Thanks a lot!

@hugofloresgarcia
Owner

Hi @tig3rmast3r,

This is insane!! I wanted to say this looks and sounds super cool!
Would love to hear a full AI techno set done with this tool if you have one to share!

I'm really happy to see vampnet in a full creative interface like this one.

Training from scratch requires a large dataset (50k hours of audio, more or less), and enough GPUs to fit a batch size of ~32 for the duration of audio context you'd like to train for. You can have a look at the settings used to train the model in conf/vampnet.yml.
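
If it helps to gauge a dataset against that figure, a quick way to total the hours of audio in a folder is something like the sketch below (it assumes the `soundfile` package and a placeholder folder path; it isn't part of vampnet):

```python
# Total the duration of all .wav files under a folder, in hours.
# Just a sketch (requires the `soundfile` package); not part of vampnet.
from pathlib import Path

import soundfile as sf


def total_hours(root: str) -> float:
    """Sum the duration of every .wav under `root` and return it in hours."""
    seconds = 0.0
    for path in Path(root).rglob("*.wav"):
        seconds += sf.info(str(path)).duration  # duration is reported in seconds
    return seconds / 3600.0


if __name__ == "__main__":
    print(f"{total_hours('data/audio'):.1f} hours")  # 'data/audio' is a placeholder path
```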

@tig3rmast3r
Author

tig3rmast3r commented Apr 5, 2024

Hi Hugo,
Glad you liked it. I played with that a lot but never found the time to do a good recording; I'm planning to make a YouTube video sooner or later.
About the training: I've already carefully selected 10k+ chunks, which should be around 28 hours. It's a very personal model, as it includes most of my discography for the first 30%, and I've already used it for this project as a fine-tune.
I did around 300 epochs for the fine-tuning (epochs = batch_size × iterations / n. of chunks). With an RTX 4090 it took around 90 hours total (70 coarse + 20 c2f). I stopped at 300 because the learning rate was dropping very quickly.
I've already had a look at conf/vampnet.yml. What is the parameter that defines the model size?
Is it just VampNet.embedding_dim: 1280?
I mean, if I want to make a double-size .pth, is doubling this value enough, or do I have to adjust something else?
I guess that since the model is aimed at just techno/tech-house, 28 hours may be enough...
With the RTX 4090 I can't go over batch size 5, because with 6 it goes over 24 GB of VRAM once it saves the first checkpoint.
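
For reference, the epoch bookkeeping used above (epochs = batch_size × iterations / number of chunks) is easy to sanity-check; the sketch below uses the batch-3 / 10,038-chunk numbers quoted later in this thread, not anything computed by vampnet itself:

```python
# Epoch/iteration bookkeeping matching the formula above:
#   epochs = batch_size * iterations / num_chunks
# The example values (batch 3, 10,038 chunks) are the ones quoted later in
# this thread; nothing here is computed by vampnet itself.

def iterations_for(epochs: float, num_chunks: int, batch_size: int) -> int:
    """Optimizer steps needed to pass over the dataset `epochs` times."""
    return round(epochs * num_chunks / batch_size)


def epochs_for(iterations: int, num_chunks: int, batch_size: int) -> float:
    """Passes over the dataset that a given number of steps amounts to."""
    return iterations * batch_size / num_chunks


print(iterations_for(epochs=100, num_chunks=10_038, batch_size=3))      # 334600
print(epochs_for(iterations=334_600, num_chunks=10_038, batch_size=3))  # 100.0
```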

@hugofloresgarcia
Owner

Yeah, doubling that value could work. You could also try changing the number of layers and heads, though that might require a bit more fine-tuning to get it working.
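
As a rough guide to how those settings change model size, the usual transformer back-of-the-envelope estimate (about 12 × n_layers × d_model² parameters per standard block with a 4× MLP expansion; vampnet's blocks may differ in detail) can be scripted:

```python
# Back-of-the-envelope transformer size estimate:
#   per layer ≈ 4*d^2 (attention projections) + 8*d^2 (MLP with 4x expansion) = 12*d^2
# It ignores embeddings, norms and output heads, and vampnet's blocks may
# differ in detail, so treat it only as a scaling guide.
# n_layers=20 is a placeholder; check conf/vampnet.yml for the real value.

def approx_params(embedding_dim: int, n_layers: int) -> float:
    return 12 * n_layers * embedding_dim ** 2


base = approx_params(embedding_dim=1280, n_layers=20)
double_dim = approx_params(embedding_dim=2560, n_layers=20)

print(f"base:             ~{base / 1e6:.0f}M params")
print(f"2x embedding_dim: ~{double_dim / 1e6:.0f}M params ({double_dim / base:.0f}x)")
```

Doubling embedding_dim alone roughly quadruples the per-layer parameter count (activation memory grows too), which is also why it shrinks the batch size that fits in 24 GB.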

@tig3rmast3r
Author

Is it normal that training with identical parameters and dataset on Linux gives different results than on Windows?
I'm asking because I trained a model for a few days and I'm still getting very bad results. I ran for 334,600 iters with batch size 3, which is 100 epochs since I have 10,038 chunks.
I used Linux with torch.compile, PyTorch 2.1.2 and CUDA 11.8, and did the same for c2f.
I used embedding 1914, 22 heads and 22 layers, while for c2f I lowered it to 1800, 20, 20.
It's still far from good, so I'm wondering if there's something wrong on Linux, for example if it's reading the WAV files incorrectly.
So for testing purposes I did a quick training with a few chunks and did the same on Windows, so the models should be identical, and I discovered that the Linux ones are usually missing higher frequencies, as if they were pitched down. I've attached 2 WAVs generated with the same seed and no mask. I will do more tests on Windows with a longer training, but it looks like there's something wrong with my Linux setup.
How can I make sure it's reading the files correctly? They are all mono 16-bit PCM WAVs at 44,100 Hz.
I'm sure Windows is OK because I did many fine-tunings there and they sound great.
Note that I don't use torch.compile on Windows, as it's not available there and I'm on PyTorch 2.1.0; but I even tried PyTorch 2.3.0 on Linux with CUDA 12.1 and the models appear almost identical to 2.1.2 with 11.8.
Do you have any clue?
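
To be concrete, this is the kind of check I mean: loop over the dataset and confirm every header (just a sketch using the `soundfile` package; the folder path is a placeholder):

```python
# Check that every chunk really is mono, 16-bit PCM, 44,100 Hz.
# Just a sketch using the `soundfile` package; "data/chunks" is a placeholder
# for the real dataset folder.
from pathlib import Path

import soundfile as sf

for path in Path("data/chunks").rglob("*.wav"):
    info = sf.info(str(path))
    if not (info.samplerate == 44_100 and info.channels == 1 and info.subtype == "PCM_16"):
        print(f"unexpected format: {path} -> {info.samplerate} Hz, "
              f"{info.channels} ch, {info.subtype}")
```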

Here's the test wavs.zip, a comparison between the Linux and Windows trainings.

thanks

@tig3rmast3r
Author

I definitely have a problem on Linux.
I did a longer training run tonight on Windows with the same values as on Linux.
This is only 10 epochs for coarse + 20 for c2f, versus 100 + 100 on Linux.
Now I got what I was expecting. I'd like to sort out my Linux issue so I can rent a RunPod instance for the training; any idea would be really appreciated. Thanks.
I attached 2 examples: one pair without a mask, same seed, and another pair with a mask and another seed.
testwav.zip

@tig3rmast3r
Author

tig3rmast3r commented Apr 21, 2024

I did a quick test and it looks like the problem is with the torch.compile call.
Removing torch.compile from train.py solved the problem on Linux.
Do you have a specific combination of PyTorch and CUDA that you have tested with torch.compile and know to be working?
So far I've tested:
- 2.1.2 with cu11.8 = bad training
- 2.3.0 with cu12.1 (dev build) = bad training
- 2.2.2 with cu12.1 = error (missing 1 required positional argument: 'dim')
Hope this helps.

EDIT: I did more tests and unfortunately the problem is not torch.compile. I've noticed that with torch.compile I get different results, but both results are bad.
I've also rented a RunPod instance and ran it for an entire day (6x RTX 4000 ADA, Python 3.9 without torch.compile, PyTorch 2.2.0 cu12.1) and got the same results, so it's not related to my config; it looks like a general issue on Linux. I can print my installed conda and pip packages if you need more info. Thanks.
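
For anyone running the same with/without comparison, one low-effort way to toggle the compile step without editing train.py each time is an environment-variable guard like the sketch below (the VAMPNET_COMPILE name is made up for this example, and vampnet's train.py may wrap the model differently):

```python
# Opt-in guard around torch.compile for A/B testing.
# The VAMPNET_COMPILE variable name is made up for this example; vampnet's
# train.py may wrap the model differently.
import os

import torch
import torch.nn as nn


def maybe_compile(model: nn.Module) -> nn.Module:
    """Compile only when VAMPNET_COMPILE=1, so compiled/uncompiled runs are easy to compare."""
    if os.environ.get("VAMPNET_COMPILE", "0") == "1" and hasattr(torch, "compile"):
        return torch.compile(model)
    return model

# usage, before the training loop:
#   model = maybe_compile(model)
```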

@tig3rmast3r
Author

tig3rmast3r commented Apr 25, 2024

I've finally found a working combination. Honestly, I haven't found the root issue, but I can use Linux now!
I tried it both at home and on vast.ai, where I'm currently training on 4x 4090 with no issues so far (not using torch.compile, but at least multi-GPU is working too).
I took a pip list from my PC and tried to make the Linux environment as similar as possible.
Here are all the combinations that work; I applied the fix on all configs to avoid bad audio results (more info below). A small snippet for printing your own setup's versions follows the table.

| Setup | Single GPU | Single GPU + torch.compile | Multi GPU | Multi GPU + torch.compile |
| --- | --- | --- | --- | --- |
| Python 3.11.4, PyTorch 2.0.1, cu11.8 | working | working | working | incompatible |
| Python 3.11.4, PyTorch 2.1.2, cu11.8 | working | working | working | error |
| Python 3.9.x or 3.11.4, PyTorch 2.2.x, cu11.8 or cu12.1 | untested | error | untested | error |
| Python 3.11.4, PyTorch 2.3.0, cu12.1 | untested | error | untested | error |
| Python 3.9.17, PyTorch 2.3.0, cu12.1 | working | working | working | stuck @ "starting training loop" |
| Python 3.10.14, PyTorch 2.3.0, cu12.1 | untested | untested | untested | working (sometimes) |
| Python 3.10.14, PyTorch 2.0.1, cu11.8 | working | working | working | working |
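
To match a machine against the table above, something like this prints the relevant versions (a convenience snippet, not part of vampnet):

```python
# Print the Python / PyTorch / CUDA versions relevant to the table above.
import platform

import torch

print("python :", platform.python_version())
print("pytorch:", torch.__version__)
print("cuda   :", torch.version.cuda)        # None on CPU-only builds
print("gpus   :", torch.cuda.device_count())
```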

I've trained several combinations to understand the impact of torch.compile and Python versions on speed and quality.
Quality is similar across all tests, as expected (I needed to run 50 epochs to minimize randomness).
Speed report:

| Setup | Speed |
| --- | --- |
| Windows, Python 3.11.4, PyTorch 2.1.1, cu11.8, no torch.compile | baseline |
| Linux, Python 3.11.4, PyTorch 2.1.2, cu11.8, no torch.compile | 6.6% faster |
| Linux, Python 3.11.4, PyTorch 2.1.2, cu11.8, torch.compile | 16% faster |
| Linux, Python 3.9.17, PyTorch 2.3.0, cu12.1, no torch.compile | 6.1% faster |
| Linux, Python 3.9.17, PyTorch 2.3.0, cu12.1, torch.compile | 16.5% faster |

I've finally found a working torch.compile config for multi-GPU, using Python 3.10 and the latest PyTorch!! Tested on 4x RTX 4090.
EDIT: I sometimes get errors during startup with 2.3.0 cu12.1; no problems with 2.0.1 cu11.8.

About the trick to fix the bad audio: I've attached a zip containing the following files:
- a file "new" with the working combination from my PC (edited)
- 3 examples (before - requirements - after) for a 3.9.17 setup with 2.3.0 cu12.1
- a compare.py to create the requirements file based on the Windows working file (more info below; a rough sketch of the idea follows)

Basically, one of the modules inside the requirements file is causing the bad audio; I still haven't identified which one.
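
The actual compare.py is in the zip; the idea is roughly a diff between a known-good pip freeze and the current environment, along the lines of this sketch (the filenames are placeholders, and this is not the attached script):

```python
# Rough sketch of the compare.py idea: diff a known-good `pip freeze` listing
# (e.g. from the Windows machine) against the current environment and write a
# requirements file that pins the packages that differ.
# This is NOT the attached compare.py; filenames are placeholders.

def parse_freeze(path: str) -> dict:
    """Map package name -> pinned version from a `pip freeze` style file."""
    pins = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if "==" in line and not line.startswith("#"):
                name, version = line.split("==", 1)
                pins[name.lower()] = version
    return pins


good = parse_freeze("windows_freeze.txt")   # known-good environment
current = parse_freeze("linux_freeze.txt")  # environment to fix

with open("requirements_fix.txt", "w") as out:
    for name, version in sorted(good.items()):
        if current.get(name) != version:
            out.write(f"{name}=={version}\n")
```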

EDIT: I have updated my fork with installation instructions to get this working successfully on Windows and Linux (single and multi-GPU). I will no longer update this thread and have removed the installation instructions from it.

I've used the compare trick on clean conda envs, both locally and on vast.ai containers.

Hopefully you will find the root cause so we can pin the version during pip install -e ./vampnet (or update the code to work with the latest version of whatever module it is).

Here's the zip: files.zip

Lastly, while I was testing I found the time to record some "TimeToTrain" values; they may help with finding the perfect server to train on and save some (or a lot of) $$.
Here are all the tested servers. I mostly used runpod.io but have now switched to vast.ai, as it's much cheaper in most cases.
TimeToTrain is based on the number of epochs, so larger batch sizes need fewer iterations; basically (batch_size × iterations) is equal across all tests.

| Model | VRAM (GB) | CUDA cores | Tensor cores | Freq (MHz) | TDP (W) | GPUs | Batch | Time to train | FP32 bench (single GPU) | vast.ai TFLOPS | vast.ai DLP min | vast.ai DLP max | Note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 4 | 16 | 178 | 82.58 | 327.00 | 260.00 | 350.00 | cheap server, may be better |
| H100 80 SXM5 | 80 | 16896 | 528 | 1590 | 700 | 1 | 16 | 224 | 66.91 | 107.00 | 580.00 | 684.00 | |
| RTX 4000 ADA | 20 | 6144 | 192 | 1500 | 130 | 6 | 18 | 241 | 26.73 | | | | |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 2 | 6 | 264 | 82.58 | 162.00 | 175.00 | 212.00 | |
| L40 | 48 | 18176 | 568 | 735 | 300 | 2 | 12 | 290 | 90.52 | 144.00 | 231.00 | 231.00 | |
| RTX A4000 | 16 | 6144 | 192 | 735 | 140 | 8 | 16 | 295 | 19.17 | | | | |
| RTX 4090 | 24 | 16384 | 512 | 2235 | 450 | 1 | 4 | 379 | 82.58 | | | | my home PC (11700K @ 5 GHz) |
| RTX 6000 ADA | 48 | 18176 | 568 | 915 | 300 | 1 | 6 | 450 | 91.06 | 81.00 | 135.00 | 135.00 | multi-GPU not working |
| RTX 4000 ADA SFF | 20 | 6144 | 192 | 720 | 70 | 4 | 8 | 513 | 19.17 | 41.00 | 31.00 | 42.00 | |
| RTX A5000 (SFF?) | 24 | 6144 | 192 | 900 | 150 | 6 | 18 | 545 | 19.35 | | | | strange model with 150 W TDP and lower performance |
| RTX A5000 | 24 | 8192 | 256 | 1170 | 230 | 2 | 6 | 572 | 27.77 | 55.00 | 55.00 | 69.00 | |
| A100 80 PCIe | 80 | 6192 | 432 | 1065 | 300 | 1 | 16 | 600 | 19.49 | 31.00 | 170.00 | 260.00 | strangely low, probably CPU bound |
| RTX 3090 | 24 | 10496 | 328 | 1395 | 350 | 2 | 6 | 930 | 35.58 | 71.00 | 75.00 | 90.00 | low-performance node, needs retest |
| Tesla V100 PCIe | 16 | 5120 | 640 | 937 | 250 | 6 | 12 | 1420 | 16.32 | 25.00 | 34.00 | 40.00 | uses only 20% of TDP?? |

EDIT May 9: updated info and the zipped file; will update more as soon as I have more info.
EDIT May 20: more info; removed the installation instructions (there's a quickinstall.sh bash script on my fork).
Hope this helps.
