How can I fine-tune Llama 3 on my own data? #970
Hi @agutell, great questions! I think both of these use cases are covered by this tutorial: https://pytorch.org/torchtune/main/tutorials/chat.html. If you have follow-up questions, though, please let us know!
Thank you! I'll check it out. One thing, though: that page does not seem to appear when you enter the documentation from https://pytorch.org/torchtune/stable/index.html.
We have a stable version of the documentation that is tied to our code at the v0.1.1 release; this is the one you see by default if you navigate to the torchtune docs from the web. However, we also update our docs constantly as we contribute more code to the repository. These docs live under the "main" option of the version dropdown in the corner of the website, and you can also reach them at pytorch.org/torchtune/main/index.html. Because torchtune is developing so quickly, it's best to check the "main" documentation.
Ok, thanks! That's very helpful :) Back to the earlier question about custom data: I have now implemented message_converter and custom_dataset. However, the config requires a "dotted path" to custom_dataset. I am working in a notebook (Google Colab), and I have not found a way to create a dotted path that the function "_get_component_from_path(path: str)" will accept. I keep getting the error: raise InstantiationError(

Is there an easier way around this problem than defining a module and then giving its dotted path? When working from a notebook, it would be preferable to just specify the function name "custom_data" in the config, with the function "custom_data" defined in a cell. Thanks for all your work! :)
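For readers following along, here is a minimal, torchtune-independent sketch of what a message converter and a dataset builder might look like. It assumes hypothetical "input" and "output" columns and uses a small stand-in Message dataclass rather than torchtune's own message type; in torchtune itself, the converter would be handed to the chat dataset utilities described in the tutorial linked above, which also take care of tokenization.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping


@dataclass
class Message:
    """Stand-in chat message for this sketch; NOT torchtune's own Message class."""
    role: str             # "system", "user", or "assistant"
    content: str
    masked: bool = False  # in this sketch, True means "exclude from the training loss"


def message_converter(sample: Mapping[str, Any], train_on_input: bool = False) -> List[Message]:
    """Convert one raw row (hypothetical "input"/"output" columns) into a conversation."""
    return [
        # Mask the prompt unless we explicitly want to train on it too.
        Message(role="user", content=sample["input"], masked=not train_on_input),
        Message(role="assistant", content=sample["output"]),
    ]


def custom_dataset(rows: List[Dict[str, Any]]) -> List[List[Message]]:
    """Hypothetical builder: apply the converter to every row of the raw data."""
    return [message_converter(row) for row in rows]
```

In this sketch, calling custom_dataset([{"input": "What is 2 + 2?", "output": "4"}]) returns a single two-message conversation whose user turn is masked, so by default only the assistant reply would count toward the loss.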
Hi @agutell, this is a good question. Making arbitrary functions/modules defined in notebook cells importable by torchtune is a bit tricky, as there isn't really a consistent way to refer to them in a globally unique fashion. We discussed using a registry, but ultimately found it was harder to scale than the current approach based on dotted paths. E.g. for install run
Then from the file tree you can navigate to
(Or you can do the same thing by modifying whatever config file you're using like this:)
Also tagging @RdoubleA for any thoughts on this.
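To see why a bare function name defined in a cell is rejected, it helps to look at how dotted-path resolution generally works. The following is a simplified sketch of that idea built on importlib, not torchtune's actual _get_component_from_path: it imports the longest importable module prefix of the path and then walks the remaining attributes. The function name resolve_dotted_path is hypothetical.

```python
import importlib
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Hypothetical resolver: import the longest importable module prefix, then getattr the rest."""
    parts = path.split(".")
    # Try progressively shorter module prefixes, e.g. "a.b.c" -> "a.b" -> "a".
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            continue
        # Walk the remaining attributes, e.g. module "os.path", then attribute "join".
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Could not resolve {path!r}: no importable module prefix")
```

With this sketch, resolve_dotted_path("os.path.join") returns os.path.join, while resolve_dotted_path("custom_data") raises ImportError because no module named custom_data can be imported, which is analogous to the InstantiationError reported above. A registry would instead map names to objects explicitly, which avoids the import step but requires every component to be registered ahead of time.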
Thanks for raising this @agutell, this is an excellent case we hadn't fully considered when designing the configs. Ideally we would support both custom datasets defined in code and ad-hoc notebook code that isn't directly importable. What @ebsmothers suggested is a good approach. Actually, you don't need to modify the torchtune internals directly (we should prefer user code as the entry point rather than modifying torchtune internals): you can create a separate .py file in the same directory as your notebook containing your custom dataset and converters. As long as this file is importable in the notebook, it should work as a module path in the config. I'll open an issue about making this easier in a notebook environment.
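The separate-file approach can be sketched in plain Python as follows. The module name my_dataset and its toy custom_dataset function are hypothetical, and the file is written to a temporary directory only so the snippet is self-contained; in a notebook you would normally create the file next to the notebook (for example with the %%writefile cell magic), where the working directory is usually already on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents that would normally live in my_dataset.py beside the notebook.
MODULE_SOURCE = '''
def custom_dataset(rows):
    """Toy builder: pair each row's input with its output."""
    return [(row["input"], row["output"]) for row in rows]
'''


def make_importable(module_name: str, source: str, directory: Path) -> None:
    """Write `source` to <directory>/<module_name>.py and put the directory on sys.path."""
    (directory / f"{module_name}.py").write_text(source)
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    importlib.invalidate_caches()  # make sure the import system notices the new file


workdir = Path(tempfile.mkdtemp())
make_importable("my_dataset", MODULE_SOURCE, workdir)

# The dotted path "my_dataset.custom_dataset" now resolves like any installed module.
builder = getattr(importlib.import_module("my_dataset"), "custom_dataset")
print(builder([{"input": "hi", "output": "hello"}]))
```

Once the module is importable this way, the builder would be referenced in the config by its dotted path, e.g. _component_: my_dataset.custom_dataset.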
Hi,
I have been looking for documentation on how to add my own dataset to your config, but without success. If possible, could anyone provide a short guide on what needs to be done in order to:
or
Much appreciated!