
Fine-tuning a CodeLlama model on CommitPackFT #22

Open
wonhyeongseo opened this issue Sep 7, 2023 · 1 comment

wonhyeongseo commented Sep 7, 2023

Hello, I am a student in Korea working on a 6-week project.

I want to fine-tune a CodeLlama model using your paper's methodology for the Code Repair task. How do you estimate the GPU resources and time required for this project?

I also have two new ideas:

  • Can a static code analyzer's output improve the dataset?
  • Can an RLHF-based approach using DPO help the model generate better code?

Thank you for your time and guidance.
Best regards,
Won

@wonhyeongseo wonhyeongseo changed the title Fine-tuning a CodeLlama model on CommitPackFT Fine-tuning a WizardCoder model on CommitPackFT Sep 7, 2023
@wonhyeongseo wonhyeongseo changed the title Fine-tuning a WizardCoder model on CommitPackFT Fine-tuning a CodeLlama model on CommitPackFT Sep 7, 2023
@Muennighoff
Collaborator

Sounds exciting!

How do you estimate the GPU resources and time required for this project?

If you go with the 7B model & you also use LoRA like we did for OctoCoder, then I think 1x A100 with 80GB, or even 40GB, for a few hours may easily suffice. Even for the 13B that may be enough, but you may have to use a few memory-reduction techniques like gradient checkpointing. Maybe you can even fine-tune the 34B one on a single GPU using techniques like QLoRA.
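For reference, here's a minimal sketch of such a LoRA setup with the transformers + peft stack; the hyperparameters and target-module list are illustrative choices, not the exact OctoCoder recipe:

```python
# Minimal LoRA fine-tuning setup sketch; hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "codellama/CodeLlama-7b-hf"  # swap in the 13b/34b variants as discussed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights to cut memory
    device_map="auto",
)
model.gradient_checkpointing_enable()  # trade compute for activation memory

lora_config = LoraConfig(
    r=16,                       # low-rank adapter dimension
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters train, a tiny fraction of weights

# For the 34B model, QLoRA-style 4-bit loading (transformers'
# BitsAndBytesConfig(load_in_4bit=True)) can shrink memory further.
```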

Can a static code analyzer's output improve the dataset?

Yes, I think it can. Check out this work where they do that: https://arxiv.org/pdf/2305.18584.pdf
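As a hedged illustration of the idea, the augmentation could look roughly like the following. Note that pyflakes as the analyzer is an arbitrary choice, and the dataset id / field name (`bigcode/commitpackft`, `old_contents`) are assumed from the CommitPackFT card, so verify them before relying on this:

```python
# Sketch: attach static-analyzer diagnostics to each CommitPackFT example,
# so the model can condition its fix on the analyzer's complaints.
import os
import subprocess
import tempfile
from datasets import load_dataset

def analyzer_report(code: str) -> str:
    """Run pyflakes on a code string and return its diagnostics as text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["pyflakes", path], capture_output=True, text=True)
    os.unlink(path)
    return (result.stdout + result.stderr).strip()

ds = load_dataset("bigcode/commitpackft", "python", split="train")

def add_diagnostics(example):
    # Diagnostics on the pre-commit code could be prepended to the instruction.
    example["diagnostics"] = analyzer_report(example["old_contents"])
    return example

ds = ds.map(add_diagnostics)
```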

Can an RLHF-based approach using DPO help the model generate better code?

Yes, I think so too; check out this work doing something similar: https://arxiv.org/abs/2307.14936
How best to incorporate RLHF / code feedback is still an open & interesting research question!
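If you want to prototype the DPO direction, a rough sketch with the trl library might look like the following. The preference pair is a toy fabrication ("chosen" = the committed fix, "rejected" = a buggy draft), and the trl API shifts across versions, so treat this as a starting point only:

```python
# Toy DPO sketch with trl; the preference pair is fabricated for illustration.
# DPOTrainer/DPOConfig details vary across trl versions -- check the docs.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "codellama/CodeLlama-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pairs = Dataset.from_dict({
    "prompt":   ["Fix the bug:\ndef add(a, b):\n    return a - b"],
    "chosen":   ["def add(a, b):\n    return a + b"],
    "rejected": ["def add(a, b):\n    return a - b"],
})

args = DPOConfig(output_dir="codellama-dpo", beta=0.1)  # beta = KL penalty strength
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    processing_class=tokenizer,  # named `tokenizer=` in older trl versions
)
trainer.train()
```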
