Fine-tuned GPT2 Model on Code using HuggingFace Transformers

This repository contains code for training and generating inferences from a fine-tuned GPT2 model. The model is specifically trained on code and natural language data to generate code snippets given natural language prompts.

Dependencies

  • Python 3.8 or higher
  • PyTorch 1.9.0 or higher
  • Transformers 4.28.0 or higher
  • tqdm 4.65.0 or higher

Code Explanation

The provided code consists of the following components (illustrative sketches of each follow the list):

  1. Loading the pretrained GPT2 model and tokenizer: The pretrained GPT2 model and tokenizer are loaded, and special tokens for padding, start of sentence, and end of sentence are added to the tokenizer.

  2. Preparing the fine-tuned GPT2 model and tokenizer: The model's vocabulary is enlarged to accommodate the newly added special tokens.

  3. CodeData Dataset: A custom PyTorch Dataset is defined to load code data from a JSON file, preprocess it, and tokenize it using the GPT2 tokenizer.

  4. Training the model: The model is trained using the Adam optimizer on the CodeData dataset for a specified number of epochs.

  5. Inference function: A function is defined to generate inferences given a prompt, model, and tokenizer. It takes parameters such as maximum length, number of return sequences, and temperature to control the output.

  6. Generating inferences: The code generates example inferences using both the pretrained GPT2 model and the fine-tuned model, illustrating the difference in outputs.
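
The sketches below are minimal reconstructions based on the descriptions above, not the repository's exact code; the special-token strings, file names, and default parameter values are assumptions. First, loading the pretrained model and tokenizer and registering the special tokens (items 1 and 2):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT2 ships without a pad token; the exact token strings here are assumptions.
tokenizer.add_special_tokens({
    "pad_token": "<pad>",
    "bos_token": "<sos>",
    "eos_token": "<eos>",
})

model = GPT2LMHeadModel.from_pretrained("gpt2")
# Enlarge the embedding matrix so the new special tokens get embedding vectors.
model.resize_token_embeddings(len(tokenizer))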
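
A possible shape for the CodeData dataset (item 3); the template that joins instruction, input, and output into one training string is an assumption:

import json
from torch.utils.data import Dataset

class CodeData(Dataset):
    """Loads instruction/input/output records from a JSON file and tokenizes them."""

    def __init__(self, json_path, tokenizer, max_length=256):
        with open(json_path) as f:
            self.records = json.load(f)
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        # Join the fields into one training string (the template is an assumption).
        text = (f"{self.tokenizer.bos_token}{rec['instruction']} {rec.get('input', '')} "
                f"{rec['output']}{self.tokenizer.eos_token}")
        enc = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        return enc["input_ids"].squeeze(0), enc["attention_mask"].squeeze(0)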
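
A training loop along the lines of item 4, using the Adam optimizer and saving the model state after each epoch (hyperparameters and the checkpoint name are placeholders):

import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=3, batch_size=4, lr=5e-5, device="cuda"):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device)
    model.train()
    for epoch in range(epochs):
        for input_ids, attention_mask in loader:
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            # Causal LM fine-tuning: the labels are the input ids themselves.
            # (A fuller version would mask padding positions with -100.)
            outputs = model(input_ids=input_ids,
                            attention_mask=attention_mask,
                            labels=input_ids)
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Save the model state after each epoch, as described above.
        torch.save(model.state_dict(), f"gpt2_code_epoch{epoch}.pt")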
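
Finally, a generate_inference function matching items 5 and 6; the defaults for max_length, num_return_sequences, and temperature shown here are illustrative:

import torch

def generate_inference(prompt, model, tokenizer,
                       max_length=128, num_return_sequences=1, temperature=0.7):
    """Generate code snippets for a natural-language prompt."""
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    model.eval()
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_length=max_length,
            num_return_sequences=num_return_sequences,
            temperature=temperature,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
        )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in output_ids]

Running the same prompt through the stock gpt2 checkpoint and through the fine-tuned model makes the effect of fine-tuning easy to compare.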

How to Use

  1. Install the required dependencies.

  2. Prepare a JSON file containing the code data with the following format:

[
    {
        "instruction": "Instruction in natural language",
        "input": "Input description (optional)",
        "output": "Code snippet"
    },
    ...
]
  3. Update the path to the JSON file in the CodeData class instantiation.

  4. Run the script to train the model. The model's state will be saved after each epoch.

  5. Use the generate_inference function to generate code snippets given natural language prompts.
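
For example, once a checkpoint has been saved, generation might look like this (the checkpoint name and the prompt are placeholders):

model.load_state_dict(torch.load("gpt2_code_epoch2.pt"))
snippets = generate_inference(
    "Write a Python function that reverses a string.",
    model,
    tokenizer,
    max_length=128,
    num_return_sequences=1,
    temperature=0.7,
)
print(snippets[0])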
