PyAutoCode: GPT-2 based Python auto-code.

PyAutoCode is a cut-down Python autosuggestion model built on GPT-2 (motivation: GPyT). This baby model has been trained for only 3 epochs and has not been fine-tuned yet, so I strongly recommend against using it in a production environment or incorporating it into any of your projects. It was trained on 112GB of Python data sourced from the best crowdsourcing platform ever: GitHub.

NOTE: Further training and fine-tuning would be highly appreciated, and I firmly believe they would significantly improve PyAutoCode.

The model is published on Hugging Face. Go ahead and check it out!

Some Model Features

Built on GPT-2
Tokenized with ByteLevelBPETokenizer
Data sourced from GitHub (almost 5 consecutive days of the latest Python repositories)
Trained with GPT2LMHeadModel and DataCollatorForLanguageModeling
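To give a feel for what the data collator contributes during training, here is a minimal pure-Python sketch of causal-LM batching. It is not the repository's training code and not the transformers implementation. It assumes the collator runs in causal (non-masked) mode, which is the usual choice for GPT-2 style training. The function name `collate_causal_lm` and the example token ids are made up for illustration.

```python
from typing import Dict, List

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def collate_causal_lm(
    sequences: List[List[int]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Pad token-id sequences to a common length and build causal-LM labels.

    Hypothetical helper for illustration only; simplified compared to the
    real collator.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Labels mirror the inputs; padded positions are excluded from the loss.
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


# Made-up token ids, just to show the shapes.
batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["input_ids"])
print(batch["labels"])
```

The labels are a copy of the inputs rather than a shifted copy because GPT2LMHeadModel shifts them internally when it computes the next-token loss.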

Navigating this Code Repository

If you want to run the model directly, go ahead and execute app.py. Be warned, though, that the program's requirements are not included with the repo, since they may vary with your system's GPUs and architecture.
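Since no requirements file ships with the repo, a quick pre-flight check can tell you which packages are missing before you run app.py. The sketch below is only a suggestion. The dependency list is an assumption: `transformers` comes from the Usage section and `torch` is implied by `return_tensors='pt'` there, but app.py may need more. The helper name `missing_packages` is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependency list (not taken from the repo): adjust after reading app.py.
ASSUMED_DEPENDENCIES = ["transformers", "torch"]

missing = missing_packages(ASSUMED_DEPENDENCIES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed dependencies are importable.")
```

This checks only that the modules can be found, not that their versions or GPU builds suit your machine.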

If you want to understand how the code works, take a look at the PyAutoCode > templates folder for a descriptive explanation of each module / concept used.

Get a Glimpse of the Model

You can use the Hugging Face Inference API (in the right sidebar) to load the model and check the results.

Usage

You can use this model too! Here's a quick tour of how you can achieve this:

Install transformers

$ pip install transformers

Call the API and get it to work!

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("P0intMaN/PyAutoCode")

model = AutoModelForCausalLM.from_pretrained("P0intMaN/PyAutoCode")

# input: single line or multi-line. Highly recommended to use doc-strings.
inp = """import pandas"""

format_inp = inp.replace('\n', "<N>")
tokenize_inp = tokenizer.encode(format_inp, return_tensors='pt')
result = model.generate(tokenize_inp)

decode_result = tokenizer.decode(result[0])
format_result = decode_result.replace('<N>', "\n")

# printing the result
print(format_result)

Upon successful execution, the above should produce something like the following (your results may vary once the model is fine-tuned):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
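The usage example swaps newlines for the `<N>` placeholder before tokenizing and swaps them back after decoding. If you call the model in more than one place, you could wrap the two steps in small helpers. This is a minimal sketch; the names `to_model_format` and `from_model_format` are hypothetical and not part of the repo.

```python
NEWLINE_TOKEN = "<N>"  # placeholder the usage example substitutes for newlines


def to_model_format(source: str) -> str:
    """Replace real newlines with the placeholder before tokenizing."""
    return source.replace("\n", NEWLINE_TOKEN)


def from_model_format(text: str) -> str:
    """Restore real newlines in decoded model output."""
    return text.replace(NEWLINE_TOKEN, "\n")


# A multi-line prompt with a doc-string, as the usage example recommends.
prompt = 'def add(a, b):\n    """Return the sum of a and b."""'
encoded = to_model_format(prompt)
print(encoded)
print(from_model_format(encoded) == prompt)
```

The round trip is lossless only when the input does not already contain a literal `<N>`, so keep that in mind for prompts that happen to include it.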
