Astraios: Parameter-Efficient Instruction Tuning Code Language Models

This repository provides an overview of all components from the paper Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models.

Overview
PEFT
Evaluation
Training
Outputs
Visuals
Licenses
Citation

Overview

Data	CommitPackFT+OASST	Filtered version of CommitPack and OASST for high-quality commit messages that resemble instructions
Model	Astraios-1B	Collection of StarCoderBase-1B models instruction tuned on CommitPackFT + OASST with different tuning methods
	Astraios-3B	Collection of StarCoderBase-3B (3B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
	Astraios-7B	Collection of StarCoderBase-7B (7B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
	Astraios-16B	Collection of StarCoderBase-16B (16B parameters) models instruction tuned on CommitPackFT + OASST with different tuning methods
Evaluation	BigCloneBench	Dataset for clone detection; we use 2,000 samples for evaluation
	Devign	Dataset for defect detection; we use 2,000 samples for evaluation
	HumanEvalPack	Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages
	ReCode	Dataset for the robustness of code generation, covering 4 variants
	Asleep At The Keyboard	Datasets for security of code generation; we use DoW for evaluation

PEFT

Setup: Run the bash code below to set up the PEFT methods used in our work. We additionally implement the AdapterH, AdapterP and Parallel methods on top of peft==0.6.0.dev0. For more information, please refer to the peft folder.

pip install git+https://github.com/bigcode-project/astraios#subdirectory=peft

Notes:

As Prefix Tuning does not work for StarCoder training, we do not evaluate this method.
For any configuration issues, please refer to the original PEFT.

Evaluation

Setup: Run the bash code below to set up the evaluation repository.

git clone -b astraios https://github.com/bigcode-project/bigcode-evaluation-harness
cd bigcode-evaluation-harness
pip install -q -r requirements.txt
accelerate config

Run: All evaluation scripts are in the evaluation folder. Run each script via bash.

We use astraios-1b-lora as an example and use the bash code to run the following tasks:

Clone Detection

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks clone_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_clone_detection_astraios-1b-lora.json \
--max_length_generation 512

Defect Detection

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks defect_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_defect_detection_astraios-1b-lora.json \
--max_length_generation 512

HumanEvalSynthesize-Python

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalsynthesize-python \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalsynthesizepython_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalsynthesizepython_astraios-1b-lora.json \
--max_length_generation 2048

HumanEvalFix-Python

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalfixtests-python \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalfixpython_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalfixpython_astraios-1b-lora.json \
--max_length_generation 2048

HumanEvalExplain-Python

accelerate launch main.py \
--model bigcode/starcoderbase-1b \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalexplaindescribe-python \
--generation_only \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalexplaindescribe-python_astraios-1b-lora.json \
--max_length_generation 2048 

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks humanevalexplainsynthesize-python \
--do_sample True \
--temperature 0.2 \
--n_samples 1 \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--load_data_path generations_humanevalexplaindescribe-python_astraios-1b-lora.json \
--save_generations_path generations_humanevalexplainsynthesize-python_astraios-1b-lora.json \
--metric_output_path evaluation_humanevalexplainpython_astraios-1b-lora.json \
--max_length_generation 2048
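
HumanEvalExplain runs in two steps: the first command only generates descriptions (--generation_only), and the second synthesizes code from them and computes the metric. The sketch below shows how the two steps connect, assuming the file passed to --load_data_path in the second step is the one written by --save_generations_path in the first. The names MODEL_TAG, DESCRIPTIONS, DRY_RUN and the helper functions are illustrative and not part of the harness.

```shell
#!/bin/sh
# Illustrative wrapper; MODEL_TAG, DESCRIPTIONS, and DRY_RUN are hypothetical names.
MODEL_TAG=astraios-1b-lora
DESCRIPTIONS=generations_humanevalexplaindescribe-python_${MODEL_TAG}.json

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Flags shared by both steps (copied from the two commands above).
common() {
  run accelerate launch main.py \
    --model bigcode/starcoderbase-1b \
    --peft_model "bigcode/${MODEL_TAG}" \
    --do_sample True --temperature 0.2 \
    --allow_code_execution --save_generations --trust_remote_code \
    --prompt octocoder --max_length_generation 2048 "$@"
}

# Step 1: generate descriptions only and save them.
explain_describe() {
  common --tasks humanevalexplaindescribe-python --generation_only \
    --n_samples 20 --batch_size 5 \
    --save_generations_path "$DESCRIPTIONS"
}

# Step 2: load the saved descriptions, synthesize code, and score it.
explain_synthesize() {
  common --tasks humanevalexplainsynthesize-python \
    --n_samples 1 --batch_size 1 \
    --load_data_path "$DESCRIPTIONS" \
    --save_generations_path "generations_humanevalexplainsynthesize-python_${MODEL_TAG}.json" \
    --metric_output_path "evaluation_humanevalexplainpython_${MODEL_TAG}.json"
}

explain_describe && explain_synthesize
```

Because both steps read the same DESCRIPTIONS variable, the second step cannot point at a file the first step never wrote.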

ReCode-Format

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks perturbed-humaneval-format-num_seeds_5 \
--do_sample False \
--batch_size 1 \
--n_samples 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_perturbed-humaneval-format-num_seeds_5_astraios-1b-lora.json \
--metric_output_path evaluation_perturbed-humaneval-format-num_seeds_5_astraios-1b-lora.json \
--max_length_generation 1024
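
The command above covers only the Format variant. The sketch below lists task names for all four ReCode variants, assuming the other three follow the same perturbed-humaneval-<variant>-num_seeds_5 pattern and that the variant identifiers are format, func_name, natgen and nlaugmenter. Neither assumption is confirmed by this README, so check the names against the task list of the evaluation harness before running.

```shell
#!/bin/sh
# Assumption: the variant identifiers and the naming pattern mirror the
# Format task shown above; they are not confirmed by this README.
RECODE_VARIANTS="format func_name natgen nlaugmenter"

# Hypothetical helper: print one task name per ReCode variant.
recode_tasks() {
  for variant in $RECODE_VARIANTS; do
    echo "perturbed-humaneval-${variant}-num_seeds_5"
  done
}

recode_tasks
```

Each printed name can be substituted for the --tasks value in the ReCode-Format command, with the output file names adjusted to match.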

AATK-DoW

accelerate launch main.py \
--model bigcode/starcoderbase-1b  \
--peft_model bigcode/astraios-1b-lora \
--tasks asleep_completion \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 1 \
--generation_only \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_asleep_completion_astraios-1b-lora.json \
--metric_output_path evaluation_asleep_completion_astraios-1b-lora.json \
--max_length_generation 1024

Note:

When evaluating FFT models, remove --peft_model and pass the FFT model name via --model, e.g.:

accelerate launch main.py \
--model bigcode/astraios-1b-fft  \
--tasks clone_detection \
--do_sample False \
--batch_size 1 \
--save_generations \
--trust_remote_code \
--save_generations_path generations_clone_detection_astraios-1b-fft.json \
--max_length_generation 512
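
The PEFT and FFT invocations differ only in their model flags. A small sketch of that rule follows, assuming FFT checkpoints can be recognised by the -fft suffix seen in bigcode/astraios-1b-fft. The helper name model_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the model-related flags for main.py.
# Assumption: FFT checkpoints end in "-fft" (as in bigcode/astraios-1b-fft);
# any other checkpoint is treated as a PEFT adapter on top of the base model.
model_flags() {
  base_model=$1
  tuned_model=$2
  case "$tuned_model" in
    *-fft) echo "--model $tuned_model" ;;
    *)     echo "--model $base_model --peft_model $tuned_model" ;;
  esac
}

model_flags bigcode/starcoderbase-1b bigcode/astraios-1b-lora
# prints: --model bigcode/starcoderbase-1b --peft_model bigcode/astraios-1b-lora
model_flags bigcode/starcoderbase-1b bigcode/astraios-1b-fft
# prints: --model bigcode/astraios-1b-fft
```

The printed flags can replace the --model and --peft_model lines of any evaluation command in this section.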

The evaluation notebook for Clone Detection and Defect Detection is stored in evaluation/eval_code_comprehension.ipynb.

Training

PEFT

The fine-tuning Python script is finetune.py. The corresponding PEFT configurations are stored in peft_config.py. To train all models with PEFT, run the bash code:

sh run_peft.sh

Note:

--gradient_accumulation_steps 32 is for a single GPU. If the model is trained with 8 GPUs, gradient_accumulation_steps should be adjusted to 4.

FFT

To train all models with FFT, run the bash code:

sh run_fft.sh

Note:

--gradient_accumulation_steps 32 is for a single GPU. If the model is trained with 8 GPUs, gradient_accumulation_steps should be adjusted to 4.
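
The notes on --gradient_accumulation_steps for PEFT and FFT training reduce to simple arithmetic: 32 steps on 1 GPU and 4 steps on 8 GPUs both give a product of 32. The sketch below extends this to other GPU counts, assuming the goal is to keep that product (and hence the effective batch size) constant. The function grad_accum_steps is hypothetical and not part of run_peft.sh or run_fft.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_peft.sh or run_fft.sh).
# Assumption: num_gpus * gradient_accumulation_steps should stay at 32,
# which matches the two documented settings (1 GPU -> 32, 8 GPUs -> 4).
SINGLE_GPU_STEPS=32

grad_accum_steps() {
  num_gpus=$1
  # Refuse GPU counts that are non-positive or do not divide 32 evenly.
  if [ "$num_gpus" -le 0 ] || [ $((SINGLE_GPU_STEPS % num_gpus)) -ne 0 ]; then
    echo "unsupported GPU count: $num_gpus" >&2
    return 1
  fi
  echo $((SINGLE_GPU_STEPS / num_gpus))
}

grad_accum_steps 1   # prints 32
grad_accum_steps 8   # prints 4
```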

Outputs

Outputs are stored in their corresponding subfolders in the outputs folder.

Note:

Some outputs may be missing as they were not saved initially.

Visuals

Figures:

All figures are created via this colab notebook.
The figure of Astraios was generated via DALL-E with the prompt "A fantasy-inspired, adorable illustration of Astraios, the Greek god of dusk. The setting is a serene evening landscape with a gradient sky transition."

Licenses

Everything is licensed as permissively as possible to us.

The evaluation repository Code Generation LM Evaluation Harness is licensed under the Apache-2.0 license.

The PEFT library is licensed under the Apache-2.0 license.

All Astraios models are licensed under the same license as StarCoder (Commercial except for use cases deemed harmful).

The remaining code originally created in this repository is licensed under the MIT License.

Todo List

Organize the file names in outputs/ folder.
Organize the bash script in evaluation/ folder.
Merge the PR to bigcode-evaluation-harness.

Citation

@article{zhuo2024astraios,
      title={Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models}, 
      author={Terry Yue Zhuo and Armel Zebaze and Nitchakarn Suppattarachai and Leandro von Werra and Harm de Vries and Qian Liu and Niklas Muennighoff},
      journal={arXiv preprint arXiv:2401.00788},
      year={2024}
}
