MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data (ICLR 2024 Spotlight) [PDF] [CHALLENGE]

MUSTARD is a data synthesis framework providing theorem and proof data with informal theorem, informal proof, formal theorem (in Lean), and formal proof (in Lean). The project currently supports Lean 3.

MUSTARD benefits Math Word Problems (MWP).

MUSTARD benefits Automated Theorem Proving (ATP).

MUSTARD is effective with Llama 2-70b.

🍯 Preparation

Requirements

python 3.9
elan
lean v3.51.1
lean server from leaven
python packages in requirements.txt

Run the following command to install lean v3.51.1 once you have successully installed elan:

elan override set leanprover-community/lean:3.51.1

You can check elan and lean version by:

elan -V # elan 3.0.0 (cdb40bff5 2023-09-08) in my environment
lean -v # Lean (version 3.51.1, commit cce7990ea86a, Release) in my environment

Please note that requirements.txt will install PyTorch 2.0.1 version, in case you want to run FSDP + PEFT, please make sure to install PyTorch nightlies.

Then, please download the _target folder here and put it under ./. The _target folder contains Lean mathlib files and the corresponding compiled files.

OpenAI API Key

For using OpenAI model as a backbone in MUSTARD, please fill in openai_key.py with proper model name, key, and org. For example,

LLM_API_KEY = {
    "model": "gpt-4",
    "key": "sk-xxx",  
    "org": "org-xxx"
}

🍯 Data Synthesis

Please first fill in params_custom in params.py as follows:

params_custom = MustardArgs(
    baseline_type=$BASELINE_TYPE, 
    n_iter=$NUMBER_OF_DATA_TO_SYNTHESIZE, 
    qtype=$QUESTION_TYPE, 
    qlevel=$QUESTION_LEVEL, 
    kw_mode=$KW_MODE,
    num_keyword=$NUMBER_OF_KEYWORDS, 
    num_correct=$NUMBER_OF_CORRECTION, 
)

If you want to preset your math concepts, please additionally assign [($your_1st_concept, $your_1st_domain), ($your_2nd_concept, $your_2nd_domain), ...] to preset_keywords in MustardArgs. Then MUSTARD will generate $NUMBER_OF_DATA_TO_SYNTHESIZE samples with the preset concepts.

The pipeline is tested in the following domains:

Domain	Parameter
$BASELINE_TYPE	`all`, `step`
$KW_MODE	`kw`: [concept] only, `kwg`: in the format of "[concept] in [domain]"
$QUESTION_TYPE	`word_problem`, `theorem_proving`
$QUESTION_LEVEL	`higher_edu`, `high_school`, `middle_school` `elementary_school`
$NUMBER_OF_KEYWORDS	`1`, `2`
$NUMBER_OF_CORRECTION	`0`, `1`, `2`

Once the parameters are set, run:

python generate.py

🍯 MUSTARDSAUCE dataset

Please download the MUSTARDSAUCE dataset HERE.

🍯 Fine-tuning & Inference

The fine-tuning and inference code can be found in ./downstream.

Fine-tuning

To initiate the MUSTARD fine-tuning process, ensure that the model/data paths in the script files and configuration files are replaced with your own paths. Execute the run.sh script to begin.

For a more detailed guide on fine-tuning Llama-2, refer to the facebookresearch/llama-recipe repository.

Inference

We also provide inference scripts for Math Word Problem and Automated Theorem Proving tasks. Make sure to replace the model/data paths in the script files and configuration files with your own paths. Execute the inference/infer.sh script to run the inference process.

💡 Citation

If you find this work helpful, please consider citing:

@inproceedings{
huang2024mustard,
title={{MUSTARD}: Mastering Uniform Synthesis of Theorem and Proof Data},
author={Yinya Huang and Xiaohan Lin and Zhengying Liu and Qingxing Cao and Huajian Xin and Haiming Wang and Zhenguo Li and Linqi Song and Xiaodan Liang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=8xliOUg9EW}
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
downstream		downstream
images		images
leaven		leaven
.DS_Store		.DS_Store
README.md		README.md
generate.py		generate.py
leanpkg.path		leanpkg.path
leanpkg.toml		leanpkg.toml
openai_key.py		openai_key.py
params.py		params.py
requirements.txt		requirements.txt
util_generate.py		util_generate.py
util_parsing.py		util_parsing.py

Eleanor-H/MUSTARD

Folders and files

Latest commit

History

Repository files navigation

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data (ICLR 2024 Spotlight) [PDF] [CHALLENGE]

MUSTARD benefits Math Word Problems (MWP).

MUSTARD benefits Automated Theorem Proving (ATP).

MUSTARD is effective with Llama 2-70b.

🍯 Preparation

Requirements

OpenAI API Key

🍯 Data Synthesis

🍯 MUSTARDSAUCE dataset

🍯 Fine-tuning & Inference

Fine-tuning

Inference

💡 Citation

About

Topics

Resources

Stars

Watchers

Forks

Languages