REFACTOR: Learning to Extract Theorems from Proofs

This is the official repo for

REFACTOR: Learning to Extract Theorems from Proofs (ICLR 2024)

by Jin Peng Zhou*, Yuhuai Wu*, Qiyang Li, and Roger Grosse
*Equal Contribution

Abstract

Human mathematicians are often good at recognizing modular and reusable theorems that make complex mathematical results within reach. In this paper, we propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for training neural networks to mimic this ability in formal mathematical theorem proving. We show on a set of unseen proofs, REFACTOR is able to extract 19.6% of the theorems that humans would use to write the proofs. When applying the model to the existing Metamath library, REFACTOR extracted 16 new theorems. With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored. The new theorems are used very frequently after refactoring, with an average usage of 733.5 times, and help shorten the proof lengths. Lastly, we demonstrate that the prover trained on the new-theorem refactored dataset proves more test theorems and outperforms state-of-the-art baselines by frequently leveraging a diverse set of newly extracted theorems.

Theorem Expansion

To expand theorems in set.mm or propositional.mm:

python theorem_expansion.py

Various data splitting and capping options can be specified in the theorem_expansion.py file.

Theorem Extraction Model Training

python train.py

Detailed hyperparameter options can be found in the paper and train.py file.

Theorem Verification

To extract and verify new theorems predicted by REFACTOR model

python theorem_verification.py

The main extraction and verification code is in the function analyze_predictions. We also perform standardization and deduplication of the extracted theorems. The code will output an augmented set.mm file with the new theorems and their proofs.

Theorem Refactoring

Using the augmented set.mm file we can refactor the existing proofs in the Metamath database.

python theorem_refactor.py

One example of refactored augmented theorem file is in refactor_dataset/augmented_set_refactored.mm.

Theorem Proving with Refactored Theorems

We use MetaGen's implementation of the Holophrasm and train a prover on our refactored theorems. For more details, please see this repo. We provide the proved theorems in holophrasm_proved_theorems/.

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{zhou2023refactor,
  title={REFACTOR: Learning to Extract Theorems from Proofs},
  author={Zhou, Jin Peng and Wu, Yuhuai and Li, Qiyang and Grosse, Roger Baker},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
holophrasm_proved_theorems		holophrasm_proved_theorems
raw_dataset		raw_dataset
refactor_dataset		refactor_dataset
README.md		README.md
create_environment.sh		create_environment.sh
custom_sampler.py		custom_sampler.py
data.py		data.py
model.py		model.py
model_names.py		model_names.py
theorem_expansion.py		theorem_expansion.py
theorem_refactor.py		theorem_refactor.py
theorem_verification.py		theorem_verification.py
train.py		train.py
utils.py		utils.py

jinpz/refactor

Folders and files

Latest commit

History

Repository files navigation

REFACTOR: Learning to Extract Theorems from Proofs

Abstract

Theorem Expansion

Theorem Extraction Model Training

Theorem Verification

Theorem Refactoring

Theorem Proving with Refactored Theorems

Citation

About

Resources

Stars

Watchers

Forks

Languages