About

This repository contains the resources used for the SIGIR 2024 submission "Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange".

Quick Start

Answer generation

Generate answers to the questions in the ARQMath-3 competition dataset with the following call:

mkdir data
python3 code/genArqmathAnswers.py --llm tora-7b

(The --llm value can be any of tora-7b, tora-13b, llemma, mammoth, or mistral.)

Answers are saved as ./topics-and-qrels/{llm}/topics.arqmath-2022-{llm}-origin-and-generated-answers-0.csv
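As a convenience, the call and the output location described above can be wrapped in a small helper. This is only a sketch: the function names `answer_command` and `answers_path` are hypothetical and not part of the repository; the model names and the file name pattern are taken from this README.

```python
from pathlib import Path

# Model names accepted by --llm, as listed in this README.
SUPPORTED_LLMS = ("tora-7b", "tora-13b", "llemma", "mammoth", "mistral")


def _check(llm: str) -> None:
    """Reject model names that the README does not list."""
    if llm not in SUPPORTED_LLMS:
        raise ValueError(f"unsupported --llm value: {llm!r}")


def answer_command(llm: str) -> list[str]:
    """Return the argv of the answer-generation call for `llm` (hypothetical helper)."""
    _check(llm)
    return ["python3", "code/genArqmathAnswers.py", "--llm", llm]


def answers_path(llm: str, root: Path = Path(".")) -> Path:
    """Return the CSV path where the README says answers for `llm` are saved."""
    _check(llm)
    name = f"topics.arqmath-2022-{llm}-origin-and-generated-answers-0.csv"
    return root / "topics-and-qrels" / llm / name


if __name__ == "__main__":
    # Report which answer files already exist in the current checkout.
    for llm in SUPPORTED_LLMS:
        status = "found" if answers_path(llm).exists() else "missing"
        print(f"{llm}: {status} ({answers_path(llm)})")
```

Passing the list from `answer_command` to `subprocess.run` would start the generation for one model; the sketch deliberately does not do so itself.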

To produce runs, use the mabowdor branch of the pya0 repository (https://github.com/approach0/pya0/tree/mabowdor). First copy the following files from this repository into your pya0 checkout:

cp -r topics-and-qrels path-to/pya0/
cp -r code/pya0-replace/* path-to/pya0/

Obtain runs via

cd path-to/pya0/utils/
python3 -m transformer_eval search path-to/pya0/utils/training-and-inference/inference.ini search__tora_7b_generated_single_vec \
--backbone=cocomae --ckpt=220 --use_prebuilt_index=arqmath-task1-dpr-cocomae-220

Then evaluate:

../eval-arqmath3/task1/preprocess.sh cleanup
../eval-arqmath3/task1/preprocess.sh ./runs/arqmath3-cocomae-220-hnsw-top1000.run
../eval-arqmath3/task1/eval.sh

Generating Embeddings

To build a FAISS index of embeddings for the questions in the ARQMath-3 competition dataset, first download the complete set of posts. It can be obtained here:

https://drive.google.com/file/d/14SSwTqLZgLVP6iDsAJbmxgb01a8NYyDb/view

Now build the embeddings with the following call:

python3 code/genArqmathEmbeddings.py --index --llm tora-7b \
--device gpu --query_limit 100 --rank_limit 10 --outdir embeddings_data --corpus Posts.V1.3.xml \
--runfile topics-and-qrels/mergerun--0.4W_arqmath3_a0.run--0.2W_arqmath3-SPLADE-nomath-cocomae-2-2-0-top1000.run--0.4W_arqmath3-cocomae-220-top1000.run

(The --llm value can be any of tora-7b, tora-13b, llemma, mammoth, or mistral. --rank_limit 10 replicates our setting of reranking the top 10 retrieved documents.)
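To illustrate what reranking only the top 10 retrieved documents means, here is a minimal sketch. It is not the code in genArqmathEmbeddings.py: the function names are hypothetical, and cosine similarity is an assumption, since this README does not state which similarity measure the script uses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, ranked_doc_ids, doc_vecs, rank_limit=10):
    """Rerank the first `rank_limit` retrieved documents by embedding similarity.

    ranked_doc_ids: document ids in the order given by the first-stage run.
    doc_vecs:       mapping from document id to its embedding (assumed precomputed).
    Returns (doc_id, score) pairs, best first; documents past the limit are dropped.
    """
    head = ranked_doc_ids[:rank_limit]
    scored = [(doc_id, cosine(query_vec, doc_vecs[doc_id])) for doc_id in head]
    # sorted() is stable, so ties keep their first-stage order.
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy two-dimensional embeddings, made up for illustration.
    query = [1.0, 0.0]
    docs = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [1.0, 1.0]}
    print(rerank(query, ["a", "b", "c"], docs, rank_limit=2))
```

With `rank_limit=2`, document "c" is never scored, because only the first two first-stage results are considered.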

Produce a run from said index by calling:

python3 code/genArqmathEmbeddings.py --search --llm tora-7b \
--topk 10 --device gpu --query_limit 100 --outdir embeddings_data

Indices are stored in ./embeddings_data/{llm}/index.

Runfiles are saved in ./runs/{llm}_arqmath3_rerank.run.
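To inspect a run file before evaluating it, something like the following may help. It assumes the common six-column TREC run layout (topic id, the literal Q0, document id, rank, score, run tag), which this README does not confirm; adjust the unpacking if the files differ. The function name `parse_run` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def parse_run(lines):
    """Group run entries by topic id, assuming six whitespace-separated columns.

    Returns a dict mapping topic id to a list of (rank, doc_id, score),
    sorted by rank. Lines without exactly six fields are skipped.
    """
    by_topic = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # blank line or a layout other than the assumed one
        topic_id, _q0, doc_id, rank, score, _tag = parts
        by_topic[topic_id].append((int(rank), doc_id, float(score)))
    for entries in by_topic.values():
        entries.sort()
    return dict(by_topic)


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout.
    sample = [
        "A.301 Q0 12 2 0.5 demo",
        "A.301 Q0 7 1 0.9 demo",
        "A.302 Q0 3 1 0.4 demo",
    ]
    for topic, entries in parse_run(sample).items():
        print(topic, entries)
```

For a real file, pass an open file handle, for example `parse_run(open("runs/tora-7b_arqmath3_rerank.run"))`.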

They can be evaluated from within the pya0 checkout:

cp -r ./runs/. path-to/pya0/training-and-inference/runs/
../eval-arqmath3/task1/preprocess.sh cleanup
../eval-arqmath3/task1/preprocess.sh ./runs/tora-7b_arqmath3_rerank.run
../eval-arqmath3/task1/eval.sh

Reference

A. Satpute, N. Giessing, A. Greiner-Petter, M. Schubotz, O. Teschke, A. Aizawa, and B. Gipp, “Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange,” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24), Washington, USA, 2024.