Source code for the paper "Investigating Strategies for Clause Recommendation", published at JURIX 2022.
You may find the publication here.
The scripts to run are in the ./scripts folder, and the code for the individual components is in the other folders. Run the scripts in the following order to prepare the workable subset of the LEDGAR corpus (download the LEDGAR dataset from here). We use the cleaned corpus.
bash ./scripts/run_prepare_clauserec_dataset.sh
bash ./scripts/run_bert_pretrain.sh
bash ./scripts/run_create_clauserec_emb_dataset.sh
bash ./scripts/run_create_label_embeddings_dataset.sh
bash ./scripts/run_create_ids_file.sh
bash ./scripts/run_train_tokenizer.sh
bash ./scripts/run_train_clause_decoder.sh
bash ./scripts/run_metrics_calculator.sh
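The order above can be captured in a small wrapper that stops at the first failing step, so a later script never runs on incomplete output. This is a minimal sketch and not part of the repository: the function name `run_pipeline` and its optional scripts-directory argument are assumptions made for illustration; only the script names are taken from the list above.

```shell
# Hypothetical helper (not shipped with the repository): runs the scripts
# in the order listed above and stops as soon as one of them fails.
run_pipeline() {
  # Directory holding the scripts; defaults to ./scripts as in the list above.
  scripts_dir="${1:-./scripts}"

  for step in \
    run_prepare_clauserec_dataset.sh \
    run_bert_pretrain.sh \
    run_create_clauserec_emb_dataset.sh \
    run_create_label_embeddings_dataset.sh \
    run_create_ids_file.sh \
    run_train_tokenizer.sh \
    run_train_clause_decoder.sh \
    run_metrics_calculator.sh
  do
    echo "==> ${step}"
    # Abort on the first non-zero exit status and report which step failed.
    if ! bash "${scripts_dir}/${step}"; then
      echo "Step failed: ${step}" >&2
      return 1
    fi
  done
}
```

After defining the function (for example by pasting it into a shell session at the repository root), call `run_pipeline` with no argument to use ./scripts, or pass a different directory as the first argument.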
If you're using this, please cite the work as: Joshi, Sagar, et al. "Investigating Strategies for Clause Recommendation." Legal Knowledge and Information Systems. IOS Press, 2022. 73-82.