Investing Strategies for Clause Recommendation

Source code for the paper, "Investigating Strategies for Clause Recommendation" published at JURIX 2022
You may find the publication here.

Reproducing the results

You'll find the scripts to run inside the ./scripts folder, and code for different components in other folders. Follow this order to prepare the workable subset from the LEDGAR corpus (download the LEDGAR dataset from here.) We'll be making use of the cleaned corpus.

Preparing the subset with contract mapping

bash ./scripts/run_prepare_clauserec_dataset.sh

Further pretrain a BERT or BERT-based model on the dataset

bash ./scripts/run_bert_pretrain.sh

Preparing the embedding-serialized datasets

bash ./scripts/run_create_clauserec_emb_dataset.sh
bash ./scripts/run_create_label_embeddings_dataset.sh

Preparing for training: create id files with train/dev/test splits, train a tokenizer

bash ./scripts/run_create_ids_file.sh
bash ./scripts/run_train_tokenizer.sh

Train for clause recommendation corresponding to a strategy

bash ./scripts/run_train_clause_decoder.sh

Evaluate the best training checkpoint using metrics

bash ./scripts/run_metrics_calculator.sh

Citation

If you're using this, please cite the work as: Joshi, Sagar, et al. "Investigating Strategies for Clause Recommendation." Legal Knowledge and Information Systems. IOS Press, 2022. 73-82.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
eval		eval
model		model
scripts		scripts
train		train
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

eval

eval

model

model

scripts

scripts

train

train

utils

utils

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Investing Strategies for Clause Recommendation

Reproducing the results

Preparing the subset with contract mapping

Further pretrain a BERT or BERT-based model on the dataset

Preparing the embedding-serialized datasets

Preparing for training: create id files with train/dev/test splits, train a tokenizer

Train for clause recommendation corresponding to a strategy

Evaluate the best training checkpoint using metrics

Citation

About

Releases

Packages

Languages

License

sagarsj42/strategies_for_clause_recommendation

Folders and files

Latest commit

History

Repository files navigation

Investing Strategies for Clause Recommendation

Reproducing the results

Preparing the subset with contract mapping

Further pretrain a BERT or BERT-based model on the dataset

Preparing the embedding-serialized datasets

Preparing for training: create id files with train/dev/test splits, train a tokenizer

Train for clause recommendation corresponding to a strategy

Evaluate the best training checkpoint using metrics

Citation

About

Resources

License

Stars

Watchers

Forks

Languages