maastrichtlawtech/awesome-legal-nlp

Legal Natural Language Processing

🗂 Datasets

Legal Judgment Prediction (LJP)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | 📄 🤗 | European Court of Human Rights judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | 📄 💾 | European Court of Human Rights judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgments | 🇨🇳 | 2.6M cases w/ 6 outcomes |

Legal Text Classification (LTC)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| GLC (Papaloukas et al., 2021) | 📄 🤗 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (+18 more) | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy policies | 🇬🇧 | 115 policies w/ 23K labels |

Legal Information Retrieval (LIR)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 8.9K triplets of cases |

Legal Question Answering (LQA)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |

Legal Textual Entailment (LTE)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |

Legal Text Summarization (LTS)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |

Legal Language Modeling (LLM)

| Dataset | Links | Language | Size |
| --- | --- | --- | --- |
| Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |

Benchmarks

| Dataset | Links | Language | Tasks |
| --- | --- | --- | --- |
| FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgment prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |

🔥 Models

| Model | Links | Language | Size |
| --- | --- | --- | --- |
| Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M |
| PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M |
| Italian-LEGAL-BERT (Licari and Comande, 2022) | 📄 🤗 | 🇮🇹 | 110M |
| JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M} |
| Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M |
| LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M} |
| LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B} |

📚 Books

  • [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]

📄 Surveys

  • [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
  • [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley. [pdf]
  • [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]

🎙 Talks

  • [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
  • [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]

🗓 Conferences & Workshops

  • The Natural Legal Language Processing (NLLP) Workshop [website]
  • The International Conference on Artificial Intelligence and Law (ICAIL) [website]
  • The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
  • The EXplainable AI in Law (XAILA) Workshop [website]
  • The International Workshop on Juris-informatics (JURISIN) [website]
  • The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
  • The International Workshop on Legal Information Retrieval [website]

About

📖 A curated list of LegalNLP resources from all around the web.
