CANarEx pipeline

Runs on Linux and macOS using Python 3.9.5

Factiva and Hansard 'First Nations' dataset

CaNarEx environment

    cd CaNarEx
    python3 -m venv venv_canarex
    source venv_canarex/bin/activate
    pip install -r requirements.txt
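
Before running the pipeline scripts, it can help to confirm that the virtual environment is active and that the interpreter matches the Python version named above. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `version_matches` are hypothetical and not part of this repository, and comparing only the major and minor version is an assumption.

```python
import sys

# Major.minor of the version stated in this README (3.9.5).
# Checking only major.minor is an assumption made for this sketch.
EXPECTED_VERSION = (3, 9)


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_matches(version_info=None, expected=EXPECTED_VERSION) -> bool:
    """Compare major.minor of the given (or running) interpreter with `expected`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(expected)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python %d.%d in use:" % EXPECTED_VERSION, version_matches())
```

Running this after `source venv_canarex/bin/activate` should report an active virtual environment; a mismatch in the second line only indicates a different interpreter version, not necessarily a broken setup.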

With the CaNarEx environment activated, run 1.split_sentences_trf.py (the data is already provided):
```
    python 1.split_sentences_trf.py
```
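
To illustrate what a sentence-splitting step produces, here is a minimal sketch that is not the code in 1.split_sentences_trf.py. It swaps in a naive regular-expression splitter (the actual script may well use a trained model instead), and the function name `split_sentences` and the `(doc_id, sentence_index, sentence)` row layout are assumptions made for illustration.

```python
import re

# Naive boundary rule: whitespace that directly follows '.', '!' or '?'.
# This is a stand-in for whatever splitter the real script uses.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(doc_id, text):
    """Return a list of (doc_id, sentence_index, sentence) tuples for one document."""
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]
    return [(doc_id, i, s) for i, s in enumerate(sentences)]


# Hypothetical input text, used only to show the output shape.
rows = split_sentences("doc-1", "The bill was tabled. Members debated it! Was it passed?")
for row in rows:
    print(row)
```

Keeping the document id next to every sentence makes it possible to trace later outputs back to their source document.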

Using SpanBERT

Download https://github.com/mandarjoshi90/coref and install it into a coref_env environment by following the installation instructions in "Jonathan K. Kummerfeld's notebook" (model: 'spanbert_base').

Install the following packages into coref_env:

    pip install tokenization
    pip install sacremoses

Run 4.clustering.py:

    python 4.clustering.py

The evaluation folder contains a Jupyter notebook that generates synthetic test data for narrative time-series clustering.
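
As a rough idea of what synthetic test data for time-series clustering can look like, the sketch below generates noisy series from three known shapes so that the true cluster label of each series is known in advance. It is not the notebook's code: the function name `make_synthetic_series`, the three shapes, and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random


def make_synthetic_series(n_per_cluster=5, length=24, noise=0.1, seed=0):
    """Generate noisy series from three known shapes; return (series, labels)."""
    rng = random.Random(seed)  # seeded so the data is reproducible
    # Three hypothetical ground-truth patterns: rising trend, falling trend, single burst.
    shapes = [
        lambda t: t / (length - 1),
        lambda t: 1.0 - t / (length - 1),
        lambda t: math.exp(-((t - length / 2) ** 2) / (2 * (length / 8) ** 2)),
    ]
    series, labels = [], []
    for label, shape in enumerate(shapes):
        for _ in range(n_per_cluster):
            # Each series is the clean shape plus Gaussian noise at every time step.
            series.append([shape(t) + rng.gauss(0.0, noise) for t in range(length)])
            labels.append(label)
    return series, labels


series, labels = make_synthetic_series()
print(len(series), "series of length", len(series[0]))
```

Because the labels are known, a clustering result can be scored against them, which is the usual reason for testing on synthetic data first.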

Environment: follow the setup steps from relatio: https://github.com/relatio-nlp/relatio
The relatio folder provided here is a modified copy that adds document ids to the generated output.

    python 5.run_relatio.py
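
The idea of adding document ids to the output can be sketched as follows. This is not the modified relatio code: the function `attach_doc_ids`, the field names (`sentence_index`, `agent`, `verb`, `patient`, `doc_id`), and the sample records are all hypothetical and only show how each extracted statement could be linked back to its source document.

```python
def attach_doc_ids(statements, sentence_to_doc):
    """Return copies of the statements with a 'doc_id' field added."""
    enriched = []
    for statement in statements:
        row = dict(statement)  # copy so the input records stay unchanged
        # Look up the document that the statement's sentence came from;
        # unknown sentence indices yield None.
        row["doc_id"] = sentence_to_doc.get(statement["sentence_index"])
        enriched.append(row)
    return enriched


# Hypothetical mapping from sentence index to document id.
sentence_to_doc = {0: "doc-1", 1: "doc-1", 2: "doc-2"}
# Hypothetical extracted statements, for illustration only.
statements = [
    {"sentence_index": 0, "agent": "government", "verb": "sign", "patient": "treaty"},
    {"sentence_index": 2, "agent": "committee", "verb": "review", "patient": "bill"},
]
with_ids = attach_doc_ids(statements, sentence_to_doc)
for row in with_ids:
    print(row)
```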
