detecting-incongruity-dataset-gen

Code for generating the English dataset for detecting-incongruity, based on the NELA2017 dataset.

The generation method follows the paper: Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder, AAAI-19.

$ python3 dataset_creation.py --nela_path [PATH_TO_UNZIPPED_NELA_FOLDER] --output_dir ./output/

Read an arbitrary article file
- The article file must be in csv format, with a headline and a body in each row and no header.

$ python3 dataset_creation.py --input_path sample.csv --output_dir ./output/
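
A minimal sketch of preparing such an input file with Python's standard csv module. The helper name write_articles, the file name my_articles.csv, and the example articles are hypothetical and not part of this repository; the sketch also assumes dataset_creation.py accepts standard csv quoting for fields that contain commas or quotes.

```python
import csv


def write_articles(path, articles):
    """Write (headline, body) pairs as a two-column csv file without a header."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for headline, body in articles:
            # Fields containing commas or quotes are quoted automatically.
            writer.writerow([headline, body])


if __name__ == "__main__":
    # Hypothetical example articles, not taken from NELA2017.
    write_articles("my_articles.csv", [
        ("Local team wins final", "The home side won 2-1, and fans celebrated."),
        ("Rain expected, officials say", "Forecasters expect heavy rain on Friday."),
    ])
```

The resulting file can then be passed to --input_path in place of sample.csv.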

The output is 5 csv files (train.csv, train_ip.csv, dev.csv, dev_ip.csv, test.csv).
- The training / validation sets with the _ip postfix are for the IP method in the paper.
Each csv file contains rows with 4 columns (Index, Headline, Body, Label) and no header.
- A file without the _ip postfix contains one unique article per row.
- A file with the _ip postfix contains one unique paragraph of an article per row.
- Label "1" means True: the row is incongruent.
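
As a rough illustration of the layout described above, the following sketch reads one of the output files and counts rows per label. The function names and the lowercase column names are our own choice (the files carry no header), and the path ./output/train.csv assumes the --output_dir used in the commands above.

```python
import csv
import os
from collections import Counter

# Column names are assigned here for convenience; the files have no header row.
COLUMNS = ("index", "headline", "body", "label")


def read_output(path):
    """Read a generated csv file into a list of dicts keyed by COLUMNS."""
    with open(path, newline="", encoding="utf-8") as f:
        return [dict(zip(COLUMNS, row)) for row in csv.reader(f)]


def label_counts(rows):
    """Count rows per label value; labels are kept as the strings stored in the file."""
    return Counter(row["label"] for row in rows)


if __name__ == "__main__":
    path = os.path.join("output", "train.csv")
    if os.path.exists(path):
        rows = read_output(path)
        print(len(rows), "rows")
        print(label_counts(rows))
```

The same reader applies to the _ip files, where the body column holds a single paragraph instead of a whole article.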
