ofersabo/Few_Shot_transformation_and_sampling

Few-Shot Transformation and Sampling

In this repository we share the script to transform any supervised dataset into a Few-Shot dataset.

Further, this repository contains a script to generate episodes as described in the paper.

To transform TACRED into our suggested Few-Shot TACRED, use the commands below.

Getting access to the TACRED dataset

https://nlp.stanford.edu/projects/tacred/#access

Convert TACRED into a list of instances per relation by running this script on each of the train, dev, and test files.

These three commands convert each of TACRED train/dev/test into a list of instances per relation type.

python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/train.json --output_file TACRED_raw_data/instances_per_relation/TACRED_train.json
python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/dev.json --output_file TACRED_raw_data/instances_per_relation/TACRED_dev.json
python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/test.json --output_file TACRED_raw_data/instances_per_relation/TACRED_test.json
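
To make the conversion step concrete, here is a minimal sketch of grouping instances by relation type. It is not the repository's script: the function name `group_by_relation` and the toy records are hypothetical, and it assumes only that each instance is a dict with a `relation` field, as in the TACRED JSON files. The actual output layout of convert_dataset_to_list_by_relation.py may differ.

```python
import json
from collections import defaultdict


def group_by_relation(instances):
    """Group instance dicts by their 'relation' label, preserving input order."""
    grouped = defaultdict(list)
    for instance in instances:
        grouped[instance["relation"]].append(instance)
    return dict(grouped)


# Toy records standing in for the list loaded from a TACRED JSON file.
toy_instances = [
    {"id": "a", "relation": "per:title"},
    {"id": "b", "relation": "no_relation"},
    {"id": "c", "relation": "per:title"},
]

by_relation = group_by_relation(toy_instances)
# Print the number of instances collected for each relation type.
print(json.dumps({rel: len(items) for rel, items in by_relation.items()}))
```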

Convert these data partitions into a Few-Shot dataset in which the classes are disjoint across partitions.

This command applies our method for transforming a supervised dataset into a Few-Shot Learning dataset to TACRED.

python data_transformation.py --train_data TACRED_raw_data/instances_per_relation/TACRED_train.json --dev_data TACRED_raw_data/instances_per_relation/TACRED_dev.json --test_data TACRED_raw_data/instances_per_relation/TACRED_test.json --fixed_categories_split categories_split.json --test_size 10 --output_dir ./data_few_shot
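
One way to sanity-check the disjointness property is sketched below. This is an illustration, not part of the repository: `overlapping_classes` is a hypothetical helper, the label lists are toy data, and it assumes the NOTA label is `no_relation` and is allowed to appear in every partition.

```python
from itertools import combinations


def overlapping_classes(partitions, nota_label="no_relation"):
    """Return {(name_a, name_b): shared_labels} for each pair of partitions sharing a class.

    `partitions` maps a partition name to an iterable of its class labels.
    The NOTA label is ignored, on the assumption that it may appear everywhere.
    """
    overlaps = {}
    for (name_a, labels_a), (name_b, labels_b) in combinations(partitions.items(), 2):
        shared = (set(labels_a) & set(labels_b)) - {nota_label}
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy label sets; real ones would be read from the files in the output directory.
toy_partitions = {
    "train": ["per:title", "org:founded", "no_relation"],
    "dev": ["per:age", "no_relation"],
    "test": ["org:members", "no_relation"],
}
overlaps = overlapping_classes(toy_partitions)
print("disjoint" if not overlaps else overlaps)
```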

Voilà, you now have the new Few-Shot TACRED dataset, divided into train, dev, and test sets.

To generate episodes for Few-Shot TACRED with respect to the data distribution:

python episodes_sampling_for_fs_TACRED.py --file_name [train/dev/test] --episodes_size [episodes_size] --N [N_way] --K [K_shot] --number_of_queries [number_of_test_instances] --seed [123] --output_file_name [output_file_name]
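
The following is a rough sketch of what sampling one N-way K-shot episode could look like; it is not the repository's implementation. The function `sample_episode` and the toy data are hypothetical. It assumes the data is a dict mapping each class label to its instances, that `no_relation` serves as the NOTA label, and that queries are drawn uniformly from all remaining instances so that their labels follow the data distribution.

```python
import random


def sample_episode(data, n_way, k_shot, n_queries, rng, nota_label="no_relation"):
    """Sample one episode: a support set for n_way classes plus labeled queries."""
    # Target classes need at least k_shot instances; NOTA is never a target class.
    eligible = sorted(
        label for label, items in data.items()
        if label != nota_label and len(items) >= k_shot
    )
    classes = rng.sample(eligible, n_way)

    support, used = {}, set()
    for label in classes:
        indices = rng.sample(range(len(data[label])), k_shot)
        support[label] = [data[label][i] for i in indices]
        used.update((label, i) for i in indices)

    # Query pool: every instance not already used as a support example.
    pool = [
        (label, i)
        for label in sorted(data)
        for i in range(len(data[label]))
        if (label, i) not in used
    ]
    queries = []
    for label, i in rng.sample(pool, n_queries):
        # A query whose class is outside the episode's classes is labeled NOTA.
        target = label if label in support else nota_label
        queries.append((data[label][i], target))
    return support, queries


# Toy data: four small classes and a large NOTA class.
toy_data = {f"rel_{c}": [f"rel_{c}_{i}" for i in range(5)] for c in "abcd"}
toy_data["no_relation"] = [f"nota_{i}" for i in range(20)]

support, queries = sample_episode(toy_data, n_way=2, k_shot=1, n_queries=3, rng=random.Random(123))
print(sorted(support), [target for _, target in queries])
```

Passing a seeded `random.Random` instance keeps the episode reproducible, which mirrors the role of the `--seed` flag above.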

Generating Few-Shot TACRED test episodes

To create the test episodes benchmark, use this shell script. It creates five files of episodes, with seeds ranging from 160290 to 160294.

Here is the shell command:

./create_test_episodes.sh

For each test episode file we generated an id file composed of the episode ids. You can use these files to verify that your generated test episodes are identical to our test episodes benchmark. These files are stored under the ids_of_episodes directory.
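
The verification could be sketched as a simple sequence comparison. The helper `first_mismatch` below is hypothetical, and the sketch deliberately does not assume any particular file format: it expects that you have already loaded the reference ids and extracted the ids of your own episodes into two lists in episode order.

```python
def first_mismatch(expected_ids, generated_ids):
    """Return the index of the first differing episode, or None if both lists match."""
    for index, (expected, generated) in enumerate(zip(expected_ids, generated_ids)):
        if expected != generated:
            return index
    if len(expected_ids) != len(generated_ids):
        # One list is a strict prefix of the other.
        return min(len(expected_ids), len(generated_ids))
    return None


# Toy ids; each inner list stands for the instance ids of one episode.
reference = [["a", "b"], ["c", "d"], ["e", "f"]]
mine = [["a", "b"], ["c", "x"], ["e", "f"]]

mismatch = first_mismatch(reference, mine)
print("identical" if mismatch is None else f"first difference at episode {mismatch}")
```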

Downsampling the NOTA (none-of-the-above) category in the training set, which we found to be advantageous

Here is the command that generates the same downsampled training dataset as we used. If you choose to downsample the training data, apply this downsampling before generating episodes.

python downsample_train_data.py --dataset data_few_shot/_train_data.json --output_file data_few_shot/new_downsampled_train_data.json
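
For readers who want to see the idea in code, here is a minimal sketch of NOTA downsampling. It does not reproduce downsample_train_data.py: the function `downsample_nota`, the `keep_fraction` parameter, and its value are hypothetical, and it assumes the training data is a dict mapping each class label to its instances with `no_relation` as the NOTA label.

```python
import random


def downsample_nota(data, keep_fraction, seed, nota_label="no_relation"):
    """Return a copy of `data` with only a random fraction of NOTA instances kept."""
    result = dict(data)
    if nota_label not in data:
        return result
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every time
    nota_instances = data[nota_label]
    n_keep = round(len(nota_instances) * keep_fraction)
    result[nota_label] = rng.sample(nota_instances, n_keep)
    return result


# Toy training data with a dominant NOTA class.
toy_train = {
    "per:title": ["t1", "t2"],
    "no_relation": [f"n{i}" for i in range(10)],
}
downsampled = downsample_nota(toy_train, keep_fraction=0.3, seed=0)
print({label: len(items) for label, items in downsampled.items()})
```

Only the NOTA list is reduced; every other class is left untouched, and the input dict is not modified.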
