Skip to content

AtmaHou/PromptSlotTagging

Repository files navigation

Prompt Slot Tagging

This is the code of the Findings of ACL 2022 paper: Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging.

Get Started

Requirement

python >= 3.6.13
torch >= 1.10.2
transformers >= 4.10.2

Step1: Prepare data and scripts

  • Download prompted few-shot data at download data.
  • You can also generate data using original_data and utils:
    • train data for snips using preprocessor.py,other data (dev and test data) using rechecker_preprocessor.py
    • change the path in __main__
  • few-shot original data example:
{
  "domain_name": [
    {  // episode
      "support": {  // support set
        "seq_ins": [["we", "are", "friends", "."], ["how", "are", "you", "?"]],  // input sequence
        "seq_outs": [["O", "O", "O", "O"], ["O", "O", "O", "O"]]  // output sequence in sequence labeling task
      },
      "query": {  // query set
        "seq_ins": [["we", "are", "friends", "."], ["how", "are", "you", "?"]],
        "seq_outs": [["O", "O", "O", "O"], ["O", "O", "O", "O"]]
      }
    },
    ...
  ],
  ...
}
  • few-shot prompted data example:
{
  "domain_name": [
    {  // episode
      "domain": "domain_name"
      "support": {  // support set
        "original_seq_in": [["Jack", "is", "my", "friend", "."], ["how", "are", "you", "?"]],  // input sequence
        "original_seq_out": [["B-name", "O", "O", "O", "O"], ["O", "O", "O", "O"]],  // output sequence in sequence labeling task
	"prompt_seq_in": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to"]...],
	"prompt_seq_out": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to", "Jack"]...],
	"checker_prompt_in": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to", "Jack", ".", "loaction", "refers", "to"]...],
	"checker_prompt_out": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to", "Jack", ".", "loaction", "refers", "to", "none"]...]
      },
      "query": {  // query set
      	"domain": "domain_name",
        "original_seq_in": [["Jack", "is", "my", "friend", "."], ["how", "are", "you", "?"]],  // input sequence
        "original_seq_out": [["O", "O", "O", "O"], ["O", "O", "O", "O"]],  // output sequence in sequence labeling task
	"prompt_seq_in": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to"]...],
	"prompt_seq_out": [["'", "Jack", "is", "my", "friend", ".", "'", "name", "refers", "to", "Jack"]...]
      }
    },
    ...
  ],
  ...
}
  • For MIT data(In-domain): For example, ./prompt_data/MIT_M/prompt_MIT_M/mit_m.10_shot.json is the path for prompted 10-shot MIT_movie data.

    • Then you need to:
      • set test_path in ./scripts/mit/mit_m.sh as ./prompt_data/MIT_M/prompt_MIT_M/ (the / at the end is needed).
      • set test_file in ./scripts/mit/mit_m.sh as mit_m.10_shot.
      • set data_set in ./scripts/mit/mit_m.sh as mit
      • mkdir pred
      • mkdir model_selection
  • For SNIPS data(Meta-Source Transfer): For example, ./prompt_data/snips/prompt_snips/snips_train_2.json, ./prompt_data/snips/prompt_snips/snips_dev_2.json and ./prompt_data/snips/prompt_snips/snips_test_2.json is the train, dev and test path for the 2nd of 7 domain of prompted 1-shot SNIPS data. (the last number of the file name is the domain number).

    • Then you need to:
      • set train_path, dev_path and test_path in ./scripts/snips/snips.sh as ./prompt_data/snips/prompt_snips/ (the / at the end is needed).
      • set train_file in ./scripts/snips/snips.sh as snips_train_2.
      • set dev_file in ./scripts/snips/snips.sh as snips_dev_2.
      • set test_file in ./scripts/snips/snips.sh as snips_test_2.
      • set data_set in ./scripts/snips/snips.sh as snips
      • mkdir pred
      • mkdir model_selection
      • mkdir model
      • mkdir ft_model

Step2: Train and test the main model

Example for MIT:
source ./scripts/mit/mit_m.sh
source ./scripts/mit/mit_mm.sh
source ./scripts/mit/mit_r.sh
Example for SNIPS:
source ./scripts/snips/snips.sh

Note: Each time re-run a setting you need to clear the corresponding pred path manaully.

Project Architecture

  • PromptSlotTagging:
    • original_data: original data to construct prompt data
    • scripts: running scripts.
      • mit: running scripts for mit (In-domain).
        • mit_m.sh: running script for MIT-movie
        • mit_mm.sh: running script for MIT-movie-hard
        • mit_r.sh: running script for MIT-restaurant
      • snips: running scripts for SNIPS (Meta-Source Transfer).
        • snips.sh: running script for SNIPS
    • utils: tools to construct prompt data
    • train.py: the entry file of the whole project.
    • opt.py: the definition and default settings of the arguments.
    • model.py: modified GPT2 modeling for our generation.
    • dataloader.py: the definition of dataloader, which transfrom raw prompted data into model inputs.
    • eval.py: the eval function based on the outputs of the model.
    • conlleval.pl: the conlleval scripts from https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt.

License for code and data

Apache License 2.0

About

Code for ACL22 findings paper: Inverse is Better! Fast and Accurate Prompt for Slot Tagging

Resources

Stars

Watchers

Forks

Packages

No packages published