Robustness of Text-to-SQL Models

This repository contains the data and code in the following paper:

Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
Xinyu Pi*, Bing Wang*, Yan Gao, Jiaqi Guo, Zhoujun Li, Jian-Guang Lou
ACL 2022 (Long Papers)

Introduction

This repository is the official implementation of our paper Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation. In this paper, we curate ADVETA, the first robustness evaluation benchmark featuring natural and realistic adversarial table perturbation. To defend against this perturbation, we build CTA, a systematic adversarial training example generation framework tailored for better contextualization of tabular data.

ADVETA

We manually curate the ADVErsarial Table perturbAtion (ADVETA) benchmark based on three mainstream Text-to-SQL datasets: Spider, WikiSQL, and WTQ. For each table from the original development sets, we conduct RPL/ADD annotation separately, perturbing only table columns. We release our data in the adveta_1.0.zip file.
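For intuition, RPL replaces an original column name with an adversarial alternative, while ADD inserts a semantically related distractor column. A minimal sketch of the two perturbation types (the schema format and function names below are illustrative, not the benchmark's actual data format):

```python
# Illustrative sketch of the two ADVETA perturbation types on a table
# schema. The list-of-names schema format and example values here are
# our own, not the released benchmark format.

def rpl(columns, target, replacement):
    """RPL: replace an original column name with an adversarial alternative."""
    return [replacement if c == target else c for c in columns]

def add(columns, distractor):
    """ADD: insert a semantically related distractor column."""
    return columns + [distractor]

schema = ["singer_id", "name", "country"]
print(rpl(schema, "country", "nation"))   # RPL swaps the column name
print(add(schema, "nationality"))          # ADD appends a distractor
```

Both perturbations preserve the table's semantics for a human reader while changing the surface forms a Text-to-SQL model relies on for schema linking.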

CTA

Requirement

  • python: 3.8
  • cuda: 10.1
  • torch: 1.7.1

Install dependencies:

conda create -n cta python=3.8 -y
conda activate cta
conda install pytorch==1.7.1 cudatoolkit=10.1 -c pytorch -y
python -m spacy download en_core_web_sm
pip install -r requirements.txt

Introduction

We propose the Contextualized Table Augmentation (CTA) framework, an adversarial training example generation approach tailored for tabular data. Before you run pipeline.ipynb, you should download the data files and checkpoints from Google Drive.

Notes:

  • Download the Numberbatch word embeddings from here and save them as ./data/nb_emb.txt.
  • We pre-compute dense representations of the processed WDC tables using the TAPAS dense retrieval model, storing the outputs to ./wdc/wdc_dense_A.txt and ./wdc/wdc_dense_B.txt (TAPAS has two encoders).
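As a rough illustration of how a dual-encoder dense retriever ranks candidate tables, queries and tables are embedded separately and candidates are scored by vector similarity. The toy vectors and function names below are our own; the actual pipeline uses the TAPAS encoders and pre-computed WDC embeddings:

```python
import math

# Toy dense-retrieval scoring: rank candidate tables by cosine similarity
# between a query embedding and pre-computed table embeddings. The real
# pipeline embeds with TAPAS; the 2-d vectors here are made-up values.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_tables(query_emb, table_embs):
    """Return (table_id, score) pairs sorted best-first."""
    scored = [(tid, cosine(query_emb, emb)) for tid, emb in table_embs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

tables = {"wdc_001": [0.9, 0.1], "wdc_002": [0.2, 0.8]}
print(rank_tables([1.0, 0.0], tables))  # wdc_001 ranks first
```

Pre-computing the table embeddings once (as the two wdc_dense files do) means retrieval at query time only requires embedding the query and scoring against the stored vectors.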

Run

Just run pipeline.ipynb and have fun.

Cite

@inproceedings{pi-etal-2022-towards,
    title = "Towards Robustness of Text-to-{SQL} Models Against Natural and Realistic Adversarial Table Perturbation",
    author = "Pi, Xinyu  and Wang, Bing  and Gao, Yan  and Guo, Jiaqi  and Li, Zhoujun  and Lou, Jian-Guang",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.142",
    pages = "2007--2022"
}