awslabs/hypergraph-tabular-lm

HyTrel

A hypergraph-based tabular language model.

Introduction

This repository contains the official implementation of the paper "HyTrel: Hypergraph-enhanced Tabular Data Representation Learning", including code, data, and checkpoints.

Installation

Python 3.9 is recommended.

Here is an example of creating the environment using Anaconda.

  • Create the virtual environment using conda create -n hytrel python=3.9
  • Install the required packages with the corresponding versions from requirements.txt

Note: If you have difficulty installing torch_geometric, please refer to its installation instructions and install it according to your environment settings.
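After installing, a quick sanity check of the environment can save a failed run later. The snippet below is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, and it only tests that the interpreter matches the recommended version and that the named packages can be located, not that their versions match requirements.txt.

```python
import importlib.util
import sys

# Assumption: 3.9 is the version recommended above; torch_geometric is the
# package the installation note singles out as hard to install.
RECOMMENDED = (3, 9)
PACKAGES = ("torch_geometric",)


def check_environment(packages=PACKAGES):
    """Return a dict describing the interpreter and which packages are importable."""
    report = {
        "python": sys.version_info[:2],
        "python_matches": sys.version_info[:2] == RECOMMENDED,
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated hytrel environment; a False entry for torch_geometric suggests the package still needs to be installed for your platform.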

Pretraining

  • Pre-process the raw data by slicing the large file into chunks, and put the resulting *.jsonl files into the directory /data/pretrain/chunks/. Sample data is provided and can be used as a reference.
    Note: The pretraining *.jsonl data are acquired and preprocessed using the scripts from TaBERT.

  • Run python parallel_clean.py to clean and serialize the tables.
    Note: The tables are serialized in the Arrow format to reduce memory usage.

  • Run sh pretrain_electra.sh to pretrain HyTrel with the ELECTRA objective.

  • Run sh pretrain_contrast.sh to pretrain HyTrel with the Contrastive objective.
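The slicing step in the first bullet above can be sketched as follows. This is only an illustration, assuming the raw data is already one large *.jsonl file with one table per line; the function name split_jsonl, the chunk file naming, and the lines_per_chunk default are hypothetical and are not taken from the repository or from the TaBERT scripts.

```python
import itertools
from pathlib import Path


def split_jsonl(source, out_dir, lines_per_chunk=10000):
    """Split one large .jsonl file into numbered chunk files; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, "r", encoding="utf-8") as handle:
        for index in itertools.count():
            # islice reads at most lines_per_chunk lines, so the whole file
            # is never held in memory at once.
            lines = list(itertools.islice(handle, lines_per_chunk))
            if not lines:
                break
            chunk_path = out_dir / f"chunk_{index:05d}.jsonl"
            with open(chunk_path, "w", encoding="utf-8") as out:
                out.writelines(lines)
            written.append(chunk_path)
    return written
```

For example, split_jsonl("tables.jsonl", "data/pretrain/chunks") would write chunk_00000.jsonl, chunk_00001.jsonl, and so on, where tables.jsonl is a placeholder for your own raw file.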

Evaluation

First, put the ELECTRA-pretrained checkpoint in /checkpoints/electra/ and the contrastive-pretrained checkpoint in /checkpoints/contrast/.

  • Put the column type annotation (CTA) data {train, dev, test}.table_col_type.json and type_vocab.txt into the directory /data/col_ann/.

  • Run sh evaluate_cta_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_cta_contrast.sh with the contrastive-pretrained checkpoint.

  • Put the column property annotation (CPA) data {train, dev, test}.table_rel_extraction.json and relation_vocab.txt into the directory /data/col_rel/.

  • Run sh evaluate_cpa_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_cpa_contrast.sh with the contrastive-pretrained checkpoint.
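Before launching the evaluation scripts, it may help to confirm that the data files listed above are in place. The following is a minimal sketch under two assumptions: the helper missing_inputs is hypothetical, and the directories are resolved relative to a root folder you pass in (the steps above write them as /data/...).

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")

# File names come from the steps above; the folder keys are relative to a
# root directory chosen by the caller (an assumption of this sketch).
EXPECTED = {
    "data/col_ann": [f"{s}.table_col_type.json" for s in SPLITS] + ["type_vocab.txt"],
    "data/col_rel": [f"{s}.table_rel_extraction.json" for s in SPLITS] + ["relation_vocab.txt"],
}


def missing_inputs(root="."):
    """Return the relative paths of expected evaluation files that do not exist."""
    root = Path(root)
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append((Path(folder) / name).as_posix())
    return missing


if __name__ == "__main__":
    for path in missing_inputs():
        print(f"missing: {path}")
```

An empty result means all eight files were found; it does not validate their contents or check the checkpoint folders.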

Table Type Detection

  • Extract ttd.tar.gz into train, dev, and test data folders under the directory /data/ttd/.

  • Run sh evaluate_ttd_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_ttd_contrast.sh with the contrastive-pretrained checkpoint.
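The archive step above can also be done from Python with the standard tarfile module. This is a sketch only: the helper name extract_ttd is hypothetical, and it assumes the archive holds the train, dev, and test folders at its top level, which is not confirmed here.

```python
import tarfile
from pathlib import Path


def extract_ttd(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the top-level entries there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Python versions that provide extraction filters can reject unsafe
    # members (absolute paths, links escaping dest); use them when available.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest, **kwargs)
    return sorted(p.name for p in dest.iterdir())
```

Calling extract_ttd("ttd.tar.gz", "data/ttd") returns the names found under the destination, which makes it easy to confirm the three split folders exist before running the evaluation scripts.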

Reference

Please cite our paper.

@inproceedings{NEURIPS2023_66178bea,
 author = {Chen, Pei and Sarkar, Soumajyoti and Lausen, Leonard and Srinivasan, Balasubramaniam and Zha, Sheng and Huang, Ruihong and Karypis, George},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {32173--32193},
 publisher = {Curran Associates, Inc.},
 title = {HyTrel: Hypergraph-enhanced Tabular Data Representation Learning},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/66178beae8f12fcd48699de95acc1152-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}

Contact

The data and model checkpoints are available in the checkpoints folder.

If you have more questions, please email: chen.pei518@163.com (Pei Chen)
