wwweiwei/awesome-self-supervised-learning-for-tabular-data

Awesome Self-Supervised Learning for Non-Sequential Tabular Data (SSL4NSTD)

This repository collects frontier research on self-supervised learning for tabular data, a topic that has recently attracted growing attention.
This list is maintained by Wei-Wei Du and Wei-Yao Wang and is actively updated.
If you come across relevant resources or find errors in this repository, feel free to open an issue or submit a PR.

Survey Paper

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

Citation

@article{DBLP:journals/corr/abs-2402-01204,
  author       = {Wei{-}Yao Wang and
                  Wei{-}Wei Du and
                  Derek Xu and
                  Wei Wang and
                  Wen{-}Chih Peng},
  title        = {A Survey on Self-Supervised Learning for Non-Sequential Tabular Data},
  journal      = {CoRR},
  volume       = {abs/2402.01204},
  year         = {2024}
}

Papers

Predictive Learning

  • VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain (NeurIPS'20) [Paper] [Supplementary] [Code]
  • TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL'20) [Paper]
  • CORE: Self- and Semi-supervised Tabular Learning with COnditional REgularizations (NeurIPS'21) [Paper]
  • TabTransformer: Tabular Data Modeling Using Contextual Embeddings [Paper]
  • TabNet: Attentive Interpretable Tabular Learning (AAAI'21) [Paper] [Code]
  • Self-Supervision Enhanced Feature Selection with Correlated Gates (ICLR'22) [Paper] [Code]
  • TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper] [Code] [Blog]
  • LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks (NeurIPS'22) [Paper] [Code]
  • Self Supervised Pre-training for Large Scale Tabular Data (NeurIPS'22 TRL Workshop) [Paper] [Blog]
  • Local Contrastive Feature Learning for Tabular Data (CIKM'22) [Paper]
  • Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data (preprint-23) [Paper]
  • Generative Table Pre-training Empowers Models for Tabular Prediction (EMNLP'23) [Paper] [Code]
  • TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (ICLR'23) [Paper] [Code]
  • STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables (ICLR'23) [Paper] [Code]
  • Language Models are Realistic Tabular Data Generators (ICLR'23) [Paper] [Code]
  • Self-supervised Representation Learning from Random Data Projectors (NeurIPS'23 TRL Workshop) [Paper] [Code]
  • SwitchTab: Switched Autoencoders Are Effective Tabular Learners (AAAI'24) [Paper]
  • Making Pre-trained Language Models Great on Tabular Prediction (ICLR'24) [Paper]

Contrastive Learning

  • SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption (ICLR'22) [Paper] [Code]
  • STab: Self-supervised Learning for Tabular Data (NeurIPS'22 Workshop on TRL) [Paper]
  • TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper]
  • PTaRL: Prototype-based Tabular Representation Learning via Space Calibration (ICLR'24) [Paper]

Hybrid Learning

  • SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning (NeurIPS'21) [Paper] [Supplementary] [Code]
  • SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training (NeurIPS'22 Workshop on TRL) [Paper] [Code]
  • Transfer Learning with Deep Tabular Models (ICLR'23) [Paper] [Code]
  • DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM'23) [Paper] [Code]
  • ReConTab: Regularized Contrastive Representation Learning for Tabular Data (NeurIPS'23 Workshop on TRL) [Paper]
  • XTab: Cross-table Pretraining for Tabular Transformers (ICML'23) [Paper]
  • UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science (ICLR'24) [Paper]

Benchmarks

| Benchmark | Task | #Datasets | Paper |
|---|---|---|---|
| MLPCBench | Classification | 40 | Kadra et al., 2021 |
| DLBench | Classification, Regression | 11 | Shwartz-Ziv and Armon, 2022 |
| TabularBench | Classification, Regression | 45 | Grinsztajn et al., 2022 |
| TabZilla | Classification | 36 | McElfresh et al., 2023 |
| TabPretNet | Unlabeled, Classification, Regression | 2000 | Ye et al., 2023 |

Tutorials

  • Self-Supervised Learning: Self-Prediction and Contrastive Learning (NeurIPS'21) [Website]

Workshops

  • Table Representation Learning (NeurIPS) [Website]

Related Surveys

  • Deep Neural Networks and Tabular Data: A Survey [Paper]
  • Self-Supervised Learning for Recommender Systems: A Survey (TKDE) [Paper]
  • Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data [Paper]
  • Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects [Paper]
  • On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence [Paper]
  • A Survey on Time-Series Pre-Trained Models [Paper]

Tools & Libraries

  • PyTorch Frame: A modular deep learning framework for building neural network models on heterogeneous tabular data [Link]
  • PyTorch Tabular: A Framework for Deep Learning with Tabular Data [Link]
