⚠️ This repository is no longer maintained. Check out our survey paper on efficient LLMs and the corresponding paper list.

Paper-list-resource-efficient-large-language-model

Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys), architecture conferences (ISCA/MICRO/ASPLOS/HPCA), networking conferences (NSDI/SIGCOMM), mobile conferences (MobiCom/MobiSys/SenSys/UbiComp), and AI conferences (NeurIPS/ACL/ICLR/ICML).

We will keep maintaining this list :)

Note: We currently focus only on inference. We plan to include training work in the future.

Example: [Conference'year] Title, First-author Affiliation

Model

[ICLR'23] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, IST Austria
[ICLR'23] Token Merging: Your ViT But Faster, Georgia Tech
[ICLR'23] Efficient Attention via Control Variates, University of Hong Kong
[ICLR'23] HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer, University of Chinese Academy of Sciences
[ICLR'23] Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models, Tencent AI Lab

[MLSys'23] Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization, National University of Singapore

[ACL'22] AraT5: Text-to-Text Transformers for Arabic Language Generation, The University of British Columbia
[ACL'22] ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer, Tianjin University
[ACL'22] ∞-former: Infinite Memory Transformer, Instituto de Telecomunicações
[ACL'22] LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding, South China University of Technology
[ACL'22] PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation, Baidu Inc

[ICLR'22] Memorizing Transformers, Google
[ICLR'22] Understanding the Role of Self Attention for Efficient Speech Recognition, Seoul National University

[NeurIPS'22] Confident Adaptive Language Modeling, Google Research
[NeurIPS'22] Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling, Microsoft Research Asia
[NeurIPS'22] Large Language Models are Zero-Shot Reasoners, The University of Tokyo
[NeurIPS'22] Training language models to follow instructions with human feedback, OpenAI

[ACL'21] RealFormer: Transformer Likes Residual Attention, Google Research

[NeurIPS'21] Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices, Virginia Commonwealth University
[NeurIPS'21] Systematic Generalization with Edge Transformers, University of California
[NeurIPS'21] NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM, Colorado School of Mines
[NeurIPS'21] Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems, Jadavpur University
[NeurIPS'21] Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification, Amazon
[NeurIPS'21] Sparse is Enough in Scaling Transformers, Google Research
[NeurIPS'21] Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation, Macquarie University
[NeurIPS'21] Long-Short Transformer: Efficient Transformers for Language and Vision, University of Maryland
[NeurIPS'21] Combiner: Full Attention Transformer with Sparse Computation Cost, Stanford University
[NeurIPS'21] FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention, University of California
[NeurIPS'21] Searching for Efficient Transformers for Language Modeling, Google Research

[SenSys'21] LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications, Nanyang Technological University

[NeurIPS'20] Deep Transformers with Latent Depth, Facebook AI Research
[NeurIPS'20] Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, Carnegie Mellon University
[NeurIPS'20] MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, Microsoft Research
[NeurIPS'20] Big Bird: Transformers for Longer Sequences, Google Research
[NeurIPS'20] Fast Transformers with Clustered Attention, Idiap Research Institute, Switzerland

[NeurIPS'19] Levenshtein Transformer, Facebook AI Research
[NeurIPS'19] Novel positional encodings to enable tree-based transformers, Microsoft Research
[NeurIPS'19] A Tensorized Transformer for Language Modeling, Tianjin University

[ICLR'18] Non-Autoregressive Neural Machine Translation, University of Hong Kong

Input

[UbiComp'22] IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer, National University of Defense Technology

Training Algorithm

[MobiCom'23] Efficient Federated Learning for Modern NLP, Beijing University of Posts and Telecommunications
[MobiCom'23] Federated Few-shot Learning for Mobile NLP, Beijing University of Posts and Telecommunications
[ICLR'23] Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization, Tsinghua University
[ICLR'23] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers, Hong Kong University of Science and Technology
[ATC'23] Accelerating Distributed MoE Training and Inference with Lina, City University of Hong Kong
[ATC'23] SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization, Tsinghua University

[ICLR'22] Towards a Unified View of Parameter-Efficient Transfer Learning, Carnegie Mellon University

[NeurIPS'22] AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning, Sun Yat-sen University
[NeurIPS'22] A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models, Chinese Academy of Sciences
[NeurIPS'22] Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively, Peking University

[NeurIPS'20] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping, Microsoft Corporation

[NeurIPS'19] Ouroboros: On Accelerating Training of Transformer-Based Language Models, Duke University

Inference Engine

[ASPLOS'23] FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks, Georgia Institute of Technology
[ISCA'23] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization, Shanghai Jiao Tong University
[ISCA'23] FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction, Tsinghua University

[EuroSys'23] Tabi: An Efficient Multi-Level Inference System for Large Language Models, Hong Kong University of Science and Technology
[MLSys'23] Flex: Adaptive Mixture-of-Experts at Scale, Microsoft Research

[MLSys'23] Efficiently Scaling Transformer Inference, Google

[OSDI'22] Orca: A Distributed Serving System for Transformer-Based Generative Models, Seoul National University

[ATC'22] PetS: A Unified Framework for Parameter-Efficient Transformers Serving, Peking University

[NeurIPS'22] Towards Efficient Post-training Quantization of Pre-trained Language Models, Huawei Noah’s Ark Lab
[NeurIPS'22] Solving Quantitative Reasoning Problems with Language Models, Google Research
[NeurIPS'22] Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees, ETH Zürich, Switzerland
[NeurIPS'22] Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models, NC State University
[NeurIPS'22] Exploring Length Generalization in Large Language Models, Google Research

[ACL'21] MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, Microsoft Research

[ASPLOS'23] Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models, Google

[MobiCom'23] LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup, Microsoft Research Asia

[ACL'23] Distilling Script Knowledge from Large Language Models for Constrained Language Planning, Fudan University
[ACL'23] I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation, University of Southern California
[ACL'23] Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step, University of California
[ACL'23] GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model, Anhui University

[NeurIPS'23] Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind, University of North Carolina at Chapel Hill
[NeurIPS'23] Blockwise Parallel Transformer for Large Context Models, UC Berkeley
[NeurIPS'23] LLM-Pruner: On the Structural Pruning of Large Language Models, National University of Singapore
[NeurIPS'23] The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter, University of Texas at Austin
[NeurIPS'23] Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time, Rice University
[NeurIPS'23] Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers, ETH Zürich
[NeurIPS'23] QuIP: 2-Bit Quantization of Large Language Models With Guarantees, Cornell University

Training Engine

[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression, Yonsei University

[ASPLOS'23] Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers, Tsinghua University

[HPCA'23] MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism, University of Science and Technology of China

[HPCA'23] OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing, KAIST

[NeurIPS'23] QLoRA: Efficient Finetuning of Quantized LLMs, University of Washington

Compiler

Hardware

Search Engine

[UbiComp'23] ODSearch: Fast and Resource Efficient On-device Natural Language Search for Fitness Trackers' Data, Boston University
