Skip to content

Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.

License

Notifications You must be signed in to change notification settings

Theia-4869/CDPruner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

💽 Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

A training-free and model-agnostic visual token pruning method for MLLM inference acceleration by maximizing the conditional diversity of retained tokens.

[📄 Paper] [🌐 Project Page]

📰 News

🔥 [2025/06/29] Our Project Page is updated!

👁️ Overview

Abundant efforts have been made to reduce the inference cost of MLLMs by pruning visual tokens, and existing methods can be roughly divided into two categories. The first is to identify visual tokens with high attention scores as important and discard those deemed less critical, which only considers the importance of visual tokens, resulting in a large number of duplicate tokens being retained. The second is to remove redundant parts based on feature similarity between visual tokens, which neglects user instructions, failing to achieve dynamic pruning in alignment with the current question. CDPruner considers the conditional diversity of the selected subset, dynamically adjusting pruning according to the user instructions and retaining maximal visual information.

comparsion

CDPruner first calculates the similarity between visual tokens conditioned on their relevance to the current instruction. Then, CDPruner uses a DPP to select the subset to keep. As a training-free and model-agnostic method, it ensures both the diversity and quality of the selected token subset, significantly reducing computational cost while maintaining considerable performance.

design

⚙️ Setup

🏝️ Environment

  1. Clone this repository.
git clone https://github.com/Theia-4869/CDPruner.git
cd CDPruner
  1. Install necessary packages.
conda create -n cdpruner python=3.10 -y
conda activate cdpruner
pip install -e .
  1. (Optional) Install FlashAttention for further inference acceleration.
pip install flash-attn --no-build-isolation

📦️ Model

Download corresponding LLaVA checkpoints from Hugging Face 🤗:

Version LLM Checkpoint
LLaVA-1.5 Vicuna-7B liuhaotian/llava-v1.5-7b
LLaVA-1.5 Vicuna-13B liuhaotian/llava-v1.5-13b
LLaVA-1.6 (LLaVA-NeXT) Vicuna-7B liuhaotian/llava-v1.6-vicuna-7b
LLaVA-1.6 (LLaVA-NeXT) Vicuna-13B liuhaotian/llava-v1.6-vicuna-13b

📊 Data

Download each dataset according to EVAL.md.

🖼 Visualization

We provide the visualization scripts for relevance scores in scripts/visualize_relevance.py. You can run the following command to visualize the relevance scores:

python scripts/visualize_relevance.py

relevance

The image samples for visualization are from COCO.

📋️ Evaluation

The main implementation of CDPruner is highlighted with [CDPruner] annotations, mainly in llava_llama.py, llava_arch.py and clip_encoder.py.

We provide the evaluation scripts for each benchmark:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/${DATASET}.sh ${VISUAL_TOKEN_NUMBER}

You only need to set the remaining visual token number as the bash argument. For example, if you want to evaluate CDPruner with 128 visual tokens retained on the GQA benchmark, you can run the following command with argument 128:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/gqa.sh 128

And if you want to evaluate CDPruner with 64 visual tokens retained on the MME benchmark, you can run the following command:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh 64

For evaluation with the 13B LLM, you just need to replace the CKPT argument from llava-v1.5-7b to llava-v1.5-13b in each script. And for evaluation with LLaVA-NeXT, you can use the scripts in ./scripts/v1_6/eval. For example, if you want to evaluate CDPruner with 32 * 5 = 160 visual tokens retained on the TextVQA benchmark, you can run the following command:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_6/eval/textvqa.sh 32

The detailed guidance for evaluation commands and online submission of each benchmark can be found in EVAL.md.

🔖 Citation

If you find CDPruner useful for your research and applications, please cite using this BibTeX:

@article{zhang2025cdpruner,
  title={Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs},
  author={Zhang, Qizhe and Liu, Mengzhen and Li, Lichen and Lu, Ming and Zhang, Yuan and Pan, Junwen and She, Qi and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2506.10967},
  year={2025}
}

🎟️ License

This project is released under the Apache 2.0 license.

🏅 Acknowledgement

We appreciate the open-source efforts of LLaVA, Fast-MAP-DPP and TRIM.

About

Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published