
Visual Evaluation with Foundation Models

We are working towards a future in which a single foundation model can serve as a multi-purpose expert for low-level visual perception and visual evaluation.

👁️‍🗨️ Low-level Visual Perception in the Foundation Model Era

🔖 Aiming at cornerstone research for the next era

Low-level Visual Perception | Multi-Modality Large Language Models | Visual Quality Assessment

📖 Main Projects

  • Co-Instruct: Homepage, Repo, Demo. An open-ended visual quality comparator (up to 4 images) and low-level visual assistant; an improved version of ② Q-Instruct [CVPR 2024].

  • Q-Align: Homepage, Repo, Demo. A unified visual scorer for images and videos, built via text-instructed alignment of multi-modality foundation models; fine-tunes efficiently to further datasets with consistently strong performance. State-of-the-art on IQA, VQA, and IAA; see the scoring sketch after this list.

  • Q-Instruct [CVPR 2024]: Homepage, Repo, 200K Dataset, Technical Report. A large-scale instruction-tuning dataset that improves the low-level perceptual abilities of foundation models.

  • Q-Bench+ [ICLR 2024, Spotlight]: Homepage, Repo, Data-Single, Data-Pair, Preprint. The first benchmark for foundation models on low-level vision and visual quality assessment.
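
To make the text-instructed scoring behind Q-Align concrete, here is a minimal sketch of its rating-level pooling: the model is prompted to rate with one of five level words, and the softmax over those level tokens is collapsed into a scalar by a weighted average. The five-level scale follows the Q-Align report; the function name and the toy logits are illustrative assumptions, not the repository's API.

```python
import torch

# The five text rating levels used in Q-Align, mapped to numeric scores.
LEVELS = ["excellent", "good", "fair", "poor", "bad"]
LEVEL_SCORES = torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0])

def pooled_quality_score(level_logits: torch.Tensor) -> float:
    """Collapse next-token logits, restricted to the five level words, into a
    scalar: softmax over the levels, then the expectation of their numeric
    values (the weighted-average pooling described in the Q-Align report)."""
    probs = torch.softmax(level_logits, dim=-1)
    return float((probs * LEVEL_SCORES).sum())

# Toy logits for ["excellent", "good", "fair", "poor", "bad"]; leans "good".
print(pooled_quality_score(torch.tensor([1.2, 2.9, 1.5, -0.3, -1.1])))  # ≈ 3.87
```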

🖋️ Extension Projects

  • Q-Boost: Homepage. A discussion of boosting IQA performance for MLLMs that have not been specially aligned for IQA; see the probing sketch after this list.

  • [Pending] Chinese-Q-Bench (质衡): Homepage, Repo. The first attempt to test multi-lingual abilities on low-level vision.
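
For context on Q-Boost's setting, a common zero-shot probe for MLLMs that were never aligned for IQA (the softmax strategy used in Q-Bench, which Q-Boost builds on) compares the next-token logits of opposing level words such as "good" and "poor". A minimal sketch under that assumption, with hypothetical logit values:

```python
import torch

def softmax_quality_probe(logit_good: float, logit_poor: float) -> float:
    """Zero-shot IQA probe: after a prompt such as "The quality of the image
    is" (assumed wording), take the softmax between the logits of the tokens
    "good" and "poor" and return P(good) as a quality estimate in [0, 1]."""
    logits = torch.tensor([logit_good, logit_poor])
    return float(torch.softmax(logits, dim=-1)[0])

# Hypothetical logits read off a causal LM's output head for the two tokens.
print(softmax_quality_probe(2.1, 0.4))  # ≈ 0.85: the model leans "good"
```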

Maintained by Teo Wu (Singapore) and Zicheng Zhang (Shanghai).

Repositories

  • Q-Align (Python, MIT): ③ [IQA, IAA, VQA] An all-in-one foundation model for visual scoring that fine-tunes efficiently to downstream datasets.
  • Co-Instruct (MIT): ④ [Comparison among multiple images!] A study on open-ended multi-image quality comparison: a dataset, a model, and a benchmark.
  • Q-Instruct (Python, MIT): ② [CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo of fine-tuned checkpoints.
  • Q-Bench (Jupyter Notebook): ① [ICLR 2024 Spotlight] A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment, evaluating GPT-4V, Gemini-Pro, Qwen-VL-Plus, and 16 open-source MLLMs.
  • .github: We are an open-source collaborative project to bring new possibilities to IQA!
  • Chinese-Q-Bench: [WIP@Oct 13] Q-Bench in Chinese (质衡), including Chinese-language versions of the low-level visual question answering and low-level visual description datasets, plus image quality evaluation under Chinese prompts. We will release Q-Bench in more languages in the future.
