FMXExpress/AI-Vision-Chat

AI Vision Chat

Chat with large language models about the contents of an image via this native desktop client for Windows, macOS, and Linux.

Features llava-13b, which uses a vision encoder and Vicuna to let you discuss the contents of an image with the large language model (LLM). Additionally, it includes an API X-Ray tab so you can easily see how the interaction with the API server works.

The AI Vision Chat Desktop client is a powerful UI for allowing you to discuss the contents of an image with a chat bot.

Visual Language Models supported:

  • LLaVA-13b
  • fuyu-8b
  • minigpt-4
  • instructblip-vicuna13b
  • mplug-owl
  • blip-2

Built with Delphi using the FireMonkey cross-platform development framework, this client works on Windows, macOS, and Linux (and possibly Android and iOS) with a single codebase and a single UI. At the moment it is specifically set up for Windows.

It also features a REST integration with Replicate.com for hosting the model used in the client. You need to sign up for an API key to access that functionality. Replicate models can be run in the cloud or locally via Docker.

docker run -d -p 5000:5000 --gpus=all r8.im/yorickvp/llava-13b@sha256:6bc1c7bb0d2a34e413301fee8f7cc728d2d4e75bfab186aa995f63292bda92fc
curl http://localhost:5000/predictions -X POST \
  -H "Content-Type: application/json" \
  -d '{"input": {
    "image": "https://url/to/file",
    "prompt": "...",
    "top_p": "...",
    "temperature": "...",
    "max_tokens": "..."
  }}'
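
For readers who want to call the local container from a script rather than curl, the request above can be sketched in Python using only the standard library. This is a minimal illustration, not code from the client itself: the function names are hypothetical, the numeric defaults for `top_p`, `temperature`, and `max_tokens` are placeholder assumptions standing in for the elided `"..."` values, and the shape of the response (`output` as a string or a list of text chunks) is also an assumption to verify against your running container.

```python
import json
import urllib.request

# Assumption: the container started by the docker command above listens here.
LOCAL_ENDPOINT = "http://localhost:5000/predictions"


def build_prediction_request(image_url, prompt, top_p=1.0, temperature=0.2,
                             max_tokens=1024, endpoint=LOCAL_ENDPOINT):
    """Build (but do not send) a POST request mirroring the curl call.

    The default sampling values are illustrative placeholders, not values
    taken from the project.
    """
    payload = {
        "input": {
            "image": image_url,
            "prompt": prompt,
            "top_p": top_p,
            "temperature": temperature,
            "max_tokens": max_tokens,
        }
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def join_output(response_body):
    """Collapse the response's 'output' field into a single string.

    Assumption: 'output' is either a string or a list of text chunks.
    """
    output = response_body.get("output")
    if output is None:
        return ""
    if isinstance(output, str):
        return output
    return "".join(output)


def describe_image(image_url, prompt, timeout=300):
    """Send the request to the local server and return the model's reply."""
    request = build_prediction_request(image_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return join_output(json.load(response))
```

With the container running, `describe_image("https://url/to/file", "What is in this image?")` would send the same JSON body as the curl command. Keeping request construction separate from sending makes it easy to inspect the payload without a GPU or network access.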

AI Vision Chat Desktop client screenshot on Windows

AI Vision Chat Desktop client on Windows

Other Delphi AI clients:

SDXL Inpainting

Stable Diffusion Desktop Client

CodeDroidAI

ControlNet Sketch To Image

AutoBlogAI

AI Code Translator

AI Playground

Song Writer AI

Stable Diffusion Text To Image Prompts

Generative AI Prompts

Dreambooth Desktop Client

Text To Vector Desktop Client