GPT-4 Vision for HoloLens

Overview

This project demonstrates the integration of OpenAI's GPT-4 Vision API with a HoloLens application. Users can capture images using the HoloLens camera and receive descriptive responses from the GPT-4V model.

Demo

gpt4-vision-hololens-demo.mp4

Screenshot of demo result

'A laptop displaying a webpage with the header "Let's build from here" is placed next to a spiral notebook and a pen on a dark surface.'

Dependencies

Newtonsoft.JSON
MRTK Foundation
MRTK Standard Assets

Setup

Open the GPT4 Vision Example-Scene
Specify your OpenAI key in the GameObject GPT4Vision > OpenAIWrapper (or hardcode it into the OpenAIWrapper.cs class)
Specify your base prompt (which is concatenated to the image sent to OpenAI), e.g. Describe this image.
Specify max tokens, sampling temperature, and image detail for the OpenAI API call

Running the application

Build the app as .appx (or deploy to HoloLens directly, e.g. via Visual Studio) and install it on your HoloLens
Run the app. Press on the camera button to capture a photo using HoloLens' PV camera which gets send to OpenAI's API.
See the inference result (based on your prompt) displayed on the label.

Using the .unitypackage

Make sure you have the dependencies from above installed.
Import the package via Assets > Import Package.
Either open up the GPT4 Vision Example-Scene, or import the GPT4Vision-Prefab into your own scene.
Edit the base prompt, tokens, temperature, image detail as described above.
Optional: call CapturePhoto() within the GPT4Vision-Prefab (in case you do not want to use the button and label within the Prefab).

Performance improvements

For some reason, the built-in UnityEngine.Windows.WebCam approach provided by Microsoft is really slow (~1.2s per captured photo on average, regardless of resolution). Also, inference speed on OpenAI's server can vary quite a bit. If you need this approach in real-time, skip PhotoCapture altogether (Research Mode) and think about hosting your own LMM. Feel free to message me if you need some pointers.

Disclaimer

This project is a barebones prototype for now and still WIP. Feel free to create a PR.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Assets		Assets
Packages		Packages
ProjectSettings		ProjectSettings
.gitignore		.gitignore
Assembly-CSharp.csproj		Assembly-CSharp.csproj
GPT-4-Vision-for-HoloLens.sln		GPT-4-Vision-for-HoloLens.sln
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assets

Assets

Packages

Packages

ProjectSettings

ProjectSettings

.gitignore

.gitignore

Assembly-CSharp.csproj

Assembly-CSharp.csproj