Gemini Vision Pro API with Multimodal Prompts Integration with JavaScript (Node.js & Express.js)

This project uses the Google Generative AI library to call the Gemini Pro Vision model, which processes text and images together and returns relevant text responses. Gemini Pro Vision handles multimodal tasks such as visual understanding, classification, summarization, and content creation from images and videos.

About Gemini Pro Vision

Gemini Pro Vision is a large vision-language model that takes text and visual input (images and videos) and generates contextually relevant text responses. As a foundation model, it performs well across multimodal tasks such as visual understanding, object identification, and content extraction from images. It can work with photographs, documents, infographics, screenshots, and similar visual material alongside text.

Use Cases

Visual Information Seeking: Utilize external knowledge combined with information extracted from the input image or video to answer questions.
Object Recognition: Answer questions related to fine-grained identification of objects in images and videos.
Digital Content Understanding: Answer questions and extract information from visual content like infographics, charts, figures, tables, and web pages.
Structured Content Generation: Generate responses based on multimodal inputs in formats like HTML and JSON.
Captioning and Description: Generate descriptions of images and videos at varying levels of detail.
Reasoning: Compositionally infer new information without memorization or retrieval.

Installation

Clone the repository
Install the dependencies
```
npm install
```

Usage

Add your Google API key to the .env file
```
GOOGLE_API_KEY=your_google_api_key
```
Run the script with Node.js
```
node index.js
```
Or start the API server and send requests to it from Postman
```
npm start
```
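
The repository's actual routes and request format are not shown in this README, so the following is only a rough sketch of what a Postman-callable endpoint could look like. The route `/describe`, the body fields `prompt`, `imageBase64`, and `mimeType`, the port, and the helper names `makeDescribeHandler` and `startServer` are all hypothetical. The `generate` argument stands in for whatever function wraps the Gemini call.

```javascript
// Hypothetical handler factory. `generate` is any async function
// (prompt, imagePart) => string, for example a wrapper around the Gemini call.
function makeDescribeHandler(generate) {
  return async function describeHandler(req, res) {
    // Assumed request body: { prompt, imageBase64, mimeType? }
    const { prompt, imageBase64, mimeType = "image/png" } = req.body || {};
    if (!prompt || !imageBase64) {
      return res
        .status(400)
        .json({ error: "prompt and imageBase64 are required" });
    }
    try {
      // Inline image part: base64 data plus its MIME type.
      const imagePart = { inlineData: { data: imageBase64, mimeType } };
      const text = await generate(prompt, imagePart);
      return res.json({ text });
    } catch (err) {
      // Upstream (model) failure is reported as a gateway error.
      return res.status(502).json({ error: "generation failed" });
    }
  };
}

// Wiring sketch; needs the express package installed and is not called here.
function startServer(generate, port = 3000) {
  const express = require("express");
  const app = express();
  app.use(express.json({ limit: "10mb" })); // base64 images can be large
  app.post("/describe", makeDescribeHandler(generate));
  return app.listen(port);
}
```

With a server like this running, a Postman request would be a POST to `http://localhost:3000/describe` with a JSON body containing the prompt and the base64-encoded image. Keeping the handler separate from the Express wiring makes it possible to exercise it with a stubbed `generate` function and no network access.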

Functionality

The script uses the Google Generative AI library to generate text from a prompt template and an image. It calls model.generateContent with an array containing the template and the image data. The generated text is then logged to the console or returned in the API response.
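
As a minimal sketch of that flow (not necessarily how index.js is written), the call could look like the following. It assumes the `@google/generative-ai` package, the model identifier `gemini-pro-vision` (which may need updating to a currently available model), and that `GOOGLE_API_KEY` has already been loaded into the environment, for example with `node --env-file=.env` or a dotenv-style loader. The helper names `toInlineImagePart`, `generateFromImage`, and `main`, and the prompt text, are illustrative.

```javascript
const fs = require("node:fs");

// Wrap raw image bytes in the inline-data part shape passed to generateContent.
function toInlineImagePart(imageBuffer, mimeType) {
  return { inlineData: { data: imageBuffer.toString("base64"), mimeType } };
}

// `model` is anything exposing generateContent(), so it can be stubbed in tests.
async function generateFromImage(model, prompt, imagePart) {
  // The argument is an array holding the text prompt and the image part.
  const result = await model.generateContent([prompt, imagePart]);
  return result.response.text();
}

// Example entry point (assumption: package installed, key set, network access).
// It is defined here but not invoked automatically.
async function main(imagePath = "messi.png") {
  const { GoogleGenerativeAI } = require("@google/generative-ai");
  const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
  const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });
  const imagePart = toInlineImagePart(fs.readFileSync(imagePath), "image/png");
  const text = await generateFromImage(model, "Describe this image.", imagePart);
  console.log(text);
}
```

Calling `main()` would send the prompt and the image in one request and print the model's reply. Passing the model in as a parameter keeps `generateFromImage` independent of the SDK, so it can be checked with a stub object.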

arslanstack/gemini-vision-pro-implementation

Folders and files

Latest commit

History

Repository files navigation

Gemini Vision Pro API with Multimodal Prompts Integration with JavaScript (Node.js & Express.js)

About Gemini Vision Pro

Use Cases

Installation

Usage

Functionality

Snapshots

About

Topics

Resources

Stars

Watchers

Forks

Languages