Pollen vision library

Simple and unified interface to zero-shot computer vision models curated for robotics use cases.

Check out our Hugging Face space for an online demo, or try pollen-vision in a Colab notebook!

Get started in very few lines of code!

Perform zero-shot object detection and segmentation on a live video stream from your webcam with the following code:

import cv2

from pollen_vision.vision_models.object_detection import OwlVitWrapper
from pollen_vision.vision_models.object_segmentation import MobileSamWrapper
from pollen_vision.perception.utils import Annotator, get_bboxes


owl = OwlVitWrapper()
sam = MobileSamWrapper()
annotator = Annotator()

cap = cv2.VideoCapture(0)  # 0 selects the default webcam

while True:
    ret, frame = cap.read()
    if not ret:  # no frame received (camera unplugged or busy)
        break

    predictions = owl.infer(
        frame, ["paper cups"]
    )  # zero-shot object detection | put your classes here
    bboxes = get_bboxes(predictions)

    masks = sam.infer(frame, bboxes=bboxes)  # zero-shot object segmentation
    annotated_frame = annotator.annotate(frame, predictions, masks=masks)

    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

# Release the webcam and close the preview window on exit.
cap.release()
cv2.destroyAllWindows()
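
Detectors often return low-confidence boxes alongside the good ones. As a minimal sketch, assuming each prediction is a dict with `score`, `label` and `box` keys (the layout used by the Hugging Face zero-shot object detection pipeline; check the actual output of `owl.infer` before relying on it), a hypothetical helper `filter_by_score` could drop weak detections before they reach the segmentation step:

```python
from typing import Any, Dict, List

Prediction = Dict[str, Any]


def filter_by_score(predictions: List[Prediction], min_score: float = 0.3) -> List[Prediction]:
    """Keep only predictions whose confidence is at least min_score.

    Assumed layout of one prediction (an assumption, not a documented guarantee):
    {"score": 0.42, "label": "paper cups",
     "box": {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10}}
    """
    return [p for p in predictions if p.get("score", 0.0) >= min_score]


# Hand-written sample data, not real model output.
sample_predictions = [
    {"score": 0.82, "label": "paper cups", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.07, "label": "paper cups", "box": {"xmin": 300, "ymin": 40, "xmax": 330, "ymax": 90}},
]
confident = filter_by_score(sample_predictions, min_score=0.3)
print(len(confident))  # 1
```

In the loop above, such a filter would sit between `owl.infer` and `get_bboxes`. The threshold of 0.3 is an arbitrary illustration value.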

Supported models

We continue to work on adding new models that could be useful for robotics perception applications.

We chose to focus on zero-shot models to make the library easier to use and deploy. Zero-shot models can recognize or segment objects based on text queries, without needing to be fine-tuned on annotated datasets.

Right now, we support:

Object detection

Yolo-World for zero-shot object detection and localization
Owl-Vit for zero-shot object detection and localization
Recognize-Anything for zero-shot object detection (without localization)

Object segmentation

Mobile-SAM for (fast) zero-shot object segmentation

Monocular depth estimation

Depth Anything for (non-metric) monocular depth estimation
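
"Non-metric" means the predicted values are relative and carry no physical unit, so they cannot be read as distances in meters. The sketch below shows one common way to rescale such a map to 8-bit for display. It assumes the depth map is available as a 2D NumPy float array; the function name `normalize_depth_for_display` is hypothetical and not part of pollen-vision.

```python
import numpy as np


def normalize_depth_for_display(depth: np.ndarray) -> np.ndarray:
    """Min-max rescale a relative depth map to uint8 values in [0, 255]."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-12:
        # Constant map: there is no range to rescale.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 255.0).round().astype(np.uint8)


relative_depth = np.array([[0.5, 1.0], [1.5, 2.5]])  # synthetic values, arbitrary units
display = normalize_depth_for_display(relative_depth)
print(display.dtype, display.min(), display.max())
```

The result can be passed to `cv2.imshow` or a colormap. It is only suitable for visualization, not for measuring distances.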

Below is an example of combining Owl-Vit and Mobile-SAM to detect and segment objects in a point cloud, all live. (Note: this example applies no temporal or spatial filtering of any kind. We display the raw outputs of the models, computed independently on each frame.)

[Video: pc_segmentation_doc3-2024-02-26_17.07.20.mp4]
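
To illustrate how a 2D segmentation mask can select points in a point cloud, here is a minimal sketch under stated assumptions: a metric depth image aligned with the RGB frame, a boolean mask of the same shape, and pinhole intrinsics `fx`, `fy`, `cx`, `cy`. The function `masked_point_cloud` and all numeric values are hypothetical and do not come from pollen-vision.

```python
import numpy as np


def masked_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to 3D camera coordinates (pinhole model).

    depth: (H, W) array of depths along the optical axis, in meters.
    mask:  (H, W) boolean array, True where the object was segmented.
    Returns an (N, 3) array of [x, y, z] points.
    """
    valid = mask & (depth > 0)  # ignore pixels without a depth reading
    v, u = np.nonzero(valid)  # row (v) and column (u) indices of kept pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Synthetic inputs: a flat surface 2 m away and a 2x2 square mask.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
points = masked_point_cloud(depth, mask, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(points.shape)  # (4, 3)
```

With real data, the mask would come from the segmentation model and the depth and intrinsics from a calibrated RGB-D camera.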

We also provide wrappers for the Luxonis cameras, which we use internally. They give easy access to the main features relevant to our robotics applications (RGB-D, onboard H.264 encoding and onboard stereo rectification).

Installation

Note: This package has been tested on Ubuntu 22.04 and macOS (with M1 Pro processor).

Git LFS

This repository uses Git LFS to store large files. You need to install it before cloning the repository.

Ubuntu

sudo apt-get install git-lfs

macOS

brew install git-lfs
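
As a quick sanity check before cloning, the sketch below tests whether a `git-lfs` executable is on the `PATH`. The helper name `git_lfs_available` is hypothetical. It only checks that the binary can be found, not that Git LFS is configured for your user.

```python
import shutil


def git_lfs_available() -> bool:
    """Return True if a `git-lfs` executable is found on the PATH."""
    return shutil.which("git-lfs") is not None


if git_lfs_available():
    print("git-lfs found")
else:
    print("git-lfs not found: install it before cloning")
```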

One-line installation

You can install the package directly from the repository without having to clone it first with:

pip install "pollen-vision[vision] @ git+https://github.com/pollen-robotics/pollen-vision.git@main"

Note: here we install the package with the vision extra, which includes the vision models. You can also install the depthai_wrapper extra to use the Luxonis depthai wrappers.

Install from source

Clone this repository and then install the package either in "production" mode or "dev" mode.

👉 We recommend using a virtual environment to avoid conflicts with other packages.
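
If you are unsure whether a virtual environment is active, a minimal check (the function name `in_virtual_environment` is hypothetical) is to compare `sys.prefix` with `sys.base_prefix`:

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
```

This check covers environments created with `venv` and recent versions of `virtualenv`. It does not detect conda environments.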

After cloning the repository, you can either install everything with:

pip install ".[all]"

or install only the modules you want:

pip install ".[depthai_wrapper]"
pip install ".[vision]"
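
After installing, you may want to confirm which optional dependencies are actually importable. The sketch below uses only the standard library. The helper names are hypothetical, and the assumption that the depthai_wrapper extra provides the `depthai` module should be checked against the project's packaging files.

```python
import importlib.util
from typing import Dict, Iterable


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


def report(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each module name to whether it is importable."""
    return {name: is_importable(name) for name in modules}


# `depthai` is assumed to be the module provided by the depthai_wrapper extra.
status = report(["pollen_vision", "depthai"])
for name, ok in status.items():
    print(f"{name}: {'available' if ok else 'missing'}")
```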

To add "dev" mode dependencies (CI/CD, testing, etc.):

pip install -e ".[dev]"

Luxonis depthai specific information

If this is the first time you are using Luxonis cameras on this computer, you need to set up the udev rules:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
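
To check that a camera is visible to the system after plugging it in, the following sketch scans the Linux sysfs USB tree for the vendor ID used in the udev rule above. The function name `find_usb_devices` is hypothetical, and the approach is Linux-specific.

```python
from pathlib import Path
from typing import List

LUXONIS_VENDOR_ID = "03e7"  # same idVendor as in the udev rule above


def find_usb_devices(vendor_id: str, sysfs_root: str = "/sys/bus/usb/devices") -> List[str]:
    """Return the sysfs names of USB devices whose idVendor matches vendor_id."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted
    matches = []
    for vendor_file in sorted(root.glob("*/idVendor")):
        try:
            if vendor_file.read_text().strip().lower() == vendor_id.lower():
                matches.append(vendor_file.parent.name)
        except OSError:
            continue  # device vanished or is unreadable
    return matches


print(find_usb_devices(LUXONIS_VENDOR_ID))
```

An empty list means no matching device was found. It says nothing about whether the udev permissions are correct.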

Gradio demo

Test the demo online

A Gradio demo is available on Pollen Robotics' Hugging Face space. It lets you test the models on your own images without having to install anything.

Run the demo locally

To run the demo locally, install its dependencies with the following command:

pip install "pollen_vision[gradio]"

You can then start the demo with:

python pollen-vision/gradio/app.py

Examples

Vision models wrappers

Check our example notebooks!

Luxonis depthai wrappers

Check our example scripts!
