
Proposed repository structure #6

Open
SkalskiP opened this issue Nov 29, 2023 · 3 comments

SkalskiP commented Nov 29, 2023

Proposed Code Structure

Every prompting pipeline comes with a prompt_creator and a result_processor. You can instantiate those classes manually, or call the pipeline function and provide the name argument.

from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple
import numpy as np
import supervision as sv


class BasePromptCreator(ABC):
    @abstractmethod
    def create(self, image: np.ndarray, *args, **kwargs) -> Tuple[np.ndarray, sv.Detections]:
        """
        Create a prompt from an image and additional arguments.

        Args:
            image (np.ndarray): The input image.
            *args, **kwargs: Additional arguments.

        Returns:
            Tuple[np.ndarray, sv.Detections]: A tuple containing a processed image and detections.
        """
        pass


class BaseResultProcessor(ABC):
    @abstractmethod
    def process(self, text: str, marks: sv.Detections, *args, **kwargs) -> Dict[str, str]:
        """
        Process the results with given text and detections.

        Args:
            text (str): The input text.
            marks (sv.Detections): Detections to be used in processing.
            *args, **kwargs: Additional arguments.

        Returns:
            Dict[str, str]: Processed results.
        """
        pass


    @abstractmethod
    def visualize(self, text: str, image: np.ndarray, marks: sv.Detections, *args, **kwargs) -> np.ndarray:
        """
        Visualize the results on an image.

        Args:
            text (str): The input text.
            image (np.ndarray): The input image.
            marks (sv.Detections): Detections to be visualized.
            *args, **kwargs: Additional arguments.

        Returns:
            np.ndarray: The image with visualizations.
        """
        pass


class SamPromptCreator(BasePromptCreator):
    def __init__(self, device: str):
        self.device = device

    def create(self, image: np.ndarray, mask: Optional[np.ndarray] = None) -> Tuple[np.ndarray, sv.Detections]:
        pass


class SamResultProcessor(BaseResultProcessor):
    
    def process(self, text: str, marks: sv.Detections) -> List[str]:
        pass

    def visualize(self, text: str, image: np.ndarray, marks: sv.Detections) -> np.ndarray:
        pass


class GroundingDinoPromptCreator(BasePromptCreator):
    def __init__(self, device: str):
        self.device = device

    def create(self, image: np.ndarray, categories: List[str]) -> Tuple[np.ndarray, sv.Detections]:
        pass


class GroundingDinoResultProcessor(BaseResultProcessor):
    
    def process(self, text: str, marks: sv.Detections) -> Dict[str, str]:
        pass

    def visualize(self, text: str, image: np.ndarray, marks: sv.Detections) -> np.ndarray:
        pass


PIPELINES = {
    'sam': (SamPromptCreator, SamResultProcessor),
    'grounding-dino': (GroundingDinoPromptCreator, GroundingDinoResultProcessor)
}


def pipeline(name: str, **kwargs) -> Tuple[BasePromptCreator, BaseResultProcessor]:
    """Retrieves the prompt creator and result processor for the specified pipeline.

    Args:
        name (str): The name of the pipeline.
        **kwargs: Additional keyword arguments for initializing the classes.

    Returns:
        Tuple[BasePromptCreator, BaseResultProcessor]: Instances of the prompt creator and result processor.

    Raises:
        ValueError: If the pipeline name is not in the PIPELINES dictionary.
    """
    pipeline_classes = PIPELINES.get(name)

    if pipeline_classes is None:
        raise ValueError(f"Pipeline '{name}' not found. Please choose from {list(PIPELINES.keys())}.")

    PromptCreatorClass, ResultProcessorClass = pipeline_classes

    prompt_creator = PromptCreatorClass(**kwargs)
    result_processor = ResultProcessorClass(**kwargs)

    return prompt_creator, result_processor
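
Worth noting: pipeline forwards the same **kwargs to both constructors, so both the creator and the processor classes must accept them (as sketched, SamResultProcessor would also need to take device, or ignore extra kwargs). A minimal, self-contained sketch of the registry pattern, using hypothetical Dummy* stand-ins rather than the real SAM classes:

```python
from typing import Dict, Tuple, Type


# Hypothetical stand-ins, just to exercise the registry lookup.
class DummyPromptCreator:
    def __init__(self, device: str = 'cpu'):
        self.device = device


class DummyResultProcessor:
    def __init__(self, device: str = 'cpu'):
        self.device = device


PIPELINES: Dict[str, Tuple[type, type]] = {
    'dummy': (DummyPromptCreator, DummyResultProcessor),
}


def pipeline(name: str, **kwargs):
    # Look up the registered pair; fail loudly on an unknown name.
    pipeline_classes = PIPELINES.get(name)
    if pipeline_classes is None:
        raise ValueError(
            f"Pipeline '{name}' not found. "
            f"Please choose from {list(PIPELINES.keys())}.")
    creator_cls, processor_cls = pipeline_classes
    # The same kwargs go to both constructors.
    return creator_cls(**kwargs), processor_cls(**kwargs)


creator, processor = pipeline('dummy', device='cuda')
print(creator.device)  # cuda
```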

Example Usage

LMM inference gets sandwiched between prompt_creator and result_processor calls.

import cv2
from maestro import pipeline, prompt_gpt4_vision

image = cv2.imread('dog.jpeg')  # load the input image
prompt_creator, result_processor = pipeline('sam', device='cuda')

image_prompt, marks = prompt_creator.create(image=image)
text_prompt = 'Find dog.'
api_key = '...'

response = prompt_gpt4_vision(
    text_prompt=text_prompt, 
    image_prompt=image_prompt, 
    api_key=api_key)

visualization = result_processor.visualize(
    text=response, 
    image=image, 
    marks=marks)
@SkalskiP SkalskiP added the enhancement New feature or request label Nov 29, 2023
PawelPeczek-Roboflow commented Nov 30, 2023

Looks good as a baseline; I am just wondering whether a change along these lines would be more verbose:

maestro = build_maestro('sam', device='cuda').with("gpt-4")
result = maestro.prompt("Find a dog").with_image(image).visualize()

Naming conventions are to be agreed on - I would just like to point out that using prompt_creator and result_processor with custom things (that cannot be fully custom) in between may confuse less advanced users - especially since result_processor probably assumes some structure of the response, which may not be guaranteed when a client uses their own logic instead of prompt_gpt4_vision().

For more advanced use cases, however, I would let .with("gpt-4") be replaced with .with(my_callable), where my_callable takes agreed parameters and clients can inject their own implementation.
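
The builder idea above could be sketched roughly as below. All names here (build_maestro, Maestro, with_model) are illustrative, not a committed API; note in particular that with is a reserved word in Python, so the chained method needs a different name such as with_model:

```python
from typing import Optional, Union, Callable


class Maestro:
    """Hypothetical fluent wrapper around a creator/processor pair."""

    def __init__(self, pipeline_name: str, device: str = 'cpu'):
        self.pipeline_name = pipeline_name
        self.device = device
        self.model: Optional[Union[str, Callable]] = None
        self.text: Optional[str] = None
        self.image = None

    def with_model(self, model: Union[str, Callable]) -> 'Maestro':
        # `model` may be a known name ("gpt-4") or, for advanced
        # users, any callable taking agreed parameters.
        self.model = model
        return self

    def prompt(self, text: str) -> 'Maestro':
        self.text = text
        return self

    def with_image(self, image) -> 'Maestro':
        self.image = image
        return self


def build_maestro(pipeline_name: str, device: str = 'cpu') -> Maestro:
    return Maestro(pipeline_name, device=device)


maestro = build_maestro('sam', device='cuda').with_model('gpt-4')
result = maestro.prompt('Find a dog').with_image(None)
print(result.text)  # Find a dog
```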

@yeldarby

This makes sense to me for set-of-marks style prompts where you're annotating an image.

I think we may want to keep in mind some aspirational things, which we may implement some day, as we design the API structure. Some thoughts on potential future directions of exploration:

  • Chaining - taking the output of one response, doing another transformation, and passing it back (eg "find the dog" -> it finds it -> we crop the photo to isolate the object of interest -> "describe this dog")
  • Few-shot - pulling similar images (and captions/annotations) from a vector DB & passing them along with your prompt to show by example what you want (or "spot the difference" style prompting against a reference image)
  • RAG - pulling relevant images from a vector DB to add additional context
  • Temporal / Video - to help with eg the sports broadcasting example
  • Tool use - using another model like a fine-tuned CNN to be able to add additional context
  • Integration with existing tools like LangChain (so you can eg use these prompting techniques as part of agent flows)
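
One concrete reading of the chaining bullet: the detection box from the first response becomes a crop that seeds the follow-up prompt ("describe this dog"). A minimal sketch; the helper name is hypothetical:

```python
import numpy as np


def crop_to_detection(image: np.ndarray, xyxy) -> np.ndarray:
    """Crop an HxWxC image to one (x_min, y_min, x_max, y_max) box,
    so the crop can be passed to a follow-up prompt."""
    x_min, y_min, x_max, y_max = (int(v) for v in xyxy)
    return image[y_min:y_max, x_min:x_max]


# E.g. "find the dog" returns a box; isolate it for the next step.
image = np.zeros((480, 640, 3), dtype=np.uint8)
crop = crop_to_detection(image, (100, 50, 300, 250))
print(crop.shape)  # (200, 200, 3)
```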


SkalskiP commented Dec 1, 2023

Cool! I'll keep that in mind. We had a call with @PawelPeczek-Roboflow and agreed on the PromptCreator and ResultProcessor structure. Those can encapsulate a lot of the logic you just described. We just need to make sure the top layer allows arguments to be passed freely. But because we are still not sure what we want to support, we'll add the high-level API at the very end.
