I am trying to run OWLv2 (google/owlv2-base-patch16-ensemble) to perform object detection.
I am following the example code to perform inference, using a Colab notebook with a T4 GPU and transformers version 4.40.2. When I try to perform inference, the cell just keeps running and eventually crashes with the message: "Your session crashed after using all available RAM." This is surprising because the model is not that large (relatively speaking), and inference for a single image using OWL-ViT (google/owlvit-base-patch32) takes < 0.001 seconds. I'm not sure where this difference is coming from. Here is the code I am running:
```python
import requests
from PIL import Image
import torch
from transformers import Owlv2Processor, Owlv2ForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

with torch.no_grad():  # tried with and without this line
    inputs = processor(text=texts, images=image, return_tensors="pt")
    outputs = model(**inputs)

target_sizes = torch.Tensor([image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to Pascal VOC format (xmin, ymin, xmax, ymax)
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)

i = 0  # Retrieve predictions for the first image for the corresponding text queries
text = texts[i]
boxes, scores, labels = results[i]["boxes"], results[i]["scores"], results[i]["labels"]
```
Has anyone run into a similar issue and resolved it? I imagine there's some issue with actually utilizing the GPU, but again this issue is not happening for OWL-ViT with nearly identical code. Thanks!
I found that your reproduction code is not using the GPU. I've updated it as follows:
```python
import requests
from PIL import Image
import torch
from transformers import Owlv2Processor, Owlv2ForObjectDetection

device = "cuda"

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble", device_map=device)
#                                                                                     ^^^^^^^^^^^^^^^^^ place the model on the GPU

inputs = processor(text=texts, images=image, return_tensors="pt").to(device)
#                                                                ^^^^^^^^^^^ move the inputs to the GPU too

with torch.no_grad():  # tried with and without this line
    outputs = model(**inputs)

target_sizes = torch.Tensor([image.size[::-1]])
# Convert outputs (bounding boxes and class logits) to Pascal VOC format (xmin, ymin, xmax, ymax)
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)

i = 0  # Retrieve predictions for the first image for the corresponding text queries
text = texts[i]
boxes, scores, labels = results[i]["boxes"], results[i]["scores"], results[i]["labels"]
print(boxes, scores, labels)
```
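As a side note, if you want the same script to also run on a machine without a GPU (e.g. a CPU-only Colab runtime), a minimal fallback sketch:

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to CPU so the
# script still runs (much more slowly) on a CPU-only machine.
device = "cuda" if torch.cuda.is_available() else "cpu"
```

You can then pass this `device` to `device_map=` and `.to(device)` exactly as above.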
It works fine both locally and in Colab. I use the following setup:
!pip install -U transformers==4.40.2 accelerate
Here are the results; inference uses only ~3-4 GB of GPU RAM:
Are you running exactly this script or is there anything else that can cause the problem?
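On the original question of why OWLv2 is so much heavier than OWL-ViT on CPU: a likely contributor is the combination of input resolution and patch size. Assuming the default preprocessing sizes (768×768 with 32-pixel patches for google/owlvit-base-patch32, 960×960 with 16-pixel patches for google/owlv2-base-patch16-ensemble), a back-of-the-envelope comparison:

```python
# Rough image-token comparison under the assumed default input sizes above.
owlvit_patches = (768 // 32) ** 2  # 24 * 24 = 576 image tokens
owlv2_patches = (960 // 16) ** 2   # 60 * 60 = 3600 image tokens
print(owlv2_patches / owlvit_patches)  # → 6.25
```

That is roughly 6× more image tokens, and since self-attention cost grows faster than linearly in the token count, the CPU compute gap is larger still, which would explain a cell that appears to hang.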