DataParallel in test_set_evaluation.py #11
Comments
Hi @giantke, let me address your concerns:
These solutions don't address the main (unknown) cause of the error, but this should probably work. I hope this provides some clarity and offers a direction to move forward. Please keep me posted on your progress and any further challenges.
Hi @ttanida,
Hi Tim,
Congratulations for the awesome work.
I followed the workflow to run test_set_evaluation.py for inference, which took an enormous amount of time (more than ten days). To speed it up, I tried running it on multiple GPUs by adding
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3])
in the function get_model(), and it did seem to work when I checked GPU utilization (all four GPUs were occupied). However, I ran into a problem at the 56th iteration with batch_size=4, and also at the 14th iteration with batch_size=16. This cannot be a coincidence, since 56 × 4 = 14 × 16 = 224: both runs fail at the 224th sample. The error was reported as follows:
56it [00:56, 1.02s/it]
Traceback (most recent call last):
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 906, in <module>
    main()
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 902, in main
    evaluate_model_on_test_set(model, test_loader, test_2_loader, tokenizer)
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 740, in evaluate_model_on_test_set
    obj_detector_scores, region_selection_scores, region_abnormal_scores = evaluate_obj_detector_and_binary_classifiers_on_test_set(model, test_loader, test_2_loader)
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 714, in evaluate_obj_detector_and_binary_classifiers_on_test_set
    num_images = iterate_over_test_loader(test_2_loader, num_images, is_test_2_loader=True)
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 631, in iterate_over_test_loader
    update_object_detector_metrics_test_loader_2(obj_detector_scores, detections, image_targets, class_detected)
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 555, in update_object_detector_metrics_test_loader_2
    intersection_area_per_region_batch, union_area_per_region_batch = compute_intersection_and_union_area_per_region(detections, image_targets, class_detected)
  File "/public/home/zhangke/rgrg-main/src/full_model/test_set_evaluation.py", line 518, in compute_intersection_and_union_area_per_region
    x0_max = torch.maximum(pred_boxes[..., 0], gt_boxes[..., 0])
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
Also, I tried printing the shapes of pred_boxes and gt_boxes in the function compute_intersection_and_union_area_per_region, and got the following:
55it [00:55, 1.18it/s]
pred_boxes.size: torch.Size([4, 29, 4])
gt_boxes.size: torch.Size([4, 29, 4])
56it [00:56, 1.19it/s]
pred_boxes.size: torch.Size([3, 29, 4])
gt_boxes.size: torch.Size([4, 29, 4])
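One plausible mechanism, sketched in plain Python below, is that torch.nn.DataParallel scatters the batch along dimension 0 across the replicas and then concatenates (gathers) their outputs; if one replica silently drops a sample (say, an image with no detections), the gathered predictions end up with one fewer row than the ground truth. This is speculation, not a confirmed diagnosis; scatter, replica_forward, and the no_detections flag here are my own illustrative stand-ins, not code from the repository:

```python
def scatter(batch, num_replicas):
    """Split a batch (a list of samples) into contiguous chunks, one per
    replica, roughly mimicking how DataParallel scatters along dim 0."""
    chunk = -(-len(batch) // num_replicas)  # ceil division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

def replica_forward(samples):
    """Hypothetical replica that drops samples with no detections."""
    return [s for s in samples if not s["no_detections"]]

# A batch of 4 samples, one of which (id 2) yields no detections.
batch = [{"id": i, "no_detections": (i == 2)} for i in range(4)]

# Gather: concatenate the per-replica outputs, as DataParallel does.
gathered = [out for chunk in scatter(batch, 4) for out in replica_forward(chunk)]
print(len(gathered), len(batch))  # 3 4 -> mirrors [3, 29, 4] vs [4, 29, 4]
```

On a single GPU the same drop would shrink pred_boxes and gt_boxes together (or not occur at all), which would explain why the shapes only diverge under DataParallel.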
I am confused as to why the batch sizes do not match, when everything works correctly on a single GPU.
Hope to hear from you soon.