RuntimeError: CUDA error: too many resources requested for launch #1828

Open
maaaxac opened this issue Apr 15, 2024 · 0 comments
maaaxac commented Apr 15, 2024

Hi @dusty-nv, I'm trying to run train_ssd.py on an Open Images subset that I downloaded with:
python3 open_images_downloader.py --max-images=500 --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" --data=data/fruit

This is the output I get. Can you tell me what's wrong? Thanks in advance.

python3 train_ssd.py --data=data/fruit --model-dir=models/fruit --batch-size=1 --num-workers=1 --epochs=1
2024-04-15 10:05:38 - Using CUDA...
2024-04-15 10:05:38 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=1, checkpoint_folder='models/fruit', dataset_type='open_images', datasets=['data/fruit'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2024-04-15 10:06:45 - model resolution 300x300
2024-04-15 10:06:45 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2024-04-15 10:06:51 - Prepare training datasets.
2024-04-15 10:06:51 - loading annotations from: data/fruit/sub-train-annotations-bbox.csv
2024-04-15 10:06:52 - annotations loaded from: data/fruit/sub-train-annotations-bbox.csv
num images: 404
2024-04-15 10:06:54 - Dataset Summary:Number of Images: 404
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 261
Banana: 113
Grape: 136
Orange: 599
Pear: 191
Pineapple: 47
Strawberry: 550
Watermelon: 50
2024-04-15 10:06:54 - Stored labels into file models/fruit/labels.txt.
2024-04-15 10:06:54 - Train dataset size: 404
2024-04-15 10:06:54 - Prepare Validation datasets.
2024-04-15 10:06:54 - loading annotations from: data/fruit/sub-test-annotations-bbox.csv
2024-04-15 10:06:54 - annotations loaded from: data/fruit/sub-test-annotations-bbox.csv
num images: 73
2024-04-15 10:06:55 - Dataset Summary:Number of Images: 73
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 11
Banana: 9
Grape: 21
Orange: 62
Pear: 6
Pineapple: 10
Strawberry: 73
Watermelon: 11
2024-04-15 10:06:55 - Validation dataset size: 73
2024-04-15 10:06:55 - Build network.
2024-04-15 10:06:58 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
2024-04-15 10:07:01 - Took 2.97 seconds to load the model.
2024-04-15 10:07:02 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2024-04-15 10:07:02 - Uses CosineAnnealingLR scheduler.
2024-04-15 10:07:02 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 406, in <module>
    train(train_loader, net, criterion, optimizer, device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 149, in train
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
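
The last line of the error suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the kernel launch that actually failed. A minimal sketch of doing that from Python is below. The helper names (`blocking_env`, `training_command`, `rerun_with_blocking`) are hypothetical and not part of train_ssd.py, and the arguments are copied from the command in the log above. This only makes the reported location reliable; it is not expected to fix the error itself.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error is
    # reported at the call that caused it instead of at a later API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def training_command():
    """Rebuild the train_ssd.py invocation shown in the log (assumed unchanged)."""
    return [
        sys.executable, "train_ssd.py",
        "--data=data/fruit", "--model-dir=models/fruit",
        "--batch-size=1", "--num-workers=1", "--epochs=1",
    ]


def rerun_with_blocking():
    """Run training once with blocking launches and return the exit code."""
    return subprocess.run(training_command(), env=blocking_env()).returncode
```

Calling `rerun_with_blocking()` from the directory that contains train_ssd.py should reproduce the failure with a more trustworthy traceback. Note that `sys.executable` is whichever interpreter runs the sketch, which is assumed here to be the same python3 used for training.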
