Training from scratch #24

Open
rginjapan opened this issue Apr 15, 2024 · 0 comments

Training from scratch. Set "--resume" (along with "--run_id" and "--step") to resume from a checkpoint.
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
Training from scratch. Set "--resume" (along with "--run_id" and "--step") to resume from a checkpoint.
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3

distributed_backend=nccl
All distributed processes registered. Starting with 3 processes

Logging everything in /media/sly2/860EVO/spoc-robot-training/log/YBqLm86o
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]

  | Name          | Type                      | Params
0 | model         | EarlyFusionCnnTransformer | 226 M
1 | frames_metric | SumMetric                 | 0

226 M Trainable params
0 Non-trainable params
226 M Total params
906.443 Total estimated model params size (MB)
Sanity Checking DataLoader 0: 0%|

Then the error occurs:
  File "/media/spoc-robot-training/online_evaluation/local_logging_utils.py", line 66, in version
    return self._experiment.id
AttributeError: 'NoneType' object has no attribute 'id'

This is the command I used:
python -m training.offline.train_pl --dataset_version /media/spoc-robot-training/training_data/fifteen/ObjectNavType --eval_every 50 --precision 16-mixed --input_sensors raw_navigation_camera raw_manipulation_camera last_actions an_object_is_in_hand --per_gpu_batch 4 --sliding_window 50 --model EarlyFusionCnnTransformer --model_version siglip_base_3 --lr 0.0002 --wandb_logging False --output_dir /media/spoc-robot-training/log
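
The traceback suggests that `self._experiment` is `None` at the moment `version` is read. This may be related to running with `--wandb_logging False`, but that is an assumption and not something confirmed here. Below is a minimal sketch of a guard for that case. The class `LocalLoggerSketch`, the `fallback_version` parameter, and `FakeExperiment` are hypothetical names for illustration only and are not taken from the repository:

```python
from typing import Any, Optional


class LocalLoggerSketch:
    """Hypothetical stand-in for the logger in local_logging_utils.py (not the real class)."""

    def __init__(self, experiment: Optional[Any] = None, fallback_version: str = "local") -> None:
        # `experiment` plays the role of `self._experiment` from the traceback.
        self._experiment = experiment
        # Assumed fallback label used when no experiment object exists.
        self._fallback_version = fallback_version

    @property
    def version(self) -> str:
        # Guard against a missing experiment instead of dereferencing None,
        # which is what raises the AttributeError in the traceback.
        if self._experiment is None:
            return self._fallback_version
        return str(self._experiment.id)


class FakeExperiment:
    """Hypothetical experiment object exposing only an `id` attribute."""

    id = "abc123"


if __name__ == "__main__":
    print(LocalLoggerSketch().version)                  # no experiment attached
    print(LocalLoggerSketch(FakeExperiment()).version)  # experiment attached
```

Whether a fallback string is the right behaviour depends on how the training code uses `version` (for example, to build the log directory), so this only illustrates the shape of the check.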
