Fix aloha real-world datasets #175

Merged
Cadene merged 31 commits into main from user/rcadene/2024_05_11_fix_aloha_static_mobile_datasets
May 20, 2024

@Cadene Cadene commented May 11, 2024

What this does

Core:

Addition:

  • Add safetensors to git lfs
  • For every sim dataset, add an image dataset that is equivalent to the original data (unlike the video version, which is compressed). Example: "lerobot/pusht" -> "lerobot/pusht_image"
  • Set CODEBASE_VERSION v1.4 (for backward compatibility)
  • Add hf-transfer as an optional extra for faster downloads/uploads; enable it with export HF_HUB_ENABLE_HF_TRANSFER=1
  • Fix an episode_index issue affecting all the aloha datasets. As a result:
    • the order of episodes changed in all aloha datasets (sim included)
    • all aloha sim datasets have been reuploaded
    • the dataset test artifacts had to be reuploaded as well, since episode_index=0 no longer refers to the same episode
    • the policy test artifacts had to be reuploaded as well, since they used episode_index=0
  • Track *.mp4, *.json, *.safetensors and *.arrow files with git-lfs, and enable git-lfs in CI.
  • Add velocity and effort to all aloha (real-world) datasets @haixuanTao
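
The image-dataset naming convention and the hf-transfer switch from the list above can be sketched as follows. This is a minimal illustration, not code from this PR: `image_repo_id` and `download_dataset` are hypothetical helper names, and the sketch assumes the `huggingface_hub` and `hf_transfer` packages are installed when `download_dataset` is actually called.

```python
import os

# huggingface_hub reads this flag when it is first imported, so set it early.
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")


def image_repo_id(repo_id: str) -> str:
    """Map a dataset id to its image variant, e.g. lerobot/pusht -> lerobot/pusht_image."""
    return repo_id if repo_id.endswith("_image") else f"{repo_id}_image"


def download_dataset(repo_id: str, revision: str = "v1.4") -> str:
    """Download one dataset snapshot and return its local path (needs network access)."""
    # Imported lazily so image_repo_id works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id, repo_type="dataset", revision=revision)
```

Pinning `revision` to the codebase version branch is one way to keep older checkouts reading the data layout they expect.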

How it was tested

How to checkout & try?

  • Upload new/updated datasets (done on dgx):
datasets=(
  "pusht_image"
)
for dataset in "${datasets[@]}"; do
    python lerobot/scripts/push_dataset_to_hub.py \
    --data-dir /raid/remi_cadene/data \
    --dataset-id $dataset \
    --raw-format pusht_zarr \
    --community-id lerobot \
    --dry-run 1 \
    --save-to-disk 0 \
    --save-tests-to-disk 1 \
    --video 0;rm -rf /raid/remi_cadene/data/${dataset}_raw
done
datasets=(
  "xarm_lift_medium_image"
  "xarm_lift_medium_replay_image"
  "xarm_push_medium_image"
  "xarm_push_medium_replay_image"
)
for dataset in "${datasets[@]}"; do
    python lerobot/scripts/push_dataset_to_hub.py \
    --data-dir /raid/remi_cadene/data \
    --dataset-id $dataset \
    --raw-format xarm_pkl \
    --community-id lerobot \
    --dry-run 1 \
    --save-to-disk 0 \
    --save-tests-to-disk 1 \
    --video 0;rm -rf /raid/remi_cadene/data/${dataset}_raw
done
datasets=(
  "aloha_sim_insertion_human_image"
  "aloha_sim_insertion_scripted_image"
  "aloha_sim_transfer_cube_human_image"
  "aloha_sim_transfer_cube_scripted_image"
)
for dataset in "${datasets[@]}"; do
    python lerobot/scripts/push_dataset_to_hub.py \
    --data-dir /raid/remi_cadene/data \
    --dataset-id $dataset \
    --raw-format aloha_hdf5 \
    --community-id lerobot \
    --dry-run 0 \
    --save-to-disk 0 \
    --save-tests-to-disk 1 \
    --video 0;rm -rf /raid/remi_cadene/data/${dataset}_raw
done
datasets=(
  "aloha_sim_insertion_human"
  "aloha_sim_insertion_scripted"
  "aloha_sim_transfer_cube_human"
  "aloha_sim_transfer_cube_scripted"
  "aloha_mobile_cabinet"
  "aloha_mobile_chair"
  "aloha_mobile_elevator"
  "aloha_mobile_shrimp"
  "aloha_mobile_wash_pan"
  "aloha_mobile_wipe_wine"
  "aloha_static_battery"
  "aloha_static_candy"
  "aloha_static_coffee"
  "aloha_static_coffee_new"
  "aloha_static_cups_open"
  "aloha_static_fork_pick_up"
  "aloha_static_pingpong_test"
  "aloha_static_pro_pencil"
  "aloha_static_screw_driver"
  "aloha_static_tape"
  "aloha_static_thread_velcro"
  "aloha_static_towel"
  "aloha_static_vinh_cup"
  "aloha_static_vinh_cup_left"
  "aloha_static_ziploc_slide"
)
for dataset in "${datasets[@]}"; do
    export HF_HUB_ENABLE_HF_TRANSFER=1
    export HF_DATASETS_CACHE=/raid/remi_cadene/.cache/huggingface/datasets
    python lerobot/scripts/push_dataset_to_hub.py \
    --data-dir /raid/remi_cadene/data \
    --dataset-id $dataset \
    --raw-format aloha_hdf5 \
    --community-id lerobot \
    --dry-run 0 \
    --save-to-disk 0 \
    --save-tests-to-disk 1 \
    --video 1;rm -rf /raid/remi_cadene/data/${dataset}_raw
done
  • For datasets that did not need an update, only create the v1.4 branch:
from huggingface_hub import create_branch
dataset_ids = ["pusht", "xarm_lift_medium", "xarm_lift_medium_replay", "xarm_push_medium", "xarm_push_medium_replay"]
for dataset_id in dataset_ids:
    create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="v1.4")
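
If the branch creation ever has to be re-run, an idempotent variant may be convenient. The sketch below is an assumption rather than what was run for this PR: `create_version_branches` is a hypothetical helper, and it relies on the `exist_ok` parameter of `huggingface_hub.create_branch` so that an already existing branch is not treated as an error. The branch-creating function is passed in as an argument, which lets the helper be exercised offline with a stub.

```python
from typing import Callable, Iterable, List


def create_version_branches(
    dataset_ids: Iterable[str],
    create_branch_fn: Callable[..., object],
    community_id: str = "lerobot",
    branch: str = "v1.4",
) -> List[str]:
    """Create `branch` on every dataset repo and return the repo ids that were processed."""
    repo_ids = []
    for dataset_id in dataset_ids:
        repo_id = f"{community_id}/{dataset_id}"
        # exist_ok=True makes re-runs safe when the branch is already there.
        create_branch_fn(repo_id, repo_type="dataset", branch=branch, exist_ok=True)
        repo_ids.append(repo_id)
    return repo_ids


# Usage against the real Hub (requires write access to the repos):
#   from huggingface_hub import create_branch
#   create_version_branches(["pusht", "xarm_lift_medium"], create_branch)
```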
  • Visualize episode 0 of every dataset:
export HF_HUB_ENABLE_HF_TRANSFER=1
datasets=(
  "pusht_image"
  "xarm_lift_medium_image"
  "xarm_lift_medium_replay_image"
  "xarm_push_medium_image"
  "xarm_push_medium_replay_image"
  "aloha_sim_insertion_human_image"
  "aloha_sim_insertion_scripted_image"
  "aloha_sim_transfer_cube_human_image"
  "aloha_sim_transfer_cube_scripted_image"
  "pusht"
  "xarm_lift_medium"
  "xarm_lift_medium_replay"
  "xarm_push_medium"
  "xarm_push_medium_replay"
  "aloha_sim_insertion_human"
  "aloha_sim_insertion_scripted"
  "aloha_sim_transfer_cube_human"
  "aloha_sim_transfer_cube_scripted"
  "aloha_mobile_cabinet"
  "aloha_mobile_chair"
  "aloha_mobile_elevator"
  "aloha_mobile_shrimp"
  "aloha_mobile_wash_pan"
  "aloha_mobile_wipe_wine"
  "aloha_static_battery"
  "aloha_static_candy"
  "aloha_static_coffee"
  "aloha_static_coffee_new"
  "aloha_static_cups_open"
  "aloha_static_fork_pick_up"
  "aloha_static_pingpong_test"
  "aloha_static_pro_pencil"
  "aloha_static_screw_driver"
  "aloha_static_tape"
  "aloha_static_thread_velcro"
  "aloha_static_towel"
  "aloha_static_vinh_cup"
  "aloha_static_vinh_cup_left"
  "aloha_static_ziploc_slide"
  "umi_cup_in_the_wild"
)
for dataset in "${datasets[@]}"; do
  python lerobot/scripts/visualize_dataset.py --repo-id lerobot/${dataset} --episode-indices 0 --serve 0
done;
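
Besides inspecting episode 0 visually, the episode ordering could also be checked programmatically. The following is a generic sanity check, not the fix applied in this PR: `check_episode_index` is a hypothetical helper that assumes `episode_index` is stored once per frame, starts at 0, and increases by exactly 1 at each episode boundary.

```python
from typing import Sequence


def check_episode_index(episode_index: Sequence[int]) -> int:
    """Validate a per-frame episode_index column and return the number of episodes."""
    if len(episode_index) == 0:
        raise ValueError("episode_index is empty")
    first = int(episode_index[0])
    if first != 0:
        raise ValueError(f"first episode_index is {first}, expected 0")
    previous = first
    for frame, value in enumerate(episode_index):
        current = int(value)
        # Within an episode the index is constant; at a boundary it grows by 1.
        if current - previous not in (0, 1):
            raise ValueError(
                f"frame {frame}: episode_index jumps from {previous} to {current}"
            )
        previous = current
    return previous + 1
```

It can be applied to the per-frame `episode_index` column of a loaded dataset before regenerating test artifacts that depend on a specific episode.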

@Cadene Cadene added 🐛 Bug Something isn't working 🗃️ Dataset Something dataset-related labels May 11, 2024
@Cadene Cadene self-assigned this May 11, 2024
Add hf-transfer, num_workers 8

gc collect, rmtree save_to_disk False

Track .safetensors in git lfs

run save_dataset_to_safetensors on all datasets

Update datasets

Update download_raw
@Cadene Cadene force-pushed the user/rcadene/2024_05_11_fix_aloha_static_mobile_datasets branch from f896667 to 1d9433b Compare May 17, 2024 02:11
@Cadene Cadene changed the title [WIP] Fix aloha real-world datasets Fix aloha real-world datasets May 17, 2024
@Cadene Cadene marked this pull request as ready for review May 17, 2024 02:27

@AdilZouitine AdilZouitine left a comment

LGTM 🤗


@alexander-soare alexander-soare left a comment

Approved.

@Cadene Cadene merged commit 01eae09 into main May 20, 2024
7 checks passed
@Cadene Cadene deleted the user/rcadene/2024_05_11_fix_aloha_static_mobile_datasets branch May 20, 2024 11:48
HalvardBariller pushed a commit to HalvardBariller/lerobot that referenced this pull request May 21, 2024
Labels
🐛 Bug Something isn't working 🗃️ Dataset Something dataset-related
Projects
Status: Done
4 participants