mscoco url link invalid? BucketNotFoundException: 404 gs://images.cocodataset.org bucket does not exist. #108

brando90 · 2023-01-06T20:12:07Z

I tried running the download command but got an error:

(mds_env_gpu) brando9~/data/mds/mscoco $ gsutil -m rsync gs://images.cocodataset.org/train2017 train2017

BucketNotFoundException: 404 gs://images.cocodataset.org bucket does not exist.

What should I do?

Full attempt:

# 1. Download the 2017 train images and annotations from http://cocodataset.org/:
# You can use gsutil to download them to mscoco/:
mkdir -p $MDS_DATA_PATH/mscoco/
cd $MDS_DATA_PATH/mscoco/
mkdir -p train2017
# seems to directly download all files, no zip file needed
gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
# todo: should have 118287(?) .jpg files (note: no unzipping needed)
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
# download & extract annotations_trainval2017.zip
gsutil -m cp gs://images.cocodataset.org/annotations/annotations_trainval2017.zip .
unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
# todo says: 6?
ls $MDS_DATA_PATH/mscoco/annotations | grep -c .json

## Otherwise, you can download train2017.zip and annotations_trainval2017.zip and extract them into mscoco/. ETA ~36m.
#mkdir -p $MDS_DATA_PATH/mscoco
#wget http://images.cocodataset.org/zips/train2017.zip -O $MDS_DATA_PATH/mscoco/train2017.zip
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip
## both zips should be there, note: downloading zip takes some time
#ls $MDS_DATA_PATH/mscoco/
## Extract them into mscoco/ (interpreting that as extracting both there, also based on what the gsutil command above appears to do)
## takes some time, but good progress display
#unzip $MDS_DATA_PATH/mscoco/train2017.zip -d $MDS_DATA_PATH/mscoco
#unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
## two folders should be there, annotations and train2017 stuff
#ls $MDS_DATA_PATH/mscoco/
## check jpg imgs are there
#ls $MDS_DATA_PATH/mscoco/train2017
#ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
## says: 118287 for a 2nd time
#ls $MDS_DATA_PATH/mscoco/annotations
#ls $MDS_DATA_PATH/mscoco/annotations | grep -c .json
## says: 6 for a 2nd time
## move them since it says so in the google NL instructions ref: for moving large num files https://stackoverflow.com/a/75034830/1601580 thanks chatgpt!
#ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
#find $MDS_DATA_PATH/mscoco/train2017 -type f -print0 | xargs -0 mv -t $MDS_DATA_PATH/mscoco
#ls $MDS_DATA_PATH/mscoco | grep -c .jpg
## says: 118287 for both
#ls $MDS_DATA_PATH/mscoco/annotations/ | grep -c .json
#mv $MDS_DATA_PATH/mscoco/annotations/* $MDS_DATA_PATH/mscoco/
#ls $MDS_DATA_PATH/mscoco/ | grep -c .json
## says: 6 for both

# 2. Launch the conversion script:
python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
  --dataset=mscoco \
  --mscoco_data_root=$MDS_DATA_PATH/mscoco \
  --splits_root=$SPLITS \
  --records_root=$RECORDS

# 3. Expect the conversion to take about 4 hours.

# 4. Find the following outputs in $RECORDS/mscoco/:
#80 tfrecords files named [0-79].tfrecords
ls $RECORDS/mscoco/ | grep -c .tfrecords
#dataset_spec.json (see note 1)
ls $RECORDS/mscoco/dataset_spec.json

related: brando90/pytorch-meta-dataset#20
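
The count checks in the script above (`ls ... | grep -c .jpg`) use an unescaped dot, so they would also count a name like `xjpg`. A minimal sketch of a stricter check, assuming a `find` that supports `-maxdepth` (GNU and BSD do); `count_ext` and `check_count` are hypothetical helper names, not part of meta-dataset, and the expected counts in the comments are the ones quoted in the script:

```shell
# Hypothetical helpers (not part of meta-dataset).

# Count regular files directly inside a directory whose names end in ".<ext>".
count_ext() {
  # $1 = directory, $2 = extension without the dot (e.g. jpg)
  find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Compare the actual count with an expected one; return non-zero on mismatch.
check_count() {
  # $1 = directory, $2 = extension, $3 = expected number of files
  actual=$(count_ext "$1" "$2")
  if [ "$actual" -eq "$3" ]; then
    echo "ok: $actual .$2 files in $1"
  else
    echo "mismatch: expected $3 .$2 files in $1, found $actual" >&2
    return 1
  fi
}

# Example calls, using the counts quoted in the script above:
# check_count "$MDS_DATA_PATH/mscoco/train2017" jpg 118287
# check_count "$MDS_DATA_PATH/mscoco/annotations" json 6
# check_count "$RECORDS/mscoco" tfrecords 80
```

Because `check_count` returns a non-zero status on a mismatch, it can be chained with `&&` in front of the conversion step so the 4-hour run does not start on an incomplete download.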


lamblin · 2023-01-20T21:17:33Z

I can confirm that the gs://images.cocodataset.org bucket does not seem to be accessible any longer, but we're not aware of an alternative source, and the original instructions at http://go/mscoco#download still mention that address.

I'd suggest you reach out to the COCO maintainers, and if there is an updated way to get that data, please let us know so we can update the instructions and scripts.
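
Until an updated bucket location is known, one possible workaround is the HTTP route from the commented-out part of the original post. The following is only a sketch under assumptions: it assumes the `http://images.cocodataset.org` zip URLs quoted in that post still resolve (not verified here), and the function names `coco_http_url`, `fetch_coco_archive` and `fetch_coco` are made up for illustration. It tries the bucket first and falls back to `wget` only if the bucket cannot be listed:

```shell
# Hypothetical helpers; the URLs are the ones quoted in the original post.

# Map an archive name to its HTTP URL.
coco_http_url() {
  case "$1" in
    train2017) echo "http://images.cocodataset.org/zips/train2017.zip" ;;
    annotations_trainval2017) echo "http://images.cocodataset.org/annotations/annotations_trainval2017.zip" ;;
    *) echo "unknown archive: $1" >&2; return 1 ;;
  esac
}

# Download one archive over HTTP into a directory and unzip it there.
fetch_coco_archive() {
  # $1 = archive name, $2 = destination directory
  url=$(coco_http_url "$1") || return 1
  mkdir -p "$2"
  wget "$url" -O "$2/$1.zip" && unzip -q "$2/$1.zip" -d "$2"
}

# Use the bucket only if it can still be listed; otherwise fall back to HTTP.
fetch_coco() {
  # $1 = destination directory, e.g. $MDS_DATA_PATH/mscoco
  if gsutil ls gs://images.cocodataset.org >/dev/null 2>&1; then
    mkdir -p "$1/train2017"
    gsutil -m rsync gs://images.cocodataset.org/train2017 "$1/train2017"
    gsutil -m cp gs://images.cocodataset.org/annotations/annotations_trainval2017.zip "$1/"
    unzip -q "$1/annotations_trainval2017.zip" -d "$1"
  else
    fetch_coco_archive train2017 "$1" &&
      fetch_coco_archive annotations_trainval2017 "$1"
  fi
}

# Example call:
# fetch_coco "$MDS_DATA_PATH/mscoco"
```

Either branch should leave `train2017/` and `annotations/` under the destination directory, which is the layout the script in the original post checks for before running the conversion.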

brando90 changed the title ~~mscoco url link invalid?~~ mscoco url link invalid? BucketNotFoundException: 404 gs://images.cocodataset.org bucket does not exist. Jan 6, 2023

brando90 mentioned this issue Jan 6, 2023

mscoco missing img brando90/pytorch-meta-dataset#20

Open

lamblin mentioned this issue Jan 20, 2023

tfds doesn't work to get meta-dataset data #111

Closed
