
mscoco bash instruction verification, does this look right to you? #106

brando90 opened this issue Jan 6, 2023 · 3 comments

brando90 commented Jan 6, 2023

This is my current download script. Does this look right to you?

# 1. Download the 2017 train images and annotations from http://cocodataset.org/.
# You can use gsutil to download them to mscoco/:
#   cd $DATASRC/mscoco/
#   mkdir -p train2017
#   gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
#   gsutil -m cp gs://images.cocodataset.org/annotations/annotations_trainval2017.zip .
#   unzip annotations_trainval2017.zip

# Otherwise, you can download train2017.zip and annotations_trainval2017.zip and extract them into mscoco/. ETA: ~36 min.
mkdir -p $MDS_DATA_PATH/mscoco
wget http://images.cocodataset.org/zips/train2017.zip -O $MDS_DATA_PATH/mscoco/train2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip
# Both zips should now be there; note the downloads take some time.
ls $MDS_DATA_PATH/mscoco/
# Extract both archives into mscoco/ (interpreting the instructions as extracting both there, consistent with what the gsutil commands above do)
# takes some time, but good progress display
unzip $MDS_DATA_PATH/mscoco/train2017.zip -d $MDS_DATA_PATH/mscoco
unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
# Two folders should now be there: annotations/ and train2017/
ls $MDS_DATA_PATH/mscoco/
# Check that the jpg images are there
ls $MDS_DATA_PATH/mscoco/train2017
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
ls $MDS_DATA_PATH/mscoco/annotations
ls $MDS_DATA_PATH/mscoco/annotations | grep -c .json
# Move the files up, since the natural-language instructions say to; ref for moving a large number of files: https://stackoverflow.com/a/75034830/1601580 (thanks ChatGPT!)
find $MDS_DATA_PATH/mscoco/train2017 -type f -print0 | xargs -0 mv -t $MDS_DATA_PATH/mscoco
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
ls $MDS_DATA_PATH/mscoco | grep -c .jpg
mv $MDS_DATA_PATH/mscoco/annotations/* $MDS_DATA_PATH/mscoco/
ls $MDS_DATA_PATH/mscoco/ | grep -c .json
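As an aside, `grep -c .jpg` treats the dot as a regex wildcard, so any filename merely containing e.g. `_jpg` gets counted too. A minimal throwaway sketch of a stricter count (all paths below are temporary stand-ins, not the real dataset):

```shell
# Throwaway demo: why 'grep -c .jpg' can over-count.
tmp=$(mktemp -d)
mkdir -p "$tmp/train2017"
touch "$tmp/train2017/000000000001.jpg" \
      "$tmp/train2017/000000000002.jpg" \
      "$tmp/train2017/notes_jpg.txt"

# '.' is a regex wildcard, so the .txt file matches too:
loose=$(ls "$tmp/train2017" | grep -c .jpg)
# Matching the literal extension counts only real jpgs:
strict=$(find "$tmp/train2017" -type f -name '*.jpg' | wc -l | tr -d ' ')

echo "loose=$loose strict=$strict"   # loose=3 strict=2
rm -rf "$tmp"
```

On the real data, `find $MDS_DATA_PATH/mscoco/train2017 -type f -name '*.jpg' | wc -l` should report 118287.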

# 2. Launch the conversion script:
python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
  --dataset=mscoco \
  --mscoco_data_root=$MDS_DATA_PATH/mscoco \
  --splits_root=$SPLITS \
  --records_root=$RECORDS

# 3. Expect the conversion to take about 4 hours.

# 4. Find the following outputs in $RECORDS/mscoco/:
# 80 .tfrecords files named [0-79].tfrecords
ls $RECORDS/mscoco/ | grep -c .tfrecords
# dataset_spec.json (see note 1)
ls $RECORDS/mscoco/dataset_spec.json
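To sanity-check the conversion output, a hedged sketch of a completeness check. The `touch` lines below simulate a finished run in a temp directory; on a real run, set `out="$RECORDS/mscoco"` and skip them:

```shell
# Throwaway demo of the completeness check; the touch lines stand in for a
# finished conversion.
out=$(mktemp -d)
for i in $(seq 0 79); do touch "$out/$i.tfrecords"; done
touch "$out/dataset_spec.json"

# Verify all 80 shards named 0..79 exist:
missing=0
for i in $(seq 0 79); do
  [ -f "$out/$i.tfrecords" ] || { echo "missing: $i.tfrecords"; missing=$((missing+1)); }
done
[ -f "$out/dataset_spec.json" ] || echo "missing: dataset_spec.json"
echo "missing=$missing"   # missing=0
rm -rf "$out"
```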

brando90 commented Jan 6, 2023

mscoco still fails despite redoing all of the above instructions from scratch :(

(mds_env_gpu) brando9~/data/mds $ python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
>   --dataset=mscoco \
>   --mscoco_data_root=$MDS_DATA_PATH/mscoco \
>   --splits_root=$SPLITS \
>   --records_root=$RECORDS

2023-01-06 12:00:41.907896: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64:/usr/local/cuda-11.7/lib64:
2023-01-06 12:00:41.907947: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


I0106 12:00:58.419382 139666732655232 convert_datasets_to_records.py:151] Creating MSCOCO specification and records in directory /lfs/ampere4/0/brando9/data/mds/records/mscoco...
I0106 12:00:58.419690 139666732655232 dataset_to_records.py:649] Attempting to read splits from /lfs/ampere4/0/brando9/data/mds/splits/mscoco_splits.json...
I0106 12:00:58.420802 139666732655232 dataset_to_records.py:658] Successful.
Traceback (most recent call last):
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/dataset_conversion/convert_datasets_to_records.py", line 157, in <module>
    tf.app.run(main)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/dataset_conversion/convert_datasets_to_records.py", line 153, in main
    converter.convert_dataset()
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/dataset_conversion/dataset_to_records.py", line 610, in convert_dataset
    self.create_dataset_specification_and_records()
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/dataset_conversion/dataset_to_records.py", line 1463, in create_dataset_specification_and_records
    image_crop, class_id = get_image_crop_and_class_id(annotation)
  File "/afs/cs.stanford.edu/u/brando9/diversity-for-predictive-success-of-meta-learning/meta-dataset/meta_dataset/dataset_conversion/dataset_to_records.py", line 1425, in get_image_crop_and_class_id
    image = Image.open(f)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/PIL/Image.py", line 2957, in open
    fp.seek(0)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow/python/util/deprecation.py", line 548, in new_func
    return func(*args, **kwargs)
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 137, in seek
    self._preread_check()
  File "/lfs/ampere4/0/brando9/miniconda/envs/mds_env_gpu/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 76, in _preread_check
    self._read_buf = _pywrap_file_io.BufferedInputStream(
tensorflow.python.framework.errors_impl.NotFoundError: /lfs/ampere4/0/brando9/data/mds/mscoco/train2017/000000558840.jpg; No such file or directory


brando90 commented Jan 6, 2023

I ran

# Otherwise, you can download train2017.zip and annotations_trainval2017.zip and extract them into mscoco/. ETA: ~36 min.
mkdir -p $MDS_DATA_PATH/mscoco
wget http://images.cocodataset.org/zips/train2017.zip -O $MDS_DATA_PATH/mscoco/train2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip
# both zips should be there, note: downloading zip takes some time
ls $MDS_DATA_PATH/mscoco/
# Extract both archives into mscoco/ (interpreting the instructions as extracting both there, consistent with what the gsutil commands above do)
# takes some time, but good progress display
unzip $MDS_DATA_PATH/mscoco/train2017.zip -d $MDS_DATA_PATH/mscoco
unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
# two folders should be there, annotations and train2017 stuff
ls $MDS_DATA_PATH/mscoco/
# check jpg imgs are there
ls $MDS_DATA_PATH/mscoco/train2017
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
# says: 118287 for a 2nd time
ls $MDS_DATA_PATH/mscoco/annotations
ls $MDS_DATA_PATH/mscoco/annotations | grep -c .json
# says: 6 for a 2nd time
# Move the files up, since Google's natural-language instructions say to; ref for moving a large number of files: https://stackoverflow.com/a/75034830/1601580 (thanks ChatGPT!)
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
find $MDS_DATA_PATH/mscoco/train2017 -type f -print0 | xargs -0 mv -t $MDS_DATA_PATH/mscoco
ls $MDS_DATA_PATH/mscoco | grep -c .jpg
# says: 118287 for both
ls $MDS_DATA_PATH/mscoco/annotations/ | grep -c .json
mv $MDS_DATA_PATH/mscoco/annotations/* $MDS_DATA_PATH/mscoco/
ls $MDS_DATA_PATH/mscoco/ | grep -c .json
# says: 6 for both

# 2. Launch the conversion script:
python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
  --dataset=mscoco \
  --mscoco_data_root=$MDS_DATA_PATH/mscoco \
  --splits_root=$SPLITS \
  --records_root=$RECORDS

vdumoulin (Collaborator) commented

Following your instructions I get the same error, but I notice that if I avoid moving the *.jpg files from mscoco/train2017 to mscoco, the conversion script runs successfully.
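Consistent with the traceback above (the converter looks for images under mscoco/train2017/), a minimal sketch of undoing the flattening, assuming the jpgs were moved into mscoco/. The paths below are a throwaway stand-in for the real $MDS_DATA_PATH/mscoco:

```shell
# Throwaway demo: restore flattened jpgs into train2017/ so the converter
# finds them where it expects. $root stands in for $MDS_DATA_PATH/mscoco.
root=$(mktemp -d)
mkdir -p "$root/train2017"
touch "$root/000000558840.jpg" "$root/000000000001.jpg"   # flattened by mistake

# Move only top-level jpgs back down (mirrors the xargs style used above):
find "$root" -maxdepth 1 -type f -name '*.jpg' -print0 \
  | xargs -0 mv -t "$root/train2017"

restored=$(find "$root/train2017" -type f -name '*.jpg' | wc -l | tr -d ' ')
echo "restored=$restored"   # restored=2
rm -rf "$root"
```

Equivalently, re-extracting train2017.zip and simply skipping the move step should leave the images where the converter expects them.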
