PerfZero without Docker fails with AccessDeniedException #448

Open
lilyanatia opened this issue Dec 17, 2019 · 3 comments

lilyanatia commented Dec 17, 2019

    raise Exception('"{}" failed with code:{} and stdout:\n{}'.format(
Exception: "['gsutil', '-m', 'cp', '-r', '-n', 'gs://tf-performance/auth_tokens/benchmark_upload_gce.json', '/home/hotaru/tensorflow-benchmarks/perfzero/workspace']" failed with code:1 and stdout:
AccessDeniedException: 403 hotaru@thinkindifferent.net does not have storage.objects.list access to tf-performance.
CommandException: 1 file/object could not be transferred.

jjziets commented Jul 9, 2020

I also get this. Any way to fix?


XReyRobert-IBM commented Jul 9, 2020

Same here; it looks like an issue with the bucket configuration (IAM).

tfboyd (member) commented Jul 9, 2020

I have moved on from the project, but I created the tool with another person about a year ago. Lower your expectations for an answer here, but I know what is likely happening.

The tool is trying to download an authentication token, which is only used by the TensorFlow testing/performance team; it then uses that token to upload results and access data.

I do not know if they changed anything, but I put this in the guide to address the problem:

The key is to pass an empty arg for --gcloud_key_file_url, which tells it not to try to pull a token down. I hope this helps. You can also find that line of code and just remove it, but I am pretty sure this flag still works.

I am kind of sad I moved on from the project; I really enjoyed it and looked forward to the day people moved from tf_cnn_benchmarks (also a tool I was involved with) to this tool, which I very much enjoyed creating and growing. Good luck.

python3 /workspace/perfzero/lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git" \
--python_path=models \
--gcloud_key_file_url="" \
--benchmark_methods=official.benchmark.keras_cifar_benchmark.Resnet56KerasBenchmarkSynth.benchmark_1_gpu_no_dist_strat

Keep in mind they might have moved the test, but the source file is linked a bit further down.

This command "should" work as-is because it does not use any data. Another problem you will run into is needing to stage the data where PerfZero (really the test itself) can find it. That is not too hard, but because I could not share the source data (e.g. ImageNet), it was something I had to gloss over. It is not as hard as I am making it sound; /data/imagenet was common, I think, or /data/cifar10.
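
A rough sketch of one way to stage the CIFAR-10 binary data under a /data root (the tarball here is the standard CIFAR-10 binary release from the dataset's homepage, not something PerfZero ships):

mkdir -p /data
curl -fsSL -o /tmp/cifar-10-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
tar -xzf /tmp/cifar-10-binary.tar.gz -C /data    # extracts to /data/cifar-10-batches-bin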

Here is the source file for all the CIFAR10 tests: https://github.com/tensorflow/models/blob/master/official/benchmark/keras_cifar_benchmark.py

At the top you see CIFAR_DATA_DIR_NAME = 'cifar-10-batches-bin'. That gets concatenated to the --root_data_dir=$ROOT_DATA_DIR arg that you pass, so something like /data/cifar-10-batches-bin is where the data needs to be if you are not running a synthetic test. The README has some decent coverage of these args. To be clear, I know it would be almost impossible for you to guess from the error you saw that you need to pass a blank arg; I just want to let you know we spent a bit of time trying to document the args, which does not mean we were successful, but I/we tried. :-)
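
Putting that together, a non-synthetic run would look roughly like the command below, with the data staged at /data/cifar-10-batches-bin as sketched above. The benchmark method name is only illustrative (check keras_cifar_benchmark.py for the current class and method names); the shape of the --root_data_dir arg is the point.

python3 /workspace/perfzero/lib/benchmark.py \
--git_repos="https://github.com/tensorflow/models.git" \
--python_path=models \
--gcloud_key_file_url="" \
--root_data_dir=/data \
--benchmark_methods=official.benchmark.keras_cifar_benchmark.Resnet56KerasAccuracy.benchmark_1_gpu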

These tests are really good because they run "normal" TensorFlow code, and there is a team maintaining them to ensure the models are 100% correct.
