
Evaluation issue using TPUEstimator #156

Open
nicholasbreckwoldt opened this issue Jun 3, 2020 · 2 comments
Assignees: @cweill
Labels: bug (Something isn't working)

Comments

@nicholasbreckwoldt commented Jun 3, 2020
I'm running into an issue when using the AdaNet TPUEstimator. Say, for example, the estimator is configured with max_iteration_steps=500 and I want to evaluate the model's performance during training after every 100 training steps (i.e. steps_per_evaluation=100) for 2 complete AdaNet iterations.

To achieve this, estimator.train(max_steps, train_input) followed by estimator.evaluate(eval_input) is run in a loop, incrementing max_steps by steps_per_evaluation at the end of each pass, until max_steps=1000 is reached (corresponding to 2 complete AdaNet iterations).

When running in local mode (use_tpu=False), training proceeds as expected: steps 0 to 500 for the first iteration, steps 500 to 1000 for the second, with evaluation every 100 steps. However, when running on Cloud TPU (use_tpu=True), training reaches max_steps=1000 without ever progressing to a second iteration.

On the other hand, a single call to estimator.train(max_steps=1000, train_input) on Cloud TPU, without the estimator.evaluate calls, results in 2 complete AdaNet iterations as expected. This makes me think the issue lies with the evaluation call. What could the cause be? If this is a TPUEstimator-related issue, am I then constrained to the standard Estimator if I want this kind of train-evaluate loop configuration?
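For reference, the loop described above looks roughly like the sketch below. The StubEstimator class is a hypothetical stand-in for the real adanet.TPUEstimator (assumed to be constructed elsewhere with max_iteration_steps=500); it only records the calls so the schedule of max_steps values is visible. The names train_input and eval_input are the input functions from the description.

```python
class StubEstimator:
    """Hypothetical stand-in for adanet.TPUEstimator; records calls made to it."""

    def __init__(self):
        self.calls = []

    def train(self, input_fn, max_steps):
        # The real estimator trains until the global step reaches max_steps.
        self.calls.append(("train", max_steps))

    def evaluate(self, input_fn):
        self.calls.append(("evaluate", None))


def train_input():  # placeholder for the real training input_fn
    pass


def eval_input():  # placeholder for the real evaluation input_fn
    pass


steps_per_evaluation = 100
total_steps = 1000  # 2 AdaNet iterations x max_iteration_steps=500

estimator = StubEstimator()
max_steps = steps_per_evaluation
while max_steps <= total_steps:
    # Train up to the next evaluation point, then evaluate.
    estimator.train(input_fn=train_input, max_steps=max_steps)
    estimator.evaluate(input_fn=eval_input)
    max_steps += steps_per_evaluation

# The train calls target max_steps = 100, 200, ..., 1000.
train_targets = [steps for kind, steps in estimator.calls if kind == "train"]
print(train_targets)
```

With the real estimator, the expectation is that the first five train calls fall inside AdaNet iteration 1 (steps 0-500) and the last five inside iteration 2 (steps 500-1000); the reported bug is that on Cloud TPU the second iteration never starts.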

@cweill (Contributor) commented Jul 9, 2020

@nicholasbreckwoldt: We just released adanet==0.9.0, which includes better TPU and TF 2 support. Please try installing it and let us know whether it resolves your issue.

@cweill cweill self-assigned this Jul 9, 2020
@cweill cweill added the bug Something isn't working label Jul 9, 2020
@nicholasbreckwoldt (Author) commented

@cweill Thanks for the update! I am running into a new issue with the upgrade to TF 2.2 and adanet==0.9.0 which has so far prevented me from establishing whether the above evaluation issue has been resolved. I've added a description of this new issue (#157).
