
Doc (Transfer learning and fine-tuning) results differ significantly from actual execution results #66696

Open
lida2003 opened this issue Apr 30, 2024 · 2 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower TF 2.15 For issues related to 2.15.x type:performance Performance Issue type:support Support issues


lida2003 commented Apr 30, 2024

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.15.0+nv24.03 GPU version, check this link: https://forums.developer.nvidia.com/t/multiple-executive-warnings-after-switching-tensorflow-from-2-16-1-cpu-to-v60dp-tensorflow-2-15-0-nv24-03-gpu-version/291208

Custom code

No

OS platform and distribution

Jetson Orin Nano, Ubuntu 22.04 (Jammy)

Mobile device

No response

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA 12.2.140 / cuDNN 8.9.4.25

GPU model and memory

sm90 8GB

Current behavior?

The execution results (both the trend of the curves and their absolute values) differ substantially from those shown in the documentation.

[Image]

[Image]

[Image]
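To put numbers on "trend of curve and absolute value", the two runs can be compared metric by metric. The sketch below is hypothetical (the function names and the default tolerance are illustrative, not part of Keras); it only assumes the Keras convention that `model.fit(...)` returns a History object whose `.history` attribute maps metric names such as `"accuracy"` or `"val_loss"` to per-epoch lists. Reference values from the documentation would have to be transcribed from its plots by hand.

```python
def _trend(values):
    """Direction of each epoch-to-epoch step: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def compare_histories(local: dict, reference: dict, atol: float = 0.05) -> dict:
    """Compare two Keras-style histories (metric name -> list of per-epoch values).

    Only metrics present in both histories are compared, and only over the
    epochs that both runs have. The default tolerance is an arbitrary assumption.
    """
    report = {}
    for metric in sorted(set(local) & set(reference)):
        n = min(len(local[metric]), len(reference[metric]))
        a, b = local[metric][:n], reference[metric][:n]
        max_abs = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        report[metric] = {
            "max_abs_diff": max_abs,  # gap in absolute value
            "same_trend": _trend(a) == _trend(b),  # same curve shape?
            "within_tol": max_abs <= atol,
        }
    return report
```

For example, `compare_histories(history.history, doc_values)` where `doc_values` is a hand-built dict such as `{"accuracy": [...], "loss": [...]}`. Checking trend and magnitude separately matters here: a run can match the documented curve's shape while sitting at a different level, or the reverse.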

Standalone code to reproduce the issue

100% reproducible
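Since the documented curves come from a single run, part of any gap can be ordinary run-to-run variation (random initialization of the new classifier head, dataset shuffling, augmentation). A minimal sketch, assuming TensorFlow 2.8 or newer, that fixes the seeds before training so repeated runs on the same machine, or runs on CPU versus GPU, can be compared fairly. `make_reproducible` is a hypothetical helper; `tf.keras.utils.set_random_seed` and `tf.config.experimental.enable_op_determinism` are existing TensorFlow APIs. This will not make a run match the documentation exactly, because the tutorial's own seeds are unknown.

```python
import importlib.util
import random


def make_reproducible(seed: int = 42, deterministic_ops: bool = True) -> None:
    """Seed every RNG a Keras fine-tuning run is likely to touch (hypothetical helper)."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np

        np.random.seed(seed)
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        # Seeds Python's `random`, NumPy and TensorFlow in one call.
        tf.keras.utils.set_random_seed(seed)
        if deterministic_ops:
            # Ops without a deterministic GPU kernel will raise an error
            # instead of silently producing run-to-run differences.
            tf.config.experimental.enable_op_determinism()
```

Call it once at the top of the script, before the model and datasets are built.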

Relevant log output

I also tried Colab, and the results there are consistent with the documentation:

[Image]

@tilakrayal
Contributor

@lida2003,
Could you please confirm whether the difference in accuracy and loss occurs on both GPU and CPU, with both TensorFlow v2.15 and v2.16? I will also try to debug this issue further and provide a resolution. Thank you!
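One way to get the CPU half of that comparison without installing a second TensorFlow build is to hide the GPU from the same installation. A minimal sketch, assuming the standard CUDA behavior that `CUDA_VISIBLE_DEVICES` is read when the CUDA runtime initializes, so it must be set before TensorFlow is imported; `force_cpu_only` is a hypothetical helper. Inside an already running process, `tf.config.set_visible_devices([], "GPU")` is the in-TensorFlow alternative, again only before the GPU has been initialized.

```python
import os
import sys


def force_cpu_only() -> None:
    """Hide all CUDA GPUs from TensorFlow for this process (hypothetical helper).

    Must be called before `import tensorflow`; setting the variable after the
    CUDA runtime has started has no effect.
    """
    if "tensorflow" in sys.modules:
        raise RuntimeError("call force_cpu_only() before importing tensorflow")
    # "-1" matches no device index, so CUDA reports zero visible GPUs.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

After `force_cpu_only()` followed by `import tensorflow as tf`, `tf.config.list_physical_devices("GPU")` should return an empty list, and the same notebook can then be rerun on CPU for a side-by-side comparison.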

@tilakrayal tilakrayal added TF 2.15 For issues related to 2.15.x type:performance Performance Issue labels May 2, 2024
@lida2003
Author

lida2003 commented May 2, 2024

> @lida2003,
> Could you please confirm whether the difference in Accuracy & Loss has happened in GPU/CPU with both tensorflow v2.15, v2.16?

  • NVIDIA 2.15.0+nv24.03 GPU version: fails (results differ from the documentation) 100% of the time
  • Colab: consistent with the TensorFlow documentation (but I don't know which TensorFlow version Colab uses)
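Because the Colab version is unknown, a small report run in both environments would show exactly which builds are being compared. A minimal sketch; `environment_report` is a hypothetical helper, while `tf.__version__`, `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices` are existing TensorFlow APIs. The Keras version is included because TensorFlow 2.16 ships with Keras 3 by default while 2.15 uses Keras 2, so "2.15 vs 2.16" is also a Keras-version comparison.

```python
import importlib.util
import platform


def environment_report() -> dict:
    """Collect the versions that matter when comparing two training runs."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # e.g. "aarch64" on a Jetson
        "tensorflow": None,
    }
    if importlib.util.find_spec("keras") is not None:
        import keras

        report["keras"] = keras.__version__
    if importlib.util.find_spec("tensorflow") is None:
        return report
    import tensorflow as tf

    build = tf.sysconfig.get_build_info()  # CUDA keys are absent on CPU-only builds
    report.update(
        tensorflow=tf.__version__,
        cuda=build.get("cuda_version"),
        cudnn=build.get("cudnn_version"),
        gpus=[d.name for d in tf.config.list_physical_devices("GPU")],
    )
    return report
```

Running `print(environment_report())` in a Colab cell and on the Jetson gives two dicts that can be pasted into this issue side by side.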

PS: Jetson Orin Nano 8GB, with CPU and GPU sharing the same memory

v2.16

At the very beginning, I installed the 2.16 CPU version (pip binary installation) on the Jetson Orin Nano. Running Keras-Fine-Tuning-Pre-Trained-Models produced no resource warnings; this might be due to the way the CPU build uses the swap area.

The results also differ from the documentation; see the link below:

I then switched to the NVIDIA 2.15.0+nv24.03 GPU version because my local build of the TensorFlow v2.16.1 GPU version on the Jetson Orin Nano failed.

> Also I will also try to debug more on this issue and provide the resolution. Thank you!

There are also a couple of other things that might be clues for you. Here is a link to the NVIDIA forum thread:

Please take a look at those warnings and the memory issue. I think a sanity check is needed before software is packaged for release (published to the repo).
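On the memory side: by default TensorFlow maps nearly all GPU memory up front, and on the Orin Nano that memory is the same 8 GB the CPU uses. A commonly used setting is to let TensorFlow allocate GPU memory on demand instead. This is a sketch only; whether it affects the warnings or the accuracy difference reported here is untested. `enable_gpu_memory_growth` is a hypothetical helper around the existing `tf.config.experimental.set_memory_growth` API; setting the environment variable `TF_FORCE_GPU_ALLOW_GROWTH=true` before launch is the equivalent without code changes.

```python
import importlib.util


def enable_gpu_memory_growth() -> list:
    """Make TensorFlow allocate GPU memory on demand (hypothetical helper).

    Must run before any op touches the GPU; TensorFlow raises RuntimeError
    if the devices are already initialized.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []  # TensorFlow not installed: nothing to configure
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]  # devices that were configured
```

Call it at the very top of the script, right after importing TensorFlow and before the model or datasets are created.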

EDIT: I will keep this issue in sync with NVIDIA's feedback.

@tilakrayal tilakrayal added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 16, 2024