[IDEA] Add instructions on how to build TF from source to make notebooks more accessible to those who have different GPUs #129

AdamLeBlanc · 2024-04-27T02:30:53Z

Is your feature request related to a problem? Please describe.
Please indicate the notebook name and cell number where the problem occurs (or the chapter and page number in the book), and provide a clear and concise description of what the problem is. Ex. In chapter 1, cells 200-220, I think the code could be clearer [...]

Describe the solution you'd like
Include instructions for building TensorFlow from source for specific GPUs in the installation guide in the GitHub repository, as well as where to find alternative builds for the ROCm variant of the package.

Describe alternatives you've considered
Using the CPU version of TensorFlow for unsupported cards/drivers. This may be okay for the examples in the book, but it will become a problem for any real-world examples. It would be good for this repository to include instructions on building TensorFlow so that those with different graphics cards or drivers can get the most out of this book.

A second alternative would be to provide .whl files that target specific GPU/driver combinations in the repo, but that may be a burden to maintain.

Additional context
The current version of TensorFlow is built for CUDA 11.8 and cuDNN 8 only. This is problematic for many users, as the current version of CUDA is 12.4 and cuDNN is on version 9, and TensorFlow will not work properly with these versions. Having to downgrade can cause problems for users who are following along on their own systems, as they may need the up-to-date drivers for other activities.

There is a similar problem for AMD users. They can also solve it by building TensorFlow for their specific GPU/driver. However, AMD also maintains its own distribution, which AMD readers may be able to use. Their alternative builds can be found here.

TensorFlow provides a solution, with instructions on how to build the project from its GitHub repo to target your GPU/driver combination. However, these instructions are not entirely up to date and there are some pitfalls, mainly:

On Linux, using clang on an Ubuntu-based system gives an error (NvInferVersion.h missing)
- This can be solved by building with gcc instead, by selecting n when asked whether to use clang
The "build" command is wrong for the most recent version of TensorFlow. You need to use the v2 build like so:
- bazel build //tensorflow/tools/pip_package/v2:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda
- This also writes the build output to .../v2 inside the directory that the instructions say it will be written to
You need to set the following environment variables:

export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:/usr/local/cuda-12.4/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.4/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.4
export XLA_TARGET=cuda120
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-12.4
export TF_CUDA_VERSION='12.4'

You need to specify which version of Python you want to target by setting an environment variable:
- export TF_PYTHON_VERSION=3.12
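
To make the steps above easier to follow, here is a rough sketch that gathers the environment variables and the v2 build command into one script. It is only an illustration under some assumptions: CUDA is installed under /usr/local/cuda-<version>, the script is run from the root of a TensorFlow source checkout after ./configure has been run, and the names CUDA_DIR, DRY_RUN and build_wheel are made up for this example (they are not part of TensorFlow or Bazel). By default it only prints the build command instead of running it.

```shell
#!/bin/sh
# Sketch only: collects the settings from the list above in one place.
# Assumption: CUDA lives in /usr/local/cuda-<version>; adjust to your install.
CUDA_VERSION="${CUDA_VERSION:-12.4}"
CUDA_DIR="/usr/local/cuda-${CUDA_VERSION}"   # hypothetical helper variable

# Prepend the CUDA libraries and tools; keep any existing LD_LIBRARY_PATH entries.
export LD_LIBRARY_PATH="${CUDA_DIR}/lib64:${CUDA_DIR}/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export PATH="${CUDA_DIR}/bin:${PATH}"
export CUDA_HOME="${CUDA_DIR}"
export XLA_TARGET=cuda120
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CUDA_DIR}"
export TF_CUDA_VERSION="${CUDA_VERSION}"

# Python version the wheel should target (3.12 unless already set).
export TF_PYTHON_VERSION="${TF_PYTHON_VERSION:-3.12}"

# build_wheel is a hypothetical wrapper: with DRY_RUN=1 (the default here)
# it prints the bazel command, otherwise it runs it.
build_wheel() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "bazel build $*"
  else
    bazel build "$@"
  fi
}

# The v2 wheel target and flags quoted in the list above.
build_wheel //tensorflow/tools/pip_package/v2:wheel \
  --repo_env=WHEEL_NAME=tensorflow --config=cuda
```

Running it with DRY_RUN=0 would start the actual build; leaving the default lets you check the exported values and the command first.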

I believe adding these instructions would be very helpful for new readers, especially those who are not strong programmers and are approaching this more as a data science exercise. They may not be familiar with the tools needed to set this up if they want to run the projects on their own machine.

FriedrichFroebel · 2024-04-27T08:08:50Z

Since these installation instructions are already part of the TensorFlow repository itself, a link might make more sense so that this guide does not become outdated too quickly. If you think there are issues in the official TensorFlow docs, you should consider reporting them upstream, and you might even provide a corresponding patch/fix so that all TensorFlow users benefit from it.

AdamLeBlanc changed the title ~~[IDEA] Add instructions on how to build TF from source to support make notebooks more accessible to those who have different GPUs~~ [IDEA] Add instructions on how to build TF from source to make notebooks more accessible to those who have different GPUs Apr 27, 2024
