install failed #447

Open
themoonstone opened this issue Apr 17, 2024 · 0 comments

Describe the bug
A fatal error occurred when I built a Docker image from my Dockerfile.

To Reproduce
Steps to reproduce the behavior:

  1. The content of my Dockerfile:
COPY ../byteps ./byteps
RUN ls -alh ./byteps
ARG https_proxy
ARG http_proxy

ARG BYTEPS_BASE_PATH=/usr/local
ARG BYTEPS_PATH=$BYTEPS_BASE_PATH/byteps
ARG BYTEPS_GIT_LINK=https://github.com/bytedance/byteps
ARG BYTEPS_BRANCH=master

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
        build-essential \
        tzdata \
        ca-certificates \
        git \
        curl \
        wget \
        vim \
        cmake \
        lsb-release \
        libnuma-dev \
        ibverbs-providers \
        librdmacm-dev \
        ibverbs-utils \
        rdmacm-utils \
        libibverbs-dev \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        libnccl2=2.21.5-1+cuda12.2 \
        libnccl-dev=2.21.5-1+cuda12.2
#COPY --from=builder /etc/reslov.conf /etc/reslov.conf
# install framework
# note: for tf <= 1.14, you need gcc-4.9
RUN g++ --version
ARG FRAMEWORK=tensorflow
RUN if [ "$FRAMEWORK" = "tensorflow" ]; then \
        pip3 install --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple pip; \
        pip3 install tensorflow==2.5.0 -i https://pypi.tuna.tsinghua.edu.cn/simple; \
        pip3 install --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple setuptools; \
    elif [ "$FRAMEWORK" = "pytorch" ]; then \
        pip3 install -U numpy==1.18.1 torchvision==0.5.0 torch==1.4.0; \
    elif [ "$FRAMEWORK" = "mxnet" ]; then \
        pip3 install -U mxnet-cu100==1.5.0; \
    else \
        echo "unknown framework: $FRAMEWORK"; \
        exit 1; \
    fi
RUN ls -lh /byteps
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH
RUN cd $BYTEPS_BASE_PATH &&\
#COPY --form=builder /home/albert/tanyi4/github.com/bytedance/byteps $BYTEPS_PATH
#    git clone --recursive -b $BYTEPS_BRANCH $BYTEPS_GIT_LINK &&\
    cp /byteps ./byteps -r && \
    cd $BYTEPS_PATH &&\ 
    python3 setup.py install
  2. Then I built the image: docker build -t bytepsimage/tensorflow . -f Dockerfile --build-arg FRAMEWORK=tensorflow
  3. The error log is as follows:
    Libraries have been installed in:
    #13 78.85 | ^~~~~~~~
    #13 78.88 byteps/server/server.cc: In function ‘void byteps::server::BytePSHandler(const ps::KVMeta&, const ps::KVPairs&, ps::KVServer)’:
    #13 78.88 byteps/server/server.cc:350:15: warning: unused variable ‘update’ [-Wunused-variable]
    #13 78.88 350 | auto& update = updates->merged;
    #13 78.88 | ^~~~~~
    #13 78.94 In file included from 3rdparty/ps-lite/include/ps/ps.h:13,
    #13 78.94 from byteps/server/server.h:24,
    #13 78.94 from byteps/server/server.cc:16:
    #13 78.94 3rdparty/ps-lite/include/ps/kv_app.h: In instantiation of ‘ps::KVServer::KVServer(int, bool, int) [with Val = char]’:
    #13 78.94 byteps/server/server.cc:501:62: required from here
    #13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: warning: ‘new’ of type ‘ps::Customer’ with extended alignment 64 [-Waligned-new=]
    #13 78.94 354 | this->obj_ = new Customer(
    #13 78.94 | ^~~~~~~~~~~~~
    #13 78.94 355 | app_id, app_id, std::bind(&KVServer::Process, this, 1), postoffice);
    #13 78.94 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: note: uses ‘void operator new(std::size_t)’, which does not have an alignment parameter
    #13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: note: use ‘-faligned-new’ to enable C++17 over-aligned new support
    #13 82.24 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/byteps/common/common.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/compressor_registry.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/error_feedback.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/dithering.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/onebit.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/randomk.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/topk.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/vanilla_error_feedback.o build/temp.linux-x86_64-cpython-38/byteps/common/cpu_reducer.o build/temp.linux-x86_64-cpython-38/byteps/common/logging.o build/temp.linux-x86_64-cpython-38/byteps/server/server.o 3rdparty/ps-lite/build/libps.a 3rdparty/ps-lite/deps/lib/libzmq.a -L/usr/local/nccl/lib -L/usr/local/nccl/lib64 -L/usr/lib -lrdmacm -libverbs -lrt -o build/lib.linux-x86_64-cpython-38/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so -Wl,--version-script=byteps.lds -fopenmp
    #13 82.46 INFO: Unable to build TensorFlow plugin, will skip it.
    #13 82.46
    #13 82.46 Traceback (most recent call last):
    #13 82.46 File "setup.py", line 383, in check_tf_version
    #13 82.46 import tensorflow as tf
    #13 82.46 ModuleNotFoundError: No module named 'tensorflow'
    #13 82.46
    #13 82.46 During handling of the above exception, another exception occurred:
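
The traceback shows that the `python3` process running `setup.py` could not import `tensorflow`, even though an earlier step ran `pip3 install tensorflow==2.5.0`. A minimal sketch for checking this inside the image is below. It only reports whether the current interpreter can locate a module; it does not identify the cause. The helper name `module_status` and the list of modules checked are illustrative assumptions, not part of BytePS.

```python
import importlib.util
import sys


def module_status(name):
    """Return (found, location) for a top-level module name without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin


if __name__ == "__main__":
    # Show which interpreter is running, to compare with the one pip3 installs into.
    print("interpreter:", sys.executable)
    for name in ("tensorflow", "setuptools"):
        found, location = module_status(name)
        if found:
            print(f"{name}: found at {location}")
        else:
            print(f"{name}: NOT importable by this interpreter")
```

Saved under a hypothetical name such as `check_tf.py`, this could be run with `python3` in a `RUN` step placed just before `python3 setup.py install`. If `tensorflow` is reported as not importable there, comparing the printed interpreter path with the output of `pip3 --version` may show whether `pip3` and `python3` point at different Python installations.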

Environment (please complete the following information):

  • OS: ubuntu20.04
  • GCC version: g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
  • CUDA and NCCL version: CUDA 12.2.0 , NCCL: 2.21.5
  • Framework (TF, PyTorch, MXNet): tensorflow-2.5.0
  • pip-24.0