[Docker] Rebase onto rocker:r2u #5021

Open
D3SL opened this issue May 2, 2024 · 1 comment



D3SL commented May 2, 2024

Is your feature request related to a problem? Please describe.
By default, installing R packages on Linux requires building them from source. Depending on the number of packages, this can take an hour or more, and because Docker containers are stateless and ephemeral, the process has to be repeated every time the container is recreated.

Describe the solution you'd like
Rebasing onto Ubuntu would make it possible to use Michael Rutter's incredible cran2deb4ubuntu repository and r2u, so users could install precompiled binaries in seconds instead of hours.
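
As a rough illustration of what that would look like, here is a minimal sketch. It assumes the rocker/r2u:jammy image, which comes with the r2u apt repository already configured, and it assumes every package listed is available there as an r-cran-* binary. The package selection is only an example and this is not a proposed final Dockerfile.

```dockerfile
# Sketch only: rocker/r2u is assumed to have the r2u apt repository
# preconfigured, so CRAN packages are available as r-cran-* binary packages.
FROM rocker/r2u:jammy

# Install precompiled binaries through apt; nothing is compiled from source.
# The package selection here is illustrative.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        r-cran-data.table \
        r-cran-collapse \
        r-cran-tidyverse \
    && rm -rf /var/lib/apt/lists/*

# Fail the build early if any of the packages cannot be loaded.
RUN Rscript -e 'library(data.table); library(collapse); library(tidyverse)'
```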

Describe alternatives you've considered
Building a custom Docker image myself.

Additional context
R's strength lies overwhelmingly in its enormous library of community packages. Although it would increase the initial image size, I would suggest installing the tidyverse, data.table, and collapse packages by default. collapse and data.table in particular are incredibly powerful and efficient libraries, easily rivaling or beating Python's polars. Including them by default would make it possible to preload powerful R templates.


D3SL commented May 3, 2024

I made an attempt at this and got it working faster than expected. The resulting image is rather large at 4 GB. Slimtoolkit took that down to under 1 GB but broke some of Mage's functionality in the process, so the idea is valid, but someone who knows slimtoolkit better than I do would have to figure out how to optimize it.
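
One conventional way to trim size before reaching for slimtoolkit is to skip recommended packages and clean the apt lists in the same layer that creates them. The system-package step in the Dockerfile below does neither. This is only a sketch with a shortened, illustrative package list, and how much it would actually save on this image is untested.

```dockerfile
# Sketch only: update, install, and clean up in a single RUN instruction so
# the apt package lists never persist in an image layer.
FROM rocker/r2u:jammy

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        git \
        unixodbc-dev \
    && rm -rf /var/lib/apt/lists/*
```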

FROM rocker/r2u:jammy
ENV TZ=UTC

## System Packages

RUN apt-get update && apt-get install -y \
ca-certificates \
curl \
gnupg \
netbase \
sq \
wget \
git \
mercurial \
openssh-client \
subversion \
procps \
autoconf \
automake \
bzip2 \
default-libmysqlclient-dev \
dpkg-dev \
file \
g++ \
gcc \
imagemagick \
libbz2-dev \
libc6-dev \
libcurl4-openssl-dev \
libdb-dev \
libevent-dev \
libffi-dev \
libgdbm-dev \
libglib2.0-dev \
libgmp-dev \
libjpeg-dev \
libkrb5-dev \
liblzma-dev \
libmagickcore-dev \
libmagickwand-dev \
libmaxminddb-dev \
libncurses5-dev \
libncursesw5-dev \
libpng-dev \
libpq-dev \
libreadline-dev \
libsqlite3-dev \
libssl-dev \
libtool \
libwebp-dev \
libxml2-dev \
libxslt-dev \
libyaml-dev \
make \
patch \
unzip \
xz-utils \
zlib1g-dev \
libbluetooth-dev \
tk-dev \
uuid-dev \
unixodbc-dev \
python3-dev \
python3-distutils \
python3-pip \
python3-apt \
krb5-config

RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated msodbcsql18
RUN ACCEPT_EULA=Y apt-get install -y --allow-unauthenticated mssql-tools18
# mssql-tools18 installs sqlcmd and bcp under /opt/mssql-tools18/bin
RUN echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bash_profile
RUN echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc

# Append a system-wide TLS default that lowers the OpenSSL security level to 1
RUN echo '[openssl_configuration]'>> /etc/ssl/openssl.cnf
RUN echo 'ssl_conf = ssl_configuration' >> /etc/ssl/openssl.cnf
RUN echo '[ssl_configuration] '>> /etc/ssl/openssl.cnf
RUN echo 'system_default = tls_system_default '>> /etc/ssl/openssl.cnf
RUN echo '[tls_system_default] '>> /etc/ssl/openssl.cnf
RUN echo 'CipherString = DEFAULT:@SECLEVEL=1' >> /etc/ssl/openssl.cnf

# Prepend the openssl_conf pointer to the top of the file so it refers to the section appended above
RUN echo 'openssl_conf = openssl_configuration' | cat - /etc/ssl/openssl.cnf > temp && mv temp /etc/ssl/openssl.cnf
  

# mageAI
ARG FEATURE_BRANCH
USER root

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

## Python Packages
RUN \
  pip3 install --no-cache-dir sparkmagic && \
  mkdir ~/.sparkmagic && \
  curl https://raw.githubusercontent.com/jupyter-incubator/sparkmagic/master/sparkmagic/example_config.json > ~/.sparkmagic/config.json && \
  sed -i 's/localhost:8998/host.docker.internal:9999/g' ~/.sparkmagic/config.json && \
  jupyter-kernelspec install --user "$(pip3 show sparkmagic | grep Location | cut -d' ' -f2)/sparkmagic/kernels/pysparkkernel"
# Mage integrations and other related packages
RUN \
  pip3 install --no-cache-dir "git+https://github.com/wbond/oscrypto.git@d5f3437ed24257895ae1edd9e503cfb352e635a8" && \
  pip3 install --no-cache-dir "git+https://github.com/dremio-hub/arrow-flight-client-examples.git#egg=dremio-flight&subdirectory=python/dremio-flight" && \
  pip3 install --no-cache-dir "git+https://github.com/mage-ai/singer-python.git#egg=singer-python" && \
  pip3 install --no-cache-dir "git+https://github.com/mage-ai/dbt-mysql.git#egg=dbt-mysql" && \
  pip3 install --no-cache-dir "git+https://github.com/mage-ai/dbt-synapse.git#egg=dbt-synapse" && \
  pip3 install --no-cache-dir "git+https://github.com/mage-ai/sqlglot#egg=sqlglot" && \
  if [ -z "$FEATURE_BRANCH" ] || [ "$FEATURE_BRANCH" = "null" ]; then \
    pip3 install --no-cache-dir "git+https://github.com/mage-ai/mage-ai.git#egg=mage-integrations&subdirectory=mage_integrations"; \
  else \
    pip3 install --no-cache-dir "git+https://github.com/mage-ai/mage-ai.git@$FEATURE_BRANCH#egg=mage-integrations&subdirectory=mage_integrations"; \
  fi

# Mage
COPY ./mage_ai/server/constants.py /tmp/constants.py
RUN if [ -z "$FEATURE_BRANCH" ] || [ "$FEATURE_BRANCH" = "null" ] ; then \
      tag=$(tail -n 1 /tmp/constants.py) && \
      VERSION=$(echo "$tag" | tr -d "'") && \
      pip3 install --no-cache-dir "mage-ai[all]==$VERSION"; \
    else \
      pip3 install --no-cache-dir "git+https://github.com/mage-ai/mage-ai.git@$FEATURE_BRANCH#egg=mage-ai[all]"; \
    fi

# R packages

RUN apt-get update && apt-get install -y  --no-install-recommends \
r-cran-rjava \
r-cran-shiny \
r-cran-glue \
r-cran-httr \
r-cran-jsonlite \
r-cran-data.table \
r-cran-flextable \
r-cran-officer \
r-cran-lubridate \
r-cran-rsqlite \
r-cran-mongolite \
r-cran-tidyverse \
r-cran-future \
r-cran-furrr \
r-cran-promises \
r-cran-dt \
r-cran-ggfittext \
r-cran-odbc \
r-cran-dbi \
r-cran-pool \
r-cran-devtools \
r-cran-scales \
r-cran-quantreg \
r-cran-rmariadb \
r-cran-plotly \
r-cran-countrycode \
r-cran-here \
r-cran-callr \
r-cran-processx \
r-cran-padr \
&& rm -rf /var/lib/apt/lists/*
RUN R CMD javareconf
## R Packages
RUN R -e "install.packages(c('renv','pacman','Rcpp', 'keyring','future.callr','assertr','data.validator','fedmatch','collapse','gt','knitr','kableExtra','formattable'))"



## Startup Script
COPY --chmod=+x ./scripts/install_other_dependencies.py ./scripts/run_app.sh /app/

ENV MAGE_DATA_DIR="/home/src/mage_data"
ENV PYTHONPATH="${PYTHONPATH}:/home/src"
WORKDIR /home/src
EXPOSE 6789
EXPOSE 7789

CMD ["/bin/sh", "-c", "/app/run_app.sh"]
