Skip to content
This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

R and some R packages for similar functionality like pydatalab #1881

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

renedlog
Copy link

@renedlog renedlog commented Jan 3, 2018

Hi, I have added R and some packages necessary to use the Tensorflow & Spark API with R on DataLab. (Sorry for the strange comparison, I didn't find the cause for Git to show unchanged lines as changed)

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@renedlog
Copy link
Author

renedlog commented Jan 3, 2018

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@chmeyers
Copy link
Contributor

chmeyers commented Jan 3, 2018

Pulled this locally to get a better diff. The new stuff is:
RUN apt-get update -y && apt-get install -y --no-install-recommends apt-utils
ENV LC_ALL en_US.UTF-8
And everything at the end of the file starting with:
## Use Debian unstable via pinning -- new style via APT::Default-Release

@chmeyers chmeyers self-requested a review January 3, 2018 18:07
Copy link
Contributor

@chmeyers chmeyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New GPL licensed components need to have their sources somewhere in the docker container,

libssh2-1-dev \
libcurl4-openssl-dev \
libssl-dev \
littler \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

littler is GPL licensed. It needs to be included in the list on line 46 so that we distribute the source code.

Same with r-base, r-base-dev, and r-recommended, I think.

@@ -1,252 +1,293 @@
# Copyright 2015 Google Inc. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear on the change, but looks like all lines were modified? Can you fix so the delta is clearer?

Copy link
Contributor

@yebrahim yebrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, mainly about moving the changes into the same RUN layer for better cleanup.
Do we know how much this adds to the docker image size?

ENV PATH $PATH:/tools/node/bin:/tools/google-cloud-sdk/bin
ENV PYTHONPATH /env/python

RUN apt-get update -y && apt-get install -y --no-install-recommends apt-utils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move apt-utils down into the next RUN layer to avoid a large docker image size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why is this package needed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that apt-utils is not actually needed by inspecting for example Rocker's various dockerfile compositions (here).


## Now install R and littler, and create a link for littler in /usr/local/bin
## Also set a default CRAN repo, and make sure littler knows about it too
RUN apt-get update \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step needs to be incorporated into the main install step before the cleanup, I'd move it around line 42, and then the R install commands can move next to the pip install commands for better grouping, all in the same big RUN layer. We do this so that the cleanup can happen in the same docker layer to keep the image size as small as possible.

&& ln -s /usr/share/doc/littler/examples/testInstalled.r /usr/local/bin/testInstalled.r \
&& install.r docopt \
&& rm -rf /tmp/downloaded_packages/ /tmp/*.rds \
&& rm -rf /var/lib/apt/lists/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation.

R -e 'devtools::install_github("IRkernel/IRkernel")' && \
# or 'devtools::install_local("IRkernel-master.tar.gz")' && \
R -e 'IRkernel::installspec()' && \
# to register the kernel in the current R installation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already have littler installed, I would suggest using installGithub.r instead of devtools::install_github and install2.r instead of install.packages() as they fail more gracefully. At the very least as devtools is a heavy dependency use remotes::install_github() instead which is from the same source but without all the extra non-installing functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R -e 'install.packages("feather")' && \
R -e 'install.packages("tensorflow")' && \
R -e 'devtools::install_github("apache/spark@v2.2.0", subdir="R/pkg")'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line at the end of the file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Author of googleAuthR/googleCloudStorageR/bigQueryR here - cool these can be used :) I would also however suggest that bigrquery is installed as well as that is the more popular R BigQuery client, having deeper integration with popular tools like the tidyverse. bigQueryR does carry features such as uploading and authentication that will be useful though, I imagine it would be best for R users to have the option of both. e.g add install.packages("bigrquery"), or if following above suggestion on using install2.r:

RUN install2.r --error \ 
    -r 'http://cran.rstudio.com' \
    googleAuthR \ 
    googleCloudStorageR \
    bigQueryR \ 
    bigrquery \
    feather \
    tensorflow 
RUN R -e 'gar_sce_auth()'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually start from Rocker/tidyverse Dockefile to build this one since most R users nowadays work vastly with these packages.

ln -s /usr/local/lib/python2.7/dist-packages/notebook/static/custom/custom.css /datalab/nbconvert/custom.css && \
cd /

## Use Debian unstable via pinning -- new style via APT::Default-Release
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the impact of this? Does this affect any subsequent apt-get installs in the container, or that the user might do? The "unstable" bit makes me wonder... is there a way to avoid this?

@brodseba
Copy link

Not sure what are the actual conflict.

@maurerbot
Copy link

Just dropping in now - there hasn't been any activity on this PR since January.

Would be nice to see R supported on GCP. What is holding this up?

@Dana-Farber
Copy link

Any news on this? Having R would be of tremendous help to analyze genomics data!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants