Add Deep Learning VM (DLVM) images as a base image option for Caliban. #20

sagravat · 2020-06-19T18:17:52Z

This contribution enables Deep Learning VM (DLVM) images from the Google Cloud AI Platform to be used as base images for execution in local, cloud, shell, and notebook modes. The notebook mode includes a JupyterLab extension widget that lets the user submit the notebook directly for training on the AI Platform with configurable hardware accelerators such as GPUs and TPUs.

codecov · 2020-06-19T18:20:10Z

Codecov Report

Merging #20 into master will decrease coverage by 0.69%.
The diff coverage is 10.16%.

@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
- Coverage   45.67%   44.97%   -0.70%     
==========================================
  Files          17       17              
  Lines        2728     2777      +49     
==========================================
+ Hits         1246     1249       +3     
- Misses       1482     1528      +46

| Impacted Files | Coverage Δ | |
|---|---|---|
| caliban/history/utils.py | `29.12% <0.00%> (-0.33%)` | ⬇️ |
| caliban/main.py | `22.72% <0.00%> (-1.09%)` | ⬇️ |
| caliban/docker.py | `23.16% <7.14%> (-2.08%)` | ⬇️ |
| caliban/cloud/core.py | `21.60% <20.00%> (-0.22%)` | ⬇️ |
| caliban/config.py | `59.87% <40.00%> (-0.66%)` | ⬇️ |
| caliban/gke/utils.py | `71.53% <0.00%> (-0.39%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a912775...3a417e9. Read the comment docs.

sritchie

@sagravat, this is great! I want to see if we can separate the logic here a bit more. Ideally, we would be able to reuse the existing run_interactive and run_notebook commands (I think those are the names), and make this PR into TWO changes.

The first is the --base_image support, and the second is the GCP notebook submitter.

That second change, the GCP scheduler, should I think work without the DLVM base images.

I'm having trouble parsing the new entrypoint that the second change requires. Is it really different than how we launch jupyterlab normally? If not, I wonder if we could actually enable this by making it easier for folks to install ANY jupyterlab extension.

Happy to chat more. Take a look and let me know if these notes make sense.

sritchie · 2020-06-23T15:01:39Z

caliban/cli.py

+      "--dlvm",
+      help="DLVM base image type. Must be one of  "
+           "{}".format(dlvm_types) + ". If supplied, "
+      "Caliban will skip the build and push steps and use this image tag.")


hey, I'm going to add comments as I read, so this may be clear later! Naively I would have assumed that the DLVM would be a base image, and that you could still install dependencies on top, right? If that is true, can we call this argument --base_image?

Right now, --image_id removes any requirements.txt installation; it's truly a flag to skip any build, including even getting your code into the image.

Really, the syntax should be caliban run e2a4af785bdb...

sritchie · 2020-06-23T15:03:19Z

caliban/cloud/core.py

@@ -486,7 +486,7 @@ def build_job_specs(
                    experiments=experiments)


-def generate_image_tag(project_id, docker_args, dry_run: bool = False):
+def generate_image_tag(project_id, docker_args, dlvm: str = None, dry_run: bool = False):


if we call it --base_image we can add that key to the generate_docker_args function here: https://github.com/google/caliban/blob/master/caliban/cli.py#L536

sritchie · 2020-06-23T15:05:26Z

caliban/config.py

+  return _dlvm_config(job_mode).get(dlvm_arg)
+
+
+def _dlvm_config(job_mode: JobMode) -> Dict[str, str]:


nice, this is good. I think I see now why you have --dlvm vs --base_image. What we COULD do is have a prefix for --base_image, something like --base_image dlvm:pytorch, that would force a lookup here. That would let this feature give us general base images too.
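A minimal sketch of the prefix idea, to make it concrete. Everything here is an assumption for illustration: `resolve_base_image`, `DLVM_PREFIX`, and the `DLVM_IMAGES` table are hypothetical names, and the image URLs are placeholders rather than real DLVM tags. In the PR the lookup would go through `_dlvm_config(job_mode)` instead of a flat dict.

```python
from typing import Dict, Optional

# Hypothetical lookup table. In the PR this mapping comes from
# _dlvm_config(job_mode); the URLs below are placeholders, not real tags.
DLVM_IMAGES: Dict[str, str] = {
    "pytorch": "gcr.io/example-project/pytorch-cpu",
    "tf2": "gcr.io/example-project/tf2-cpu",
}

# Assumed prefix that marks a value as a DLVM key rather than an image name.
DLVM_PREFIX = "dlvm:"


def resolve_base_image(spec: Optional[str]) -> Optional[str]:
    """Turn a --base_image value into a concrete image identifier.

    'dlvm:<key>' forces a lookup in the DLVM table; anything else is
    passed through untouched, so arbitrary base images keep working.
    """
    if spec is None:
        return None

    if spec.startswith(DLVM_PREFIX):
        key = spec[len(DLVM_PREFIX):]
        if key not in DLVM_IMAGES:
            valid = ", ".join(sorted(DLVM_IMAGES))
            raise ValueError(
                f"Unknown DLVM type '{key}'. Must be one of: {valid}")
        return DLVM_IMAGES[key]

    return spec
```

With this shape, `--base_image dlvm:pytorch` and `--base_image ubuntu:18.04` would share one flag, and only the prefixed form triggers the lookup.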

sritchie · 2020-06-23T15:06:21Z

caliban/docker.py

@@ -391,9 +391,17 @@ def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str:

  library = "jupyterlab" if lab else "jupyter"

-  return """
+  if not dlvm:
+    return """
 RUN pip install {}{}


oh, wait, good catch that we need to update this pip too! I can do that separately.

We have this --lab flag. I'm thinking that we should only do this special installation if --lab is specified. If not, even with a dlvm base image, we should just install normal jupyter. wdyt?

sritchie · 2020-06-23T15:07:17Z

caliban/docker.py

 RUN pip install {}{}
 """.format(library, version_suffix)
+  else:
+    return """
+RUN /opt/conda/bin/pip install \


does this only work on the deep learning VMs? and does it work on ALL the deep learning VMs?

If it works without the dlvms, maybe it needs its own flag.

sritchie · 2020-06-23T15:10:15Z

caliban/docker.py

@@ -511,7 +526,10 @@ def _dockerfile_template(
  if base_image_fn is None:
    base_image_fn = base_image_id

-  base_image = base_image_fn(job_mode)


Can you instead pass

base_image_fn = lambda job_mode: _dlvm_id(job_mode, dlvm)

Then we can keep to the API.

sritchie · 2020-06-23T15:13:35Z

caliban/docker.py

@@ -378,7 +378,7 @@ def _credentials_entries(user_id: int,
  return ret


-def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str:
+def _notebook_entries(lab: bool = False, version: Optional[str] = None, dlvm: bool = False) -> str:


Can we call this flag scheduler instead? I think it doesn't depend on dlvm and COULD be a separate flag.

sritchie · 2020-06-23T15:14:18Z

caliban/docker.py

+  if inject_notebook.value != 'none':
+    install_lab = inject_notebook == NotebookInstall.lab
+    if dlvm is None:
+        dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=False)


how about removing the if/else and making the line

dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=bool(dlvm))
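To show why the single line covers both branches, here is a small sketch. The `_notebook_entries` body is a simplified stand-in (the DLVM branch in the diff is truncated, so the install line and the `==` version suffix here are assumptions), and `notebook_block` is a hypothetical wrapper used only for the example.

```python
from typing import Optional


def _notebook_entries(lab: bool = False,
                      version: Optional[str] = None,
                      dlvm: bool = False) -> str:
    # Simplified stand-in: the real DLVM branch installs more than this.
    library = "jupyterlab" if lab else "jupyter"
    version_suffix = f"=={version}" if version else ""
    pip = "/opt/conda/bin/pip" if dlvm else "pip"
    return f"\nRUN {pip} install {library}{version_suffix}\n"


def notebook_block(install_lab: bool,
                   jupyter_version: Optional[str],
                   dlvm: Optional[str]) -> str:
    # bool(None) is False and bool("pytorch") is True, so one call
    # replaces the `if dlvm is None: ... else: ...` pair.
    return _notebook_entries(lab=install_lab,
                             version=jupyter_version,
                             dlvm=bool(dlvm))
```

One caveat: `bool("")` is also False, so an empty string would be treated like a missing value, which differs slightly from an `is None` check.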

sritchie · 2020-06-23T15:15:17Z

caliban/docker.py

+
+  dockerfile += """
+
+USER {uid}:{gid}


nice! Now that we're on python 3.6, we can actually make this

dockerfile += f""" USER {uid}:{gid} """

using f-strings.
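For comparison, a quick sketch of the two spellings side by side. The function names are hypothetical and only wrap the snippet so the results can be compared; the `ENV` line is an invented example, not something from the PR.

```python
def user_entry_format(uid: int, gid: int) -> str:
    # The str.format version, as in the current diff.
    return """
USER {uid}:{gid}
""".format(uid=uid, gid=gid)


def user_entry_fstring(uid: int, gid: int) -> str:
    # The f-string version (Python 3.6+); uid and gid are read
    # directly from the enclosing scope.
    return f"""
USER {uid}:{gid}
"""


def cache_dir_entry(uid: int) -> str:
    # Literal braces must be doubled in an f-string (and in str.format),
    # which matters for shell-style ${VAR} expansions in a Dockerfile.
    return f"ENV CACHE_DIR=${{HOME}}/.cache/{uid}\n"
```

Both `user_entry_*` functions return the same string, so the switch is purely cosmetic; the brace-doubling rule is the one thing to watch for when converting larger templates.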

sagravat added 14 commits June 14, 2020 22:32

initial commit for DLVM feature for notebook, local, and cloud execution.

e63178f

fixed issue for DLVM notebook scheduler extension installation.

ac8eb97

Updated entrypoint for interactive mode for regular and DLVM mode.

80ca15b

Implemented shell mode for DLVM.

c8e1582

Code cleanup to move functionality from build_dlvm_image to build_image

20c34ef

Fixed issue with run_interactive toggling between shell and DLVM modes.

0c96fc8

Cleanup run_interactive and scheduler extension build.

aecb17e

Set minimize to False for jupyter lab build and code cleanup.

77d0f5a

Added DLVM config map from Google Container Registry and set up the help function to list the DLVM types.

e4cba55

Update required google api packages to specific versions due to google-auth release issue.

6ba10e0

Sorted DLVM arg types returned to help command.

baf0aff

Cleanup and update setup.py with upstream versions.

b1cb30e

Merge remote-tracking branch 'upstream/master'. Fixed jupyterlab build for scheduler.

3abf664

Set python to 3.6

3a417e9

sagravat changed the title ~~Added Deep Learning VM (DLVM) images as a base image option for Caliban.~~ Add Deep Learning VM (DLVM) images as a base image option for Caliban. Jun 20, 2020

sritchie requested changes Jun 23, 2020

This was referenced Jul 15, 2020

Add Schema for Calibanconfig #37

Merged

Custom base image support in calibanconfig #39

Merged

sritchie mentioned this pull request Jul 29, 2020

A way to provide my own docker image? #61

Open
