Simple Jupyterhub Deployment

This is a simple JupyterHub deployment meant to run on a single server that hosts the hub, plus another server that stores user backups.

Current features:

  • HTTPS/TLS using nginx as a reverse proxy, with automatic certificates through Let's Encrypt
  • custom hub image
  • user disk quotas (1G soft + 2G hard)
  • periodic backups of user data to external server over simple ssh
  • authenticate through GitHub

Usage

To use without backups, simply run docker-compose up --build. Make sure to set up the .env file first!
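For example, a minimal first run from the repository root might look like this (a sketch):

cp .env.dist .env   # then edit .env and fill in the blanks (see below)
docker-compose up --build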

Goals

  • set up a simple dockerized jupyterhub installation
  • share runnable/editable code in isolated environment with your collaborators
  • teach a class using runnable/editable code in isolated environments

Disclaimer

These instructions assume you are running Ubuntu, but they should be much the same on any other GNU/Linux distribution.

Setup jupyterhub proxy token

Use

openssl rand -hex 32

to generate a random token for JupyterHub's proxy, and add CONFIGPROXY_AUTH_TOKEN=<result> to .env.

Do the same for LAUNCHER_API_TOKEN.
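As a sketch, you can generate and append both tokens in one go (assuming .env already exists in the current directory):

echo "CONFIGPROXY_AUTH_TOKEN=$(openssl rand -hex 32)" >> .env
echo "LAUNCHER_API_TOKEN=$(openssl rand -hex 32)" >> .env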

Setup environment

Copy .env.dist to .env, and fill in the blanks (a sketch of a filled-in .env is shown after this list). To get the OAuth client ID and secret, follow the instructions here: https://docs.github.com/en/free-pro-team@latest/developers/apps/creating-an-oauth-app

  • <jupyter_address> is something of the form jupyterhub.example.com
  • <backup_server_address> is something of the form backup-server.example.com
  • To generate a secure random password, you can for example, use: dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64 -w 0 | rev | cut -b 2- | rev
  • Follow the instructions below to properly set up the backup server.
  • The email for Let's Encrypt doesn't really matter; I never got any mail there :(
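As a rough sketch, a filled-in .env might look like the following. CONFIGPROXY_AUTH_TOKEN, LAUNCHER_API_TOKEN, BACKUP_USER, BACKUP_SERVER, and BACKUP_REPO_PATH are names used elsewhere in this README; the OAuth variable names below are illustrative, so use whatever .env.dist actually calls them:

CONFIGPROXY_AUTH_TOKEN=<output of openssl rand -hex 32>
LAUNCHER_API_TOKEN=<output of openssl rand -hex 32>
BACKUP_USER=jupyter-backup
BACKUP_SERVER=backup-server.example.com
BACKUP_REPO_PATH=<path of the restic repository on the backup server>
OAUTH_CLIENT_ID=<from your GitHub OAuth app>
OAUTH_CLIENT_SECRET=<from your GitHub OAuth app>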

Once you have your environment set up, everything should work fine.

Setup users

Copy jupyterhub/userlist.dist to jupyterhub/userlist. Delete user1, user2, user3, ... and replace them with the GitHub account names of the people you want to give access to. Make sure to put a unique positive integer before every user name; this number is used to set up disk quotas.
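A hypothetical userlist might look like this (the usernames are made up; check jupyterhub/userlist.dist for the exact format):

1 alice
2 bob
3 carol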

If you delete a user, make sure their number is never reused for any other user in the future. If it is reused, the new user will have less disk space available, because part of their quota will already be consumed by the deleted user's leftover data.

Alternatively, also delete the deleted user's data (it resides in /data/jupyterhub/users/{username}).
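For example, on the jupyterhub machine (double-check the path before running rm -rf):

sudo rm -rf /data/jupyterhub/users/<username>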

Setup disk quotas

To implement disk quotas, we use basic Linux filesystem disk quotas. For this to work, your root filesystem, as well as any partitions that docker containers and volumes are stored on, must be mounted with quotas enabled:

 sudo mount -o remount,usrquota,grpquota /

This will temporarily enable disk quotas until the next time you restart your server.

To make this permanent, edit /etc/fstab to something like:

UUID=some-long-string-of-random-symbols / ext4 errors=remount-ro,usrquota,grpquota 0 0

or

/dev/sda1 / ext4 errors=remount-ro,usrquota,grpquota 0 0

Then restart the server using sudo reboot. But first make sure you didn't make any typos: a broken /etc/fstab can leave your server unable to boot, and you won't be able to ssh to it!
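You can sanity-check /etc/fstab before rebooting, for example with findmnt from util-linux:

sudo findmnt --verify   # parses /etc/fstab and reports any problems it finds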

Next, initialize and turn quotas on:

sudo quotacheck -cum /
sudo quotaon -v /

The first command initializes user quotas, and the second one turns quotas on for the disk mounted at /.
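To verify that quotas are active, you can print a per-user usage report, e.g.:

sudo repquota -s /   # report quota usage and limits for all users on /, in human-readable units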

More info on disk quotas: https://linuxhint.com/disk_quota_ubuntu/

Setup database

Before the database can be used, you need to create the required tables by running the migrations:

In one terminal, run:

docker-compose up database

Then, in another terminal, run:

docker-compose run jupyterhub jupyterhub upgrade-db

Now you can stop the database service in the first terminal. The database should now be functional!

Configure backups

For backups to work, you first need a second machine with a user jupyter-backup created on it. This user needs to be accessible over ssh, but does not need a password (we will connect using ssh keys, which are more secure and easier to use):

# ssh to backup machine
localhost> ssh backup-machine
backup-machine> sudo adduser --quiet --gecos "" --disabled-password jupyter-backup

Now, for better security, we should only allow this user to use sftp and nothing else; i.e., the account will be used solely for backups over the sftp protocol, which restic can utilize.

You have to edit your sshd config on the backup machine for that:

backup-machine> sudo vim /etc/ssh/sshd_config

First, if you have Subsystem sftp /usr/lib/openssh/sftp-server, replace it with Subsystem sftp internal-sftp. If you don't have it, just add that line at the end of the file. internal-sftp is an in-process implementation of the sftp server that works better with chroot, since no external binary needs to exist inside the chroot directory.

Next, we need to set up a rule so that all users in the sftponly group can only access the server through sftp:

Match group sftponly
     ChrootDirectory /home/%u
     X11Forwarding no
     AllowTcpForwarding no
     ForceCommand internal-sftp

This creates a chroot environment (a lighter analogue of docker, haha) for the user, prevents them from using X11 forwarding and TCP forwarding, and forces them to use sftp only.

Now restart your sshd server:

backup-machine> sudo systemctl restart ssh
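If the restart fails, or to be safe before restarting, you can check sshd_config for syntax errors:

backup-machine> sudo sshd -t   # prints nothing on a valid config, errors otherwise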

Next, we need to generate an ssh keypair so that our restic container on the jupyterhub machine can automatically log in to the backup machine over sftp and store data there.

You do it on the jupyterhub machine:

jupyterhub-machine> cd ~/simple_jupyterhub/restic/ssh/
jupyterhub-machine> ssh-keygen -f id_rsa -P ""

This generates an ssh keypair with no password protection on the private key. We don't set a password because we want restic to be able to connect to the backup server without being prompted for one.

Keep the private key secure. If someone gets access to it, they can potentially remove, steal, or modify backups on the backup server.

In the directory we are currently in, there should now be two files: id_rsa (the private key) and id_rsa.pub (the public key).

Run cat id_rsa.pub to see its contents. We have to copy them to the backup server so that it recognizes our jupyterhub server when it tries to connect.

The public key should look something like ssh-rsa AAA...6p username@localhost. Make sure you copy all of it.

Now, add it to the backup server:

backup-machine> sudo mkdir ~jupyter-backup/.ssh
backup-machine> sudo vim ~jupyter-backup/.ssh/authorized_keys
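Alternatively, you can append the key non-interactively; a sketch, with the placeholder standing in for the actual contents of id_rsa.pub:

backup-machine> echo 'ssh-rsa AAA...6p username@localhost' | sudo tee -a ~jupyter-backup/.ssh/authorized_keys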

Set up proper permissions for the .ssh folder:

backup-machine> sudo chown -R jupyter-backup:jupyter-backup ~jupyter-backup/.ssh/
backup-machine> sudo chmod 600 ~jupyter-backup/.ssh/authorized_keys
backup-machine> sudo chmod 700 ~jupyter-backup/.ssh/

Note: make sure you run the last two commands in the exact order shown, since the last command makes the .ssh directory unreadable by anyone except jupyter-backup.

As a check, right now you should be able to ssh as jupyter-backup from your jupyterhub server:

jupyterhub-machine> ssh -i id_rsa jupyter-backup@backup-machine

Now let's prevent that from happening:

backup-machine> sudo groupadd sftponly
backup-machine> sudo usermod -a -G sftponly jupyter-backup
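You can confirm the membership took effect with:

backup-machine> id jupyter-backup   # sftponly should appear in the groups list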

Notice that we didn't have to create this group before referencing it in the sshd config. Groups and users are superficial on unix systems: they are just numbers that exist, and when you create a user or a group, you simply assign a name to one of those numbers.

If you want to make ssh logins work once again, remove the user from the group:

backup-machine> sudo gpasswd -d jupyter-backup sftponly

To test that sftp works, do:

jupyterhub-machine> echo "Hello, sftp!" > test
jupyterhub-machine> sftp -i id_rsa jupyter-backup@backup-machine
sftp> ls
sftp> lls
test
sftp> put test
Uploading test to /home/jupyter-backup/test
test                             100%   12    52.4KB/s   00:00
sftp> ls
test
sftp> lls
test

ls lists files on remote machine, and lls lists local files.

It now works!

Using restic to inspect backups

It's important to manually check that you are actually making backups.

To do so, install restic on any of your computers, or use it from the container: docker-compose exec restic bash.

Then, to check the snapshots: restic -r "sftp:${BACKUP_USER}@${BACKUP_SERVER}:${BACKUP_REPO_PATH}" snapshots. Snapshots are basically like commits in git, but unlike commits, snapshots can be empty; i.e., we always get a new snapshot every time a backup runs, even if no new data was added.

To restore a snapshot with id <snapshot-id>, use restic -r "sftp:${BACKUP_USER}@${BACKUP_SERVER}:${BACKUP_REPO_PATH}" restore <snapshot-id> --target /. It's fairly easy to restore a previous snapshot and then get back to the current version as well, just like with git.

Restic can also restore a particular file or directory instead of rolling back the whole system. See https://restic.readthedocs.io/en/latest/050_restore.html for more info.
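For example, to restore a single user's directory from the most recent snapshot into a scratch location (a sketch; the path is illustrative):

restic -r "sftp:${BACKUP_USER}@${BACKUP_SERVER}:${BACKUP_REPO_PATH}" restore latest --target /tmp/restored --include /data/jupyterhub/users/<username>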

For more info, read restic documentation: https://restic.readthedocs.io/en/latest/