Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Docker Compose support with nginx reverse proxy. #60

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

drewsilcock
Copy link

Set up docker-compose.yaml to create all necessary resources to run all Streamlit apps at the same time and reverse proxy them all using nginx.

Add basic HTML file for nginx to serve that provides links to all the different apps.

This is a very simple reverse proxy integration, so there's no "back" buttons once you navigate to a particular Streamlit app.

See #58 for context.

Set up docker-compose.yaml to create all necessary resources to run all
Streamlit apps at the same time and reverse proxy them all using nginx.

Add basic HTML file for nginx to serve that provides links to all the
different apps.
@CLAassistant
Copy link

CLAassistant commented Jul 26, 2021

CLA assistant check
All committers have signed the CLA.

@koaning
Copy link
Contributor

koaning commented Jul 26, 2021

Whoa! This looks neet! I'll give it a proper review first thing tomorrow!

README.md Outdated
git clone git@github.com:RasaHQ/rasalit.git
cd rasalit

RASA_PROJECT_DIR=/path/to/my_rasa_project OVERVIEW_FOLDER=/path/to/gridresults docker-compose up -d
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified if there were a GitHub action to build all of these containers and push them up to DockerHub, in which case you wouldn't need the whole Rasalit repository and to build the containers yourself (which takes a while). In this case, you would only need to curl the docker-compose.yaml file from raw.githubusercontent.com and pull the pre-built containers from DockerHub (e.g. RasaHQ/rasalit-overview, RasaHQ/rasalit-spelling, etc.).

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

Strange. When I try to connect to localhost:8000 I get no response. After running docker compose it also seems like no port is opening up as this command gives nothing.

sudo lsof -i:8000

I think there's something happening in the nginx container.

> docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS                          PORTS               NAMES
13b5c042fa00        nginx:latest                    "/docker-entrypoint.…"   3 minutes ago       Restarting (1) 53 seconds ago                       rasalit-nginx
829e6815334e        rasa/duckling                   "duckling-example-ex…"   3 minutes ago       Up 3 minutes                    8000/tcp            rasalit-duckling
95d4d5b92f7b        rasalit_rasalit-diet-explorer   "/bin/sh -c 'python …"   3 minutes ago       Up 3 minutes                                        rasalit-diet-explorer
9fb89d07e377        rasalit_rasalit-nlu-cluster     "/bin/sh -c 'python …"   3 minutes ago       Up 3 minutes                                        rasalit-nlu-cluster
> docker logs 13b5c042fa00
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/07/27 08:18:19 [emerg] 1#1: host not found in upstream "rasalit-live-nlu:8501" in /etc/nginx/conf.d/default.conf:6
nginx: [emerg] host not found in upstream "rasalit-live-nlu:8501" in /etc/nginx/conf.d/default.conf:6
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

It seems like the ngnix container is in an infinite loop of restarting. This is a few minutes later.

> docker ps                                             
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS                          PORTS               NAMES
13b5c042fa00        nginx:latest                    "/docker-entrypoint.…"   6 minutes ago       Restarting (1) 53 seconds ago                       rasalit-nginx
829e6815334e        rasa/duckling                   "duckling-example-ex…"   6 minutes ago       Up 6 minutes                    8000/tcp            rasalit-duckling
95d4d5b92f7b        rasalit_rasalit-diet-explorer   "/bin/sh -c 'python …"   6 minutes ago       Up 6 minutes                                        rasalit-diet-explorer
9fb89d07e377        rasalit_rasalit-nlu-cluster     "/bin/sh -c 'python …"   6 minutes ago       Up 6 minutes                                        rasalit-nlu-cluster

@drewsilcock
Copy link
Author

The nginx container will continue restarting until all the other containers are available. In this case the rasalit-live-nlu container didn't start properly, hence nginx errors out, restarts and tries again. (Hence nginx: [emerg] host not found in upstream "rasalit-live-nlu:8501" in /etc/nginx/conf.d/default.conf:6).

If you run docker-compose logs live-nlu you should be able to see what's going wrong with that particular container.

Comment on lines 77 to 79
duckling:
image: rasa/duckling
container_name: rasalit-duckling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need this around. If you're running duckling you should already have a config.yml file that is pointing to it. Is there a reason why we need it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose not everyone has duckling in their NLU pipeline - I was using the financial-demo example which does have duckling, and the duckling server needs to be available to the "spelling" and "live-nlu" apps because they use RasaNLUInterpreter (if I understand correctly).

If you've got a duckling server running elsewhere on your machine, you can tell the apps where duckling is via the RASA_DUCKLING_HTTP_URL environment variable but the duckling server would need to be visible to the containers in the Rasalit Docker Compose, which means either using an external network or just using host networking in all of the Rasalit Docker Compose containers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear things might start getting tricky here.

If the duckling URL changes then the config.yml will require a retrain before you have a model that you can pass to rasalit. That means that suddenly you may need to train a model again (which is expensive) if you want to check it out via the docker-compose trick.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that the RASA_DUCKLING_HTTP_URL always took precedence if present, meaning you don't need to retrain the model if the duckling URL changes - RasaHQ/rasa#6240

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

Ah right, I think I found the issue. The current set up assumes that I ran a grid-search beforehand. If there's no existing gridsearch folder passed along, we get this:

> docker-compose logs rasalit-live-nlu
Attaching to rasalit-live-nlu
rasalit-live-nlu         | Traceback (most recent call last):
rasalit-live-nlu         |   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
rasalit-live-nlu         |     return _run_code(code, main_globals, None,
rasalit-live-nlu         |   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
rasalit-live-nlu         |     exec(code, run_globals)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/rasalit/__main__.py", line 90, in <module>
rasalit-live-nlu         |     app()
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
rasalit-live-nlu         |     return get_command(self)(*args, **kwargs)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
rasalit-live-nlu         |     return self.main(*args, **kwargs)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
rasalit-live-nlu         |     rv = self.invoke(ctx)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
rasalit-live-nlu         |     return _process_result(sub_ctx.command.invoke(sub_ctx))
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
rasalit-live-nlu         |     return ctx.invoke(self.callback, **ctx.params)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
rasalit-live-nlu         |     return callback(*args, **kwargs)
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
rasalit-live-nlu         |     return callback(**use_params)  # type: ignore
rasalit-live-nlu         |   File "/opt/venv/lib/python3.8/site-packages/rasalit/__main__.py", line 49, in live_nlu
rasalit-live-nlu         |     raise ValueError(f"You need to pass a folder that exists, got: {model_folder}")
rasalit-live-nlu         | ValueError: You need to pass a folder that exists, got: /home/appuser/models

@drewsilcock
Copy link
Author

It looks to me like it's searching for the built Rasa models from the models/ folder of the Rasa project directory. Does the Rasa project you specify with RASA_PROJECT_DIR have pre-built models in it in the models/ folder?

The gridresults will also be an issue, but that should only be an issue for the overview app.

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

Does the Rasa project you specify with RASA_PROJECT_DIR have pre-built models in it in the models/ folder?

That is the base assumption yes. When you train a Rasa model locally it saves a .tar.gz file in the models folder. You could save it elsewhere, but rasalit assumes it is in the models folder.

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

Part of me is thinking "maybe it's good to take a brief moment to take a step back". I never designed these streamlit apps to be bundled together in a single service. I imagined they'd be running as one-off inspection moments locally. I really like the direction that this PR is considering, but it does occur to me that it'd be good to ponder about/double-check the edge cases.

One idea that I'm contemplating (feel free to shoot at the idea): would it make sense to have people configure their own docker-compose.yml/Dockerfile files? We could supply a large base configuration that has most of the tools, but which users are supposed to adapt to their situation. It feels like passing that responsibility to the end-user would immediately remove the worry for things like duckling and the availability of a gridsearch folder. It might also remove the need for environment variables that the user would need to pass in.

I'm mainly mentioning this because of maintainability. I maintain about 10+ small open-source projects and I've always found that by passing responsibility to the end-user, you typically empower them which results in less issues.

@drewsilcock
Copy link
Author

Yeah that makes sense, I can't think of why they would need to change the Dockerfile but the docker-compose.yaml would be presented as "here's an example, configure it according to your needs".

@koaning
Copy link
Contributor

koaning commented Jul 27, 2021

Yeah, you're right. Any environment variables should be passed through docker-compose.yml.

Given this situation there now seem to be three options with regards to implementation.

  1. We could keep everything as is, but this would require the users to do a git clone, which feels suboptimal as it downloads the entire project locally.
  2. We could have a separate repository that hosts the Docker related files. This also feels suboptimal because this makes maintenance more tricky.
  3. We could add a generate-compose command to the CLI that generates the "starting point" files for the user. It could also generate a readme that explains how to customise it. We can choose to package the files with the project or to host them statically on Github. This feels solid in the test-able/maintain-able department and I think it's also the simplest for the end user.

I'm still very much in the "thinking out loud-mode" so feel free to challenge me on this idea. Do you have any concerns for option 3?

Option 3 would involve adding a command to the CLI written with typer. Should you be unfamiliar, you may appreciate a tutorial that I wrote on calmcode.

@drewsilcock
Copy link
Author

Sorry for the delay, I've not been able to implement this yet but option 3 sounds like the right choice to me, and shouldn't be too difficult to do. I'll try to find some time at some point to refactor this and add the Typer command to generate the starter compose YAML.

@drewsilcock
Copy link
Author

drewsilcock commented Oct 21, 2021

I took a stab at adding a command to generate the docker-compose.yaml - it seems to work but the problem is that there's no publicly available pre-built Docker images for the separate apps.

The solution would be to add a GitHub Actions workflow to generate the Docker images and publish them to GitHub Container Registry. I don't think this would be too difficult but not sure whether you think that would be better put in another PR.

@koaning
Copy link
Contributor

koaning commented Oct 21, 2021

I'm awaiting the release of Rasa 3.0 before adding features to this repo. The main reason is that I want to be 100% sure that the apps themselves are at least compatible with the most recent Rasa version.

@drewsilcock
Copy link
Author

Fair enough, I'll put up a PR for a simple GH action to publish the images but understand this PR or the other one won't be merged until a later date - just thought I may as well do it while it's fresh in my mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants