Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Dockerfile to build an image #140

Closed
wants to merge 18 commits into from
Closed

Add a Dockerfile to build an image #140

wants to merge 18 commits into from

Conversation

egli
Copy link
Member

@egli egli commented Sep 27, 2017

Here's a PR that adds a Dockerfile that

  1. builds the pipeline using a standard maven image
  2. installs the build artifacts from the build in a second (standard openjdk) image and runs the pipeline in the foreground.

Build and run the image as follows (you need the newest docker for the multistage build):

$ docker build -t daisyorg/pipeline2 .
$ docker run -d -p 8181:8181 daisyorg/pipeline2

You should now be able to connect to the pipeline using a client. However, at the moment this probably doesn't work as the pipeline by default is set up to work locally. This needs to be changed either in the config file in the original source or by changing the config file in the docker image.

@bertfrees
Copy link
Member

bertfrees commented Sep 27, 2017

Looks good. Before I merge this

  • I would like to add a little test

  • the configuration issue that you mention should be solved.

    We can do this either by supporting a Maven property to set it at build time, like so: mvn clean package -Dpipeline.ws.localfs=false (does not work yet). Or what should also work is adding a remote argument to your ENTRYPOINT.

@josteinaj
Copy link
Member

josteinaj commented Sep 28, 2017

Another possibility for configuring the DP2 engine is to use sed (as in the snaekobbi/system Dockerfile):

So, for instance:

# Bind engine to 0.0.0.0 instead of localhost
RUN sed -i 's/org.daisy.pipeline.ws.host=.*/org.daisy.pipeline.ws.host=0.0.0.0/' /opt/daisy-pipeline2/etc/system.properties

# Enable calabash debugging
RUN sed -i 's/\(com.xmlcalabash.*\)INFO/\1DEBUG/' /opt/daisy-pipeline2/etc/config-logback.xml

# Enable 4 concurrent jobs
RUN sed -i 's/.*\(org.daisy.pipeline.procs\)=[0-9]*/\1=4/' /opt/daisy-pipeline2/etc/system.properties

For remote mode, the remote argument in ENTRYPOINT sounds cleaner though.

@bertfrees
Copy link
Member

The sed way is a good alternative for when you don't have the source files.

@egli
Copy link
Member Author

egli commented Sep 28, 2017

I'm adding a remote argument to the ENTRYPOINT, so the pipeline is now started in remote mode. However it now complains about

	bind => [1 parameter bound]   @o.d.p.p.logging.Slf4jSessionLogger:108#log
ERROR  [o.d.p.webservice.impl.PipelineWebService] 

************************************************************
WS mode authenticated but the client store is empty, exiting
please provide values for the following properties in etc/system.properties: 
-org.daisy.pipeline.ws.authentication.key    
-org.daisy.pipeline.ws.authentication.secret 
************************************************************

i.e. neither the key nor the secret is specified. I'm not sure how this should be handled.

@egli
Copy link
Member Author

egli commented Sep 28, 2017

@bertfrees what kind of test did you have in mind. How can you test a docker image, other than running it manually?

@egli
Copy link
Member Author

egli commented Sep 28, 2017

@josteinaj I'm not so sure if I like the idea of seding around in the config files. Can't we just set them up so that they just work?

@bertfrees
Copy link
Member

bertfrees commented Sep 28, 2017

Right, I forgot about that. If you specify remote it also enables authentication. Can you try with the environment variable PIPELINE2_LOCAL=false? Otherwise we'll have to edit the system.properties file. You don't need to sed around for that, it's preferable to filter the source file with Maven properties.

Regarding the test, I was thinking about just a shell script that starts the container and connects to it with the CLI, and possibly runs a sample file through it. Basically I was gonna copy this: https://github.com/bertfrees/benetech-docker-pipeline2/blob/test/Makefile.

P.S.: read: http://daisy.github.io/pipeline/Get-Help/User-Guide/Pipeline-as-Service

@josteinaj
Copy link
Member

@egli instead of sed you could create a ./docker-resources/ directory in the repository (or maybe just docker) and put a custom system.properties etc. in there. Then COPY (or ADD, I always confuse those two) it into the image.

@egli
Copy link
Member Author

egli commented Sep 29, 2017

I believe the configuration issue is now solved via environment variables.

Re the test I'll look at the example you provided on Monday

@bertfrees
Copy link
Member

Good job. The only thing I wasn't quite happy with at first were the new environment variables. But after talking to you this morning I see a valid use case after all.

  • Basically I guess I'm just not happy with the current state of configuration in general. There are sometimes three, or even four, different ways to configure something: Java system properties, environment variables, script arguments, and for Debian there is also the special "REMOTE" variable in /etc/default/daisy-pipeline2 (no idea why I did that). If anything I'd like to remove ways to configure something, not add more.
  • You changed the bash script only, but the Windows batch file should ideally support the same environment variables.
  • Instead of adding more environment variables in an ad-hoc way whenever someone requests them I'd rather have a uniform way to map environment variables to Java system properties. The uniformity makes things more understandable and it can also make the shell scripts less verbose. Note that there is also the possibility to completely eliminate the Java system properties (except the ones belonging to external libs).
  • "Simplifying the configuration with Docker" alone is not really a valid reason for adding more environment variables I think. It's perfectly possible to use the system.properties file with Docker. However, the org.daisy.pipeline.ws.authentication.key and org.daisy.pipeline.ws.authentication.secret are in fact an exception to this. To include these in your properties file during build of the image is a bad idea, because they'll be stored as cleartext in the image and on Github, and the alternative is overwriting the properties file at run time via a volume which is clumsy.

@bertfrees bertfrees mentioned this pull request Oct 2, 2017
4 tasks
@bertfrees
Copy link
Member

bertfrees commented Oct 2, 2017

The automatic mapping from environment variables to system properties could be (took some inspiration from https://github.com/weavejester/environ):

  1. lowercase everything
  2. translate _ to .
  3. translate pipeline. to org.daisy.pipeline.

We should do this only for the variables that start with PIPELINE_ (*). The few properties that start with org.pipeline should be changed to org.daisy.pipeline...

(*): and more specifically only the ones that are actually meant for configuration; the other ones should be removed from system.properties: see also the section labeled "do not edit" in http://daisy.github.io/pipeline/wiki/Configuration-Files.

The Dockerfile uses a multistage build to first build the artifacts
using maven. Then it copies the artifacts into a final image which
exposes the port and starts the pipeline.
If PIPELINE2_AUTH_CLIENTKEY and/or PIPELINE2_AUTH_CLIENTSECRET are
defined in the environment, when starting the pipeline, use those
values. This simplyfies dockerization of the pipeline.
that is used everywhere else, for example in the default config of the
pipeline cli
so that it can be set at run time for example when starting a Docker
image and remove it from the system.properties (otherwise setting it
as an option when starting the JVM seems to have no effect)
The test starts two containers based on the same image. One for the
pipeline itself and a second one for the cli. It then starts a script
from the cli.
@bertfrees
Copy link
Member

I've added some more stuff here: daisy/pipeline@f00f389^...docker

@bertfrees
Copy link
Member

There is something fishy with setting HOST to 0.0.0.0 I think because this address also shows up in e.g. ws/scripts.

@bertfrees
Copy link
Member

For curl this does not seem to matter apparently (e.g. curl http://0.0.0.0:8181/ws/alive just works) but maybe not all tools can cope.

Also, this issue makes me wonder whether the web API should even expose the full paths. Why not just use the relative paths?

Another observation is that the CLI does not even use the href attributes, but instead generates the URLs itself based on other attributes like id. It seems that it is currently not possible to generate the download URLs for individual files, only ZIPs, but we could possibly support that too. The result would be that we wouldn't need to rely on href attributes at all. Or is that against the REST principle maybe?

@rdeltour @josteinaj Your thoughts?

@bertfrees
Copy link
Member

@egli I have another request.

Currently, if you start the pipeline2 Docker service, you have to wait a few seconds before the web service is up. There is apparently a Docker feature called "health status" that can help you with that. I haven't tried it myself because my version of Docker is not new enough, but it would look something like this in docker-compose.yml:

    healthcheck:
      # Waiting for web service to be up...
      test: ["CMD", "curl", "http://localhost:8181/ws/alive"]
      interval: 10s
      timeout: 10s
      retries: 5

End in the depending service you do:

    depends_on:
      pipeline2:
        condition: service_healthy

The healthcheck part can also be moved to the Dockerfile itself. This would be a nice addition I think. Could you look into it?

See

@egli
Copy link
Member Author

egli commented Oct 26, 2017

@bertfrees I just added a health check to the pipeline2 docker image. As for the depends_on: Unfortunately version 3 of the docker-compose format no longer supports the condition form of depends_on, so your second suggestion is no longer possible. I think the argument is that the application itself (in our case the webui) knows best when and how to reconnect.

@bertfrees
Copy link
Member

Huh? Then what is the point of the health check? :/

Did you read that argument somewhere, or is it yours?

@egli
Copy link
Member Author

egli commented Oct 26, 2017

I read something along that line deep down in a StackOverflow comment (to a solution that was still detailing the depends_on: condition solution). I think it has to do with how docker swarm handles this.

@bertfrees
Copy link
Member

I think Docker aren't communicating this very well. I could only find some explanation here and in this discussion.

What it comes down to I think is that they are moving away from docker-compose and towards a new approach in which apparently some concepts like depends_on do not fit anymore.

I'm not really satisfied by the explanation they give for this choice. I'm not convinced that an application that depends on a service always knows best when the service is ready and how long to wait for it. (But to be fair I have to say that I haven't read that much about the new approach yet.)

Anyway, a health check does not make sense if you can't use it, and as long as you are using docker-compose I think it makes perfect sense to implement the "waiting" with docker-compose. I suggest we just keep using the v2 format so that I can use the condition form of depends_on. Docker say that there's no reason to use the v3 format if you don't intend to use swarm services. Support for the version 3 format in docker-compose is only meant to help the transition.

@egli
Copy link
Member Author

egli commented Oct 28, 2017

Docker say that there's no reason to use the v3 format if you don't intend to use swarm services.

I read that too. I use docker-compose to quickly bring up a test instance, for that use case it is quite good. ATM I do not intend to use swarm services. But then again when just quickly bring up a test instance you don't really care that much about the health service. A few failed attempts of the webui to connect to the pipeline are not the end of the world.

I suggest we just keep using the v2 format so that I can use the condition form of depends_on.

Makes sense.

@bertfrees
Copy link
Member

I want to use it for other things too, like tests. With tests it's quite convenient when you can just run them and be sure the Pipeline server is running. I think It's annoying if you need to implement the waiting logic in every little test application that you write, while it can be centralized in the server, which knows best how long to wait etc. That's why I liked the idea of the health check. Until I'm convinced that the other approach is better I want to try this.

bertfrees and others added 8 commits November 2, 2017 10:01
and add it to the Makefile
…m properties

- PIPELINE2_HOME
- PIPELINE2_BASE
- PIPELINE2_DATA
- PIPELINE2_WS_LOCALFS
- PIPELINE2_WS_AUTHENTICATION

*nix only.
because you can now directly specify the Pipeline properties through
environment variables. Note that this will only work for system
properties that start with "org.daisy.pipeline" though.
Instead use the PIPELINE2_WS_LOCALFS and PIPELINE2_WS_AUTHENTICATION
environment variables directly.
@bertfrees
Copy link
Member

I have pushed my version of the branch. Depends on daisy/pipeline-framework#126.

@bertfrees
Copy link
Member

Should be merged into develop, not master!

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I'm not experienced in Docker, but having read the discussion the proposed changes makes sense to me).

Thanks @egli (and @bertfrees and @josteinaj), looks like a very useful thing to have!

bertfrees added a commit that referenced this pull request Nov 27, 2017
@bertfrees bertfrees closed this Nov 27, 2017
@bertfrees bertfrees deleted the feature/dockerimage branch November 27, 2017 10:11
@bertfrees bertfrees added this to the v1.11.0 milestone Jan 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants