Docker + reverse proxy with subdirectory #1193

Open
thehijacker opened this issue Nov 21, 2021 · 21 comments · May be fixed by #1263

Comments

@thehijacker

Hello,

I have spent way too much time trying to figure this out. I hope someone else has wanted to do the same and managed it.

I am using the docker-compose file, and everything loads and is reachable at "http://192.168.28.53:7880/app". Now I want to open access to the public through an nginx reverse proxy, which also holds my SSL certificate. I am using the following rule:

    location /docspell/
    {
        client_max_body_size 100M;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forward-Proto http;
        proxy_pass http://192.168.28.53:7880;
    }

When I try to open https://domain.com/docspell/ I get a "Not found". Looking at the logs, I can see that it tried to open /docspell and got a 404 back. Where do I change the base URL so that all URL calls include /docspell? I looked through the files under /var/solr but could not find anything there.

Thank you.

@eikek
Owner

eikek commented Nov 21, 2021

Hi @thehijacker, your nginx config probably makes nginx forward the request "as is" to the upstream server. The server then also receives the path /docspell, but it doesn't know about it. This concerns the docspell restserver, since that is the component being exposed. Solr is not affected at all; it is only used internally.

Now, I have never tried to deploy docspell behind a path, so this might not really work. You can try it by first telling nginx to strip the path when forwarding requests (see the nginx docs for this; I remember that adding a slash to the upstream server is enough, but I'm not sure!) and then setting the base-url to your public URL including the path, for example https://my.server.com/docspell.

If possible, I would recommend using subdomains instead. The docs have an example for this (you have probably already seen it).
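
As a rough illustration of the path stripping mentioned above, here is a minimal Python sketch of how nginx maps the request path for a prefix location, depending on whether the `proxy_pass` value carries a URI part. The function name `upstream_path` is made up for this example, and the sketch ignores regex locations and rewrites.

```python
from typing import Optional


def upstream_path(request_path: str, location: str, proxy_pass_uri: Optional[str]) -> str:
    """Approximate the path nginx sends upstream for a prefix location.

    proxy_pass_uri is the URI part of the proxy_pass value:
    None for "http://192.168.28.53:7880", "/" for "http://192.168.28.53:7880/".
    """
    if not request_path.startswith(location):
        raise ValueError("request does not match this location")
    if proxy_pass_uri is None:
        # No URI part: the request path is forwarded unchanged.
        return request_path
    # URI part present: the matched location prefix is replaced by it.
    return proxy_pass_uri + request_path[len(location):]


# Without a trailing slash on the upstream, docspell still sees /docspell/...
print(upstream_path("/docspell/app/login", "/docspell/", None))  # /docspell/app/login
# With a trailing slash, the /docspell/ prefix is stripped.
print(upstream_path("/docspell/app/login", "/docspell/", "/"))   # /app/login
```

Stripping the prefix only fixes the incoming request path; any links the application generates still have to include the prefix for the browser to come back through the same location.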

@thehijacker
Author

Hello @eikek. I cannot use a subdomain because my SSL certificate is not a wildcard certificate.

I tried with and without the slash for the upstream. It did not help. I have high hopes for changing the base-url; I just do not know where to set it :). As I said, I am using the default docker-compose.yml file. I only changed the passwords in it.

If you can point me to the right configuration file, I can change it and test whether it then works.

@eikek
Owner

eikek commented Nov 21, 2021

If you use the docker-compose file you can use environment variables. Look here for the possible options; the page also has some good information about how to configure docspell. You can use a config file or env variables. The env variable we need now is DOCSPELL_SERVER_BASE__URL. Setting this should take care of generating correct links in HTTP response contents. But nginx must still strip the path; I thought that if you specify a path on the upstream server (it is just / here), nginx rebases the request path. But maybe you need to do some path rewrites.
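
To make the variable name less mysterious, here is a small Python sketch of the naming rule it appears to follow. The rule itself (a dot becomes one underscore, a dash becomes two, then upper-case) and the full config key `docspell.server.base-url` are assumptions inferred from the single example above; the helper `env_name` is hypothetical and not part of docspell.

```python
def env_name(config_key: str) -> str:
    """Derive an environment variable name from a dotted config key.

    Assumed convention: "-" becomes "__", "." becomes "_", then upper-case.
    """
    return config_key.replace("-", "__").replace(".", "_").upper()


print(env_name("docspell.server.base-url"))  # DOCSPELL_SERVER_BASE__URL
```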

@thehijacker
Author

I had actually already tried the env variable DOCSPELL_SERVER_BASE__URL, but again it did not work with nginx.

Now I am trying something else. For the base url I have put http://192.168.28.53:7880/docspell, but again it is not working. I was hoping that this would change all internal URL calls to, for example, /docspell/api, but it did not.

Looking at the nginx access log, I can see that domain.com/docspell is proxied to http://192.168.28.53:7880/, but the next URL it tries to open is domain.com/api and not domain.com/docspell/api.

Now I am not sure if this is something docspell should handle or nginx.
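
The behaviour seen in the access log matches how browsers resolve links. The short Python sketch below uses the standard `urllib.parse.urljoin` to show that a link starting with a slash is resolved against the host root and loses the /docspell prefix, while a relative link keeps it. The path /api/info/version is only used as a sample.

```python
from urllib.parse import urljoin

page = "https://domain.com/docspell/"

# A link starting with "/" is resolved against the host root.
print(urljoin(page, "/api/info/version"))  # https://domain.com/api/info/version

# A relative link is resolved against the page URL and keeps the prefix.
print(urljoin(page, "api/info/version"))   # https://domain.com/docspell/api/info/version
```

So as long as the application emits root-anchored links without the prefix, stripping the path in nginx alone cannot make the subdirectory setup work.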

If this will never work, I do not mind accessing it over VPN with my local IP address. I actually tried this already, but sending documents from the Android client application fails, apparently because the URL is http and not https. The actual error in the application is:

CLEARTEXT communication to 192.168.28.53 not permitted by network security policy.

It looks like this is related to this bug:

docspell/android-client#7

Or am I again doing something wrong in the configuration?

@eikek
Owner

eikek commented Nov 21, 2021

Ah ok, I guess it then doesn't work behind a path :( sorry. I'm also not at all an nginx expert. We can have a ticket for this, but it might be a while until I can work on it. For a quick check whether the base-url setting is active, you can right click and "view page source". There will be a JSON structure in which the base URL should also be present. The client should actually take this URL into account, but as I said, I never had a path in mind so far (so it is "officially" not supported).

The android app problem is exactly the issue you mentioned and really unfortunate. But at least there is a workaround: you can install the previous version. The new version doesn't have new features other than supporting self signed certificates (which somehow destroyed plain text connections :/).

@thehijacker
Author

Indeed. The old 0.4.0 version works fine. It sent the image from Open Note Scanner to docspell, which added it as a document. Sadly it did not do OCR on the image. And these are the next steps I need to take: figure out how to automatically add tags based on the OCR text of a document, and make it process (OCR) image files as well :). Time to read the documentation from start to end.

We can leave this ticket open in case you ever find time to work on the base-url with subfolder feature. I am coming from paperless-ng and so far I like docspell more. It has many more features; it just takes more time to figure them all out.

@eikek
Owner

eikek commented Nov 21, 2021

Sadly it did not do OCR on the image.

Oh really? This is not expected, it should definitely do OCR on the image. I just tried it here where it works :) You can open another issue with some logs if you want. In the logs there should be a tesseract command somewhere. You should also be able to see the extracted text in the ui (the menu on the attachment has a "view extracted data" entry)

If anything is not clear with the docs, don't hesitate to ask :-)

@thehijacker
Author

There is nothing in the logs under /data/logs that contains the word tesseract. As I said, I need to start reading the documentation. I am surely doing something wrong.

@gandy92

gandy92 commented Dec 28, 2021

I also have the problem that my SSL certificate is not wildcard and with my dyndns provider not offering the required control I won't be able to change this any time soon.
I did a quick grep over the sources and noticed that while several places use the base_url to construct new hrefs, others don't. This includes setting up the Router in RestServer.scala, where the path section of the base_url could be prepended to the installed routes. Would this help in solving the issue?
I've successfully set up the build environment for docspell and although I don't have any experience in scala or elm, I'd like to give it a shot - if you're not already working on it, of course.
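
The idea of prepending the path section of the base_url to the installed routes can be sketched in a few lines of Python. This is only an illustration of the principle, not docspell code: the helpers `base_path` and `mount` are hypothetical, and the route list is a sample.

```python
from urllib.parse import urlsplit


def base_path(base_url: str) -> str:
    """Return the path section of the base URL without a trailing slash."""
    return urlsplit(base_url).path.rstrip("/")


def mount(routes: list, base_url: str) -> list:
    """Prefix every route with the path section of the base URL."""
    prefix = base_path(base_url)
    return [prefix + route for route in routes]


routes = ["/api/info/version", "/app/login"]

# Base URL with a path: every route moves below /docspell.
print(mount(routes, "https://my.server.com/docspell"))
# ['/docspell/api/info/version', '/docspell/app/login']

# Base URL without a path: the routes stay as they are.
print(mount(routes, "http://192.168.28.53:7880"))
# ['/api/info/version', '/app/login']
```

Deriving the prefix in one place and applying it once per route is also what keeps a root deployment unchanged, because the prefix is then simply empty.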

@eikek
Owner

eikek commented Dec 28, 2021

@gandy92 I'm not working on this. You can go ahead, if you want! Thanks! I would also start with the Router, though I'm not sure that is the only place. I think one other place to look at is the background tasks that send emails/notification messages. They get the base-url from the client, so it might just work actually 🪄 but maybe not 😉

@gandy92

gandy92 commented Dec 28, 2021

@eikek thanks for your feedback! I've already managed to change RestServer.scala, AttachementRoutes.scala, ShareAttachementRoutes.scala, Flags.scala and TemplateRoutes.scala to prepend cfg.baseUrl.path.asString to each location. The restserver compiles and starts ok, but of course this is only part of the game.
I'll need some time to wrap my head around the elm code, so in order to find all relevant spots I've started hard-coding the base_path where I thought I needed it (the base_path is chosen so that it is perfectly identifiable in the code and can easily be replaced with a variable at a later point). So far I've changed App/Update.elm, Comp/ItemCard.elm, Page.elm and also template/index.html.
Now the main page loads and looks like it should, but it keeps reloading and I get quite a lot of errors like this:

restserver level=INFO  thread=blaze-selector-0 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41858"
restserver level=INFO  thread=io-compute-2 logger=d.r.w.TemplateRoutes message="Compiled template file:/home/andy/prg/docspell/modules/restserver/target/scala-2.13/classes/index.html"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/app/login?r=/andy/app/home"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/app/assets/docspell-webapp/0.31.0-SNAPSHOT/img/logo-96.png"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"
restserver level=INFO  thread=blaze-selector-3 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41864"
restserver level=INFO  thread=blaze-selector-4 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41866"
restserver level=INFO  thread=blaze-selector-1 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41860"
restserver level=INFO  thread=blaze-selector-2 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41862"
restserver level=DEBUG thread=io-compute-2 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=DEBUG thread=io-compute-1 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=blaze-selector-0 logger=o.h.b.c.n.NIO1SocketServerGroup message="Accepted connection from /127.0.0.1:41868"
restserver level=DEBUG thread=io-compute-0 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/auth/session"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/email/settings/smtp?q="
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/clientSettings/webClient"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/calevent/check"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 POST /andy/api/v1/sec/calevent/check"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-1 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/tag?sort=name&q="
restserver level=INFO  thread=io-compute-1 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-0 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/tag?sort=name&q="
restserver level=INFO  thread=io-compute-0 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=DEBUG thread=io-compute-3 logger=d.b.auth.Login message="Invalid session token: Invalid authenticator"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/v1/sec/folder?q=&sort=name"
restserver level=INFO  thread=io-compute-3 logger=o.h.s.m.Logger message="HTTP/1.1 403 Forbidden"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 GET /andy/api/info/version"
restserver level=INFO  thread=io-compute-2 logger=o.h.s.m.Logger message="HTTP/1.1 200 OK"

I also see several complaints on the javascript console, but with the page reloading all the time it's difficult to get a clear picture.

So there is still a way to go here, and I will most probably have to pick your brain at some point. Not to raise any expectations: at the moment I'm mostly curious whether I can come up with a solution that "only" requires a few optimizations on your side.

@eikek
Owner

eikek commented Dec 29, 2021

It is not an easy change, I'm afraid. I knew this 😄 But I was hoping that it wouldn't be so many places. This is really bad. I also like the idea of hard-coding all the places for now and seeing how to streamline it later. It should be possible with a few modifications like the ones you did, I think.

Re Elm: There is already a config setting for base_url at the server and this is also sent to the client (I think you found it). In Elm, there is a Flags.elm file that contains this base url. My thought was that if the base_url is communicated to the client with the path, there shouldn't be too much to change. It might be necessary to pass this Flags type to more places, though.

Re reloading all the time: I can't say off the top of my head why that is. It could be related to some requests that cannot be authenticated properly. Maybe the cookie is not picked up because its path changed? Just a very rough guess. It also seems strange that it says "invalid authenticator", because that means the token is sent with the request (it's there) but could not be decoded. If you want, you can push your code somewhere so I can check it out and run it here (when I find time).
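
For context on the cookie guess: a browser only sends a cookie when the request path matches the cookie's Path attribute. The Python sketch below follows the path-match rule from RFC 6265, section 5.1.4. The cookie paths used here are hypothetical; which path docspell actually sets on its session cookie is not stated in this thread.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Path-match check as described in RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The cookie path is a prefix that already ends with "/" ...
        if cookie_path.endswith("/"):
            return True
        # ... or the next character of the request path is "/".
        if request_path[len(cookie_path)] == "/":
            return True
    return False


# A cookie scoped to /api would not be sent once the API lives under /andy/api.
print(path_matches("/andy/api/v1/sec/tag", "/api"))       # False
# A cookie scoped to the prefixed path would be sent.
print(path_matches("/andy/api/v1/sec/tag", "/andy/api"))  # True
```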

@gandy92

gandy92 commented Jan 1, 2022

Neither would I have expected it to be easy, especially given my lack of experience with elm and scala. However, it's probably not that many places after all: some changes I already had to revert so as not to end up adding the base_path twice. Anyway, I've pushed my changes to my fork of docspell at https://github.com/gandy92/docspell. As far as I can tell, most URLs during page loading are fine, but as you already noticed the authentication stuff is utterly broken. In the webgui this leads to the login page not being shown at all (I've tested this with all cookies removed and a cleared browser cache).
I used a simple python script to test logging in over the REST API, and this works fine, including retrieval and use of the access token. So it could well be that the problem is mostly cookie related, but I couldn't find where to look further on this one.
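
The script itself is not included in this thread. As a rough idea of what such a check could look like, the sketch below only builds the two requests and does not send them. The login route /api/v1/open/auth/login, the JSON fields account and password, the X-Docspell-Auth header and the base URL http://localhost:7880/andy are all assumptions that should be checked against the docspell REST API documentation; both helper functions are hypothetical.

```python
import json
from urllib.request import Request


def login_request(base_url: str, account: str, password: str) -> Request:
    """Build (but do not send) a login request below the given base URL."""
    body = json.dumps({"account": account, "password": password}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/api/v1/open/auth/login",  # assumed route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authed_request(base_url: str, path: str, token: str) -> Request:
    """Build a request that carries the token in a header instead of a cookie."""
    return Request(
        base_url.rstrip("/") + path,
        headers={"X-Docspell-Auth": token},  # assumed header name
    )


req = login_request("http://localhost:7880/andy", "demo/demo", "secret")
print(req.get_method(), req.full_url)
```

Sending the requests with `urllib.request.urlopen` and passing the token from the login response to `authed_request` would complete the check. Because the token travels in a header, a test like this would not notice a cookie or routing problem that only affects the browser.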

@eikek
Owner

eikek commented Jan 2, 2022

Awesome, thank you! I'll look into it in the next days (I hope)

@eikek
Owner

eikek commented Jan 8, 2022

Hi @gandy92, I tried your branch and made some changes. Now it kind of works. The reason for the re-authentication was that the parser for the pages had not been updated with the new basePath. It is still a mess, of course. I'm not sure how to streamline it right now; maybe you will find something here. If you would like to investigate further, I can push my changes to your branch; I think you need to open a PR for me to do this. You can also get it from here.
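
To illustrate the kind of problem described here, the following Python sketch shows a page parser that only recognises a URL once the base path has been stripped. It is a simplified stand-in, not the Elm code from the project: the function `parse_page` is hypothetical, and the page table contains only the two paths visible in the log output earlier in this thread.

```python
from typing import Optional


def parse_page(path: str, base_path: str = "") -> Optional[str]:
    """Map a URL path to a page name, or None if the path is not recognised."""
    if base_path and path.startswith(base_path + "/"):
        # Strip the deployment prefix before matching the known pages.
        path = path[len(base_path):]
    pages = {"/app/login": "login", "/app/home": "home"}
    return pages.get(path)


# A parser that does not know the base path cannot recognise the page.
print(parse_page("/andy/app/home"))           # None
# With the base path passed in, the same URL resolves to the home page.
print(parse_page("/andy/app/home", "/andy"))  # home
```

An application that treats an unrecognised page as a reason to go back to the login page could plausibly end up reloading repeatedly in this situation.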

@gandy92 gandy92 linked a pull request Jan 8, 2022 that will close this issue
@eikek
Owner

eikek commented Jan 10, 2022

Hi @gandy92, I just pushed something to your branch.

@gandy92

gandy92 commented Jan 10, 2022

Thank you @eikek I'll look into it as soon as possible. Back at the day job, but I'll find time.

@eikek
Owner

eikek commented Jan 10, 2022

Sure, and no worries, we have no deadlines here :) Whenever you find some time.

@gerroon

gerroon commented Mar 21, 2022

Hi

Is a subdirectory now supported behind a reverse proxy? I need to set this up with Apache, but it redirects to "/app", so I am not sure how to fix it.

@eikek
Owner

eikek commented Mar 21, 2022

It's not possible to deploy behind a path; it must be the root path at the moment. There were some efforts toward this, but there is no ETA.

@gerroon

gerroon commented Mar 21, 2022

It's not possible to deploy behind a path, must be the root path at the moment. There were some efforts to this, but no eta.

Thanks, I will just try to use it internally.
