Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nginx rewrite rule in documentation is incorrect #665

Open
InspectorCaracal opened this issue Apr 9, 2023 · 13 comments
Open

Nginx rewrite rule in documentation is incorrect #665

InspectorCaracal opened this issue Apr 9, 2023 · 13 comments

Comments

@InspectorCaracal
Copy link

The currently given rule in the example config is:

location ~ ^/((config|content|vendor|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$))) {
	try_files /index.php$is_args$args =404;
}

As was mentioned before in #572 (which was closed while unresolved) this doesn't work. The rule should instead be something like this:

location ~ ^/(config|content|vendor|composer) {
	rewrite /(.*)$ /index.php?$1;
}

That will redirect any requests to those directories or their contents to instead be handled by Pico as content paths

@PhrozenByte
Copy link
Collaborator

Yeah, a simply wrong sample config might explain why we see support requests for nginx all the time 🙈 Apart from the regexes not being equivalent, can you elaborate why try_files doesn't work, but rewrite apparently does work? I don't use nginx myself, thus I can only read the docs, but I always thought that the only difference is that try_files won't pass the original URL to Pico (what is intentional actually).

@InspectorCaracal
Copy link
Author

InspectorCaracal commented Apr 15, 2023

Sure!

rewrite is basically like an internal redirect within nginx itself - it takes the request URL, changes it according to the rewrite rule, and passes it back to itself to actually process. Only the final rewritten version actually reaches the site.

try_files, as you might expect from the name, checks to see if files exists according to the list of parameters given and if so, returns them - except for the last parameter provided, which I forget the term for but is basically an internal rule. So what the root try_files $uri $uri/ /index.php$is_args$args directive does is take the uri, check if it's a path for a file, then check if it's a path for a directory, and then the last parameter is loosely equivalent to a rewrite

So what try_files /index.php$is_args$args =404 does is check if the file index.php(?stuff) exists: if so, it returns index.php as a file; if not, it uses the =404 rule and returns a server 404 page. This means anything caught by the existing rule just returns Pico's index.php page as a file (because returning it as a file parameter from the try_files directive means it doesn't go through the .php fpm)

The other issue with the existing rule is that it only catches paths that are .json .lock or .phar files, meaning that the .yml config files or .php plugin files or etc. can be just fetched directly.

So, since try_files is only used when you want to check for and potentially return files based on the uri, and we don't want to ever return any of the files in those folders but instead hand the uri to pico to handle, you want to use rewrite.

(As a disclaimer, I would definitely not call myself an expert on nginx, just a long-time user, so there might be some other technical differences that aren't visible in practice, but everything I wrote here is accurate to the best of my knowledge.)

@PhrozenByte
Copy link
Collaborator

PhrozenByte commented Apr 15, 2023

This means anything caught by the existing rule just returns Pico's index.php page as a file (because returning it as a file parameter from the try_files directive means it doesn't go through the .php fpm)

Ok, wow, this definitely explains why this doesn't work 🙈

Thank you very much for your explanation! Helped a lot to understand what is going on. ❤️

So try_files can't ever work here. I've some more questions:

  1. Does rewrite send the original REQUEST_URI as env variable to php-fpm, or only the rewritten URL? I'm asking because appending the requested page to the rewritten URL (i.e. passing it as query parameter) is very error prone, especially in combination with other query parameters (e.g. for requests like https://example.com/pico/content/my-page?var1=foo&var2=bar)? That's why we switched from QUERY_STRING based URL rewriting to REQUEST_URI with Apache. If nginx only sends the rewritten URL to php-fpm, is there a way to change that? Pico isn't any special about this, AFAIK most CMS require it this way.

  2. rewrite doesn't send a redirect to the user's browser, but rather rewrites the request internally (i.e. the user doesn't see the rewritten URL in the address bar), right?

  3. The proposed alternative doesn't match the original regex's functionality (or at least what we tried to achieve with it, even though it apparently never worked). The new config must achieve the following:

    1. Always send any request to the config/, content/, content-sample/, lib/, and vendor/ directories (content-sample/ and lib/ were missing before), as well as the CHANGELOG.md, composer.json, composer.lock, and composer.phar files (CHANGELOG.md was missing before), to Pico, no matter whether the file exists or not.
    2. Always send any request to any file or directory starting with a . to Pico, no matter whether the file exists or not.
    3. As an exception to no. 2, never send any request to the .well-known/ directory (or any .well-known/ deeper in the tree) to Pico.
    4. If a file or directory exists on the filesystem, serve it as usual (i.e. don't send the request to Pico). The request should be treated like no other rules were specified, i.e. requesting a .php file (e.g. in Pico's plugins/ directory) should indeed cause nginx to pass this request to php-fpm. In the original (broken) example config this was implemented with the location / section.
    5. Send anything else to Pico. Again, originally implemented with the location / section.

    The original regex ^/((config|content|vendor|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$))) corresponds to the first three rules; here's a visualization of the regex (using Regexper):
    Regexper

    Basically the nginx config is supposed to match Pico's default Apache config:

    Pico/.htaccess

    Lines 2 to 14 in 09aa825

    RewriteEngine On
    # May be required to access sub directories
    #RewriteBase /
    # Deny access to internal dirs and files by passing the URL to Pico
    RewriteRule ^(config|content|content-sample|lib|vendor)(/|$) index.php [L]
    RewriteRule ^(CHANGELOG\.md|composer\.(json|lock|phar))(/|$) index.php [L]
    RewriteRule (^\.|/\.)(?!well-known(/|$)) index.php [L]
    # Enable URL rewriting
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.php [L]

@PhrozenByte
Copy link
Collaborator

PhrozenByte commented Apr 15, 2023

I just tasked ChatGPT with this - and it failed miserably at first giving the instructions above, even after multiple correction loops 😆 However, tasking it with adapting the Apache config for nginx at least provided something that looks promising. However, I did many tests with ChatGPT and it often creates stuff that looks promising, but is very wrong in reality. Can you take a look?

server {
  listen 80;
  server_name example.com;
  root /var/www/example.com;
 
  location / {
    index index.php;
    try_files $uri $uri/ /index.php?$args;
  }
 
  location ~ /(config|content|content-sample|lib|vendor)(/|$)|^(CHANGELOG\.md|composer\.(json|lock|phar))(/|$)|(^\.|/\.)(?!well-known(/|$)) {
    rewrite ^(.*)$ /index.php last;
  }
 
  location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php7.2-fpm.sock;  # Replace with appropriate PHP version and path
  }
}

I don't know much about nginx, but from just comparing it with the original config and yours, it looks like $is_args is missing for try_files, and the last in rewrite might cause the request to never reach php-fpm?

@InspectorCaracal
Copy link
Author

InspectorCaracal commented Apr 15, 2023

Does rewrite send the original REQUEST_URI as env variable to php-fpm, or only the rewritten URL?

Ahhh, I missed that this is how it works! I'm a lot weaker on how Pico works than I am on how NGINX works and the docs I saw all seemed to use the path as a query parameter, so that's what I used when trying to fix it. 😅

You are right that rewrite changes the $request_uri when it runs. Preserving the original URI is possible but a little trickier, I'll get back to you on that.

Aside from that, here's a revision of what chat GPT gave you:

server {
  listen 80;
  server_name example.com;
  root /var/www/example.com;

  # this should be at the server level
  index index.php;
 
  # regex is not my strong point but i believe what you need is:
  # (^/(config|content|content-sample|lib|vendor)/) - match locations starting with (i.e. in) these directories
  # (^/(CHANGELOG\.md|composer\.(json|lock|phar))$) - match locations of these exact files
  # (/\.well-known/) - match locations containing this string anywhere
  # joined with | to match any of them
  location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {
    # last means that it won't rewrite it again - it'll still get processed by fpm. it's necessary for situations where your rewrites might cause loops.
    # so it's not necessary here but doesn't hurt
    rewrite .* /index.php$is_args$args last; 

  }

  # it doesn't matter as much here because regex locations get parsed separately
  # but it's good practice to remember not to put your root location at the top so it doesn't override all your other locations
  location / {
    try_files $uri $uri/ /index.php$is_args$args;
  }

  # i copied this from my existing config
  location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    fastcgi_param SCRIPT_FILENAME $request_filename;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path
  }
}

@InspectorCaracal
Copy link
Author

InspectorCaracal commented Apr 15, 2023

All right, I tested this solution for the URI and it seems to work!

server {
  # the listen/server_name/etc directives
  set $original_request_uri $request_uri;

  # ...

  location ~\.php$ {
    include snippets/fastcgi-php.conf;
    fastcgi_param SCRIPT_FILENAME $request_filename;
    fastcgi_param REQUEST_URI $original_request_uri; # use original URI
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path  
  }

@InspectorCaracal
Copy link
Author

For conciseness, here's all of it in one place:

server {
  listen 80;
  server_name example.com;
  root /var/www/example.com;
  index index.php;
	
  set $original_request_uri $request_uri;
 
  location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {
    rewrite .* /index.php$is_args$args last; 
  }

  location / {
    try_files $uri $uri/ /index.php$is_args$args;
  }

  location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    fastcgi_param SCRIPT_FILENAME $request_filename;
    fastcgi_param REQUEST_URI $original_request_uri;
    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock; # Replace with appropriate PHP version and path
  }
}

I've updated the regex, rewrite, and URI directives in my own configuration to match this and tried them out, and it's all working correctly. Normal pages show up normally, normal files show up normally (including a root config.txt I made just for this), and attempts to access any of the blocked locations are correctly returning Pico's own 404 pages.

@PhrozenByte
Copy link
Collaborator

Thanks again @InspectorCaracal! ❤️

  location ~ (^/(config|content|content-sample|lib|vendor)/)|(^/(CHANGELOG\.md|composer\.(json|lock|phar))$)|(/\.well-known/) {

I remember that there were some issues with specially crafted requests, requiring (/|$) after any path, no matter whether it's a directory or file. For files this was something about PATH_INFO support (even though Pico doesn't use it, PHP still fiddles with it; also see below), and for directories it's necessary because one can request a directory without the trailing /, too. I guess we better be safe than sorry. Also note that the .well-known part matches the directory, even though it is supposed to not match it, but any other file and directory starting with a dot. That's why the original regex uses a negative lookahead ((?!)). I believe that the following regex should be correct, can you please give it a try?

  location ~ ^/((config|content|content-sample|lib|vendor|CHANGELOG\.md|composer\.(json|lock|phar))(/|$)|(.+/)?\.(?!well-known(/|$))) {
    rewrite .* /index.php$is_args$args last; 

I guess we can make the regex even simpler (i.e. faster) with the following. Can you please give it a try?

    rewrite ^ /index.php$is_args$args last;
    try_files $uri $uri/ /index.php$is_args$args;

I guess appending a =404 just in case is a good idea, what do you think?

    try_files $uri $uri/ /index.php$is_args$args =404;
  location ~ \.php$ {

A few more questions and notes about this section:

  • The original config includes a try_files directive. I remember that there was some weird issue regarding PATH_INFO support that caused nginx and PHP to differ in which file was requested, causing possible security issues because PHP happily interpreted files it wasn't supposed to. Can you please take a look?

        try_files $uri =404;
  • The original config includes a fastcgi_index directive. Is this necessary?

        fastcgi_index index.php;
  • Why isn't fastcgi_param SCRIPT_FILENAME part of the params snippet? Is this a special value the common config doesn't include, like our original REQUEST_URI?

  • We should let Pico know that URL rewriting is available by adding the following:

        # Let Pico know about available URL rewriting
        fastcgi_param PICO_URL_REWRITING 1;

@PhrozenByte
Copy link
Collaborator

Last but not least: Do you want to open a PR to incorporate these changes on both our website (see _docs/config.md and in-depth/nginx.md) and Pico itself (see content-sample/index.md)?

@InspectorCaracal
Copy link
Author

InspectorCaracal commented Apr 17, 2023

I remember that there were some issues with specially crafted requests, requiring (/|$) after any path, no matter whether it's a directory or file. For files this was something about PATH_INFO support (even though Pico doesn't use it, PHP still fiddles with it;

I'll have to check, but I think the location block patterns are only used within NGINX itself and wouldn't affect the server PATH_INFO? I've never encountered patterns ending with (/|$) outside of this example, though, so it's not common practice.

I guess appending a =404 just in case is a good idea, what do you think?

That would break everything. 😅 Everything but the very last parameter in a try_files directive is treated as a file path to look up, so it'll read index.php$is_args$args as a file location and return index.php as a downloadable file every time. Pico itself already handles not finding pages and returns its own 404 page, anyway, which provides a more consistent end-user experience.

As for the rest of the php-fpm block: a lot of those directives you're suspecting are needed are needed - but they're boilerplate that are included in the fastcgi snippets that come packaged with nginx. It's why I have the one line running snippets/fastcgi.conf instead. It's expected to just use that and only set the parameters you personally need, like the URI or which version of PHP to run.

(edit: ... now that I've said that, I'm not actually sure why I included the SCRIPT_FILENAME parameter manually, it's not necessary. I do a lot of "copy my config for something else and tweak it because I'm lazy" when setting up my own sites so I guess it's an artefact from someplace I did need it. 😓)

(edit 2: Oh whoops, nope, updating my full config, I rediscovered why it's there - it screwed up my subdir site. I'll poke at it more to verify if it's something specific to my config or if it should be included in this sample as well before doing the PR.)

Here is an example of what that file contains. As you can see, the tedious do-every-time work like checking if the file exists and handling the PATH_INFO is already covered. (Which I think resolves your concern about the other regex location block, too.) This file is shipped with nginx when you install it and has been part of its default installation for a long time - a decade or so, I think? So you can definitely count on it being there.

And I'll be happy to do a PR! I'll get to that in the next couple days for sure.

p.s. using ^ instead of .* in the rewrite works great 👍

@mayamcdougall
Copy link
Collaborator

Glad to see the Nginx config receiving some attention. I got curious about some things, so I went digging in the commit history. Apparently it's been SEVEN years since I wrote the initial draft for it. No wonder I can't remember anything about it anymore!

Since I'm so out-of-touch with Nginx, I don't really have much to contribute here. Just wanted to share some quick findings.

I looked at my own (very old and untouched) config. It just uses return 404 for the forbidden stuff instead of try_files. That's because at the time it was written, we weren't yet redirecting to Pico's internal 404 page.

So, I was looking through the file history and I found this commit: picocms/picocms.github.io@8ccd661, where @PhrozenByte replaced large portions of the documentation. This is where try_files came into play, though =404 wasn't added until a bit later.

So... uh... it's possible that the current config has never actually worked properly, since it sounds like, @PhrozenByte, you updated it purely by reading the docs and possibly never actually tested it? 🙈🫣

Anyway, thanks for all the help with this @InspectorCaracal. 😁

Since I wrote the original version of that page (long ago, on an enthusiastic Nginx-switching binge), I've felt pretty powerless, and a little guilty, seeing people struggle with Nginx more recently.

I've lost pretty much all the knowledge I once had on the subject though. 😅

@github-actions

This comment was marked as outdated.

@ohnonot
Copy link

ohnonot commented Jan 4, 2024

Fascinating...

I had noticed myself that the current rule does not work very well.

I have tested this snippet in a configuration that includes other directives as well, and it seems to be working perfectly (including the fastcgi block). Thanks for fixing!

Out of curiosity: If I replace rewrite .* /index.php$is_args$args last; with return 404; it seems to have the exact same result for somebody trying to retrieve a forbidden path. Or could there be some cases where it doesn't?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants