Support for extensionless URLs (neither .html nor trailing slash)? #809

Open
betaveros opened this issue Sep 2, 2020 · 11 comments

Comments

@betaveros

I would like to generate a static site in which I have files like blog/post-name.html, but the generated URLs to them have neither .html nor a trailing slash and look like, for example, example.com/blog/post-name. (Although this requires support from the web server, it's pretty common and easy to set up; for example, GitHub Pages supports it directly with no configuration. It doesn't seem to be commonly supported in other static site generators, but Jekyll supports it — there used to be a clear section in the Jekyll documentation explaining this, but it seems to be sort of absorbed into the overall documentation now; see jekyll/jekyll#156 for more discussion and motivation.)

Does Lektor support this? If not, is it possible to add support, or at least expose enough API to make this possible to implement in a plugin? I've poked around trying to write a plugin and am very unsure if the current plugin API supports this. For example, I can create an extra VirtualSourceObject for every Record and make the URL resolver resolve extensionless URLs to it, and I imagine I could just change every place URLs are generated in my website templates to use these URLs instead, but I don't think I can avoid building a separate artifact at a separate location, nor do I think I can make my new source just fall back to my old source's generated file.

If this sounds like a feature you don't have but would accept, I am extremely willing to write code for this :) (Lektor's CMS looks amazing, plus my blog currently runs on a custom Hugo fork that I don't want to maintain indefinitely...)

@runfalk
Member

runfalk commented Sep 2, 2020

If you want to generate the file post-name without any extension there are some problems. Lektor allows the user to add attachments to any page. Where would they be put? Some pages can also have children. What would URLs for a project list + projects look like?

Lektor generates a folder per page with an index.html inside it. You can generate the URLs in any form you want by replacing the url filter with one that doesn't add trailing slashes. Then you'd need to coerce your web server into not redirecting when the trailing slash is missing (nginx, at least, does redirect in my tests). There seems to be a way to disable that, though: https://stackoverflow.com/questions/15555428/nginx-causes-301-redirect-if-theres-no-trailing-slash.

I think the web server configuration method will cause you a lot less pain. May I ask why you want this feature?

@betaveros
Author

To be clear, I want to generate a file called /path/to/post-name.html, but for the generated URLs and presumably the dev server to use /path/to/post-name. I agree the URLs get a bit ugly/strange with attachments and children, but I saw that Lektor already supports dotted slugs and has a convention for where to put children, as documented here: https://www.getlektor.com/docs/content/urls/

I think if I were choosing ideal URLs for a project list and projects, I would use /projects/, /projects/project-a, /projects/project-b. The responsibility would be on me to configure Lektor such that I only use extensionless slug formats for posts without children, and I would be OK if it behaves weirdly or straight-up fails to build when I violate this rule. However, I would not mind if I had URLs /projects, /projects/project-a, /projects/project-b. (Note also that a folder called post-name actually doesn't conflict with a file called post-name.html.)

My use case is mainly deployment to GitHub Pages, which already automatically rewrites /path/to/post-name to /path/to/post-name.html behind the scenes, but does not rewrite /path/to/post-name to /path/to/post-name/ (it does a redirect instead, which I don't want), and does not, as far as I am aware, allow this to be configured.

@betaveros
Author

betaveros commented Sep 2, 2020

FWIW, I understand if you don't want this feature in core, but I would be satisfied by a plugin architecture that lets me do these two things and nothing else:

  1. change the url filter to automatically remove the suffix .html from every URL that ends in .html (and isn't for an external site)
  2. change the dev server to automatically rewrite every path in which the part after the last slash is nonempty but contains no .'s, by appending .html to it.
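For concreteness, the two rules above could be sketched as plain Python functions. This is only an illustration: the names strip_html_suffix and rewrite_request_path are made up here and are not part of Lektor, and "external" is assumed to mean "has a scheme or host".

```python
from urllib.parse import urlsplit, urlunsplit


def strip_html_suffix(url: str) -> str:
    """Rule 1: drop a trailing ".html" from site-local URLs only."""
    parts = urlsplit(url)
    # Assumption: a scheme or host marks the link as external; leave it alone.
    if parts.scheme or parts.netloc:
        return url
    if parts.path.endswith(".html"):
        parts = parts._replace(path=parts.path[: -len(".html")])
    # Query string and fragment, if any, are preserved.
    return urlunsplit(parts)


def rewrite_request_path(path: str) -> str:
    """Rule 2: append ".html" when the last segment is nonempty and has no dot."""
    last = path.rsplit("/", 1)[-1]
    if last and "." not in last:
        return path + ".html"
    return path
```

With these, /blog/post-name.html would be linked as /blog/post-name, and a dev-server request for /blog/post-name would be served from /blog/post-name.html, while /blog/ and /static/style.css pass through unchanged.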

I wanted to see if there were better ways to do it, though.

(edit: There's also one more problem I forgot with the rewriting solution, which is that silently rewriting /path/to/post-name to /path/to/post-name/ completely breaks relative paths...)

@runfalk
Member

runfalk commented Sep 2, 2020

So, there are a few things to unpack. Firstly, you can add your own filter alongside |url in an extension as part of the env setup. I'm not sure if you can override a built-in, but you could do something like |href or |link. There is also an option to use absolute URLs instead of relative ones (see url_style at https://www.getlektor.com/docs/project/file/).
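A rough sketch of what registering such a filter from a plugin might look like. The filter body (dropping a trailing slash) and the class name ExtensionlessLinksPlugin are illustrative assumptions, and the import fallback is only there so the snippet can be loaded without Lektor installed.

```python
try:
    from lektor.pluginsystem import Plugin
except ImportError:
    Plugin = object  # fallback so the sketch can be imported without Lektor


def href(url: str) -> str:
    """Drop a trailing slash from a generated URL, keeping the root "/" intact."""
    if len(url) > 1 and url.endswith("/"):
        return url[:-1]
    return url


class ExtensionlessLinksPlugin(Plugin):  # hypothetical plugin name
    name = "extensionless-links"
    description = "Registers an |href filter next to the built-in |url filter."

    def on_setup_env(self, **extra):
        # Add a new filter instead of overriding the built-in |url.
        self.env.jinja_env.filters["href"] = href
```

In a template this would be chained after the existing filter, e.g. {{ '/blog/post-name'|url|href }}. Since relative links resolve differently once the trailing slash is gone, it probably pairs best with the absolute url_style.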

So if there is an HTML file, GitHub Pages will automatically "add" the .html extension? If that's the case, why don't you try to override the slug of post-name to post-name.html and see if it works (I can't try myself at the moment)?

If that works, it's probably easier for you to do that and write a proxy in front of your dev server that fetches the HTML for you.

If this is to be added as a core feature, I think it should probably just be the post-name.html part. Then a plugin could implement an alternative URL filter that rewrites URLs if need be. The projects example would be something like:

  • /projects.html
  • /projects/attachment.jpg
  • /projects/project-a.html
  • /projects/project-b.html
  • /projects/project-b/attachment-b.jpg

I think this would be the most consistent way to do it. There may be room for a github_pages variant of url_style in core, but I'm not sure I like that. It seems a bit extreme to cater to a single vendor-specific configuration just to avoid a 301 redirect. Are there any other concerns apart from the redirect and aesthetics?

@betaveros
Author

OK, after reading that, I figured out how to monkey-patch Lektor from a plugin to basically work for my use case: https://gist.github.com/betaveros/fcf9e54706acc241eb16604debe4901f

So I think this will work for now. However, let me at least try to argue that this behavior is not that vendor-specific:

I'll let you decide if this issue is still worth considering, though. Thanks!

@xlotlu
Contributor

xlotlu commented Sep 3, 2020

I don't really have an objection to this being in core; I guess file extensions are increasingly irrelevant to anyone but old dinosaurs.

The complicated stuff would be:
a) finding a config name for it 🙄, and
b) making sure the logic won't blow up once #344 is fixed. There are probably other places that will get tripped up; relative URLs will need fixing as well, and the devserver part will need to know whether that's an actual extensionless file or a .html file. The current gist is naive in this respect, but hey, so is Lektor core for now.

@betaveros
Author

A possible way to combine the behavior with #344 could be to define a new system field _slug_style (?? name TBD) that affects this invocation of _process_slug:

bits.append(_process_slug(node['_slug'], node is self))

This new field would describe how to convert the slug into a component of the URL path, given the slug and whether it's the last segment of the path. It can take one of the following values:

  • auto (default): more or less current behavior, maybe with more intelligent file extension detection. Behaves like always_file below if the slug "looks like a file name" (contains a .) and always_dir below if not.
  • always_dir: Always treat the slug as a directory name. If it's the last segment in the path, add the file name index.html.
  • always_file: Always treat the slug as a file name. If it's not the last segment in the path, split by / and prepend a _ to the last part to get a directory name that doesn't collide with treating it as a file name.
  • leaf_html (?? name TBD): If it's the last segment in the path, get a file name by appending .html. If it's not the last segment in the path, use the slug directly as a directory name.
  • If desired for backwards-compatibility, legacy: a version with the current file extension detection logic.
  • ??? maybe allow plugins to add their own slug styles?
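To make the proposed values concrete, here is a minimal sketch of the mapping from slug to path component. It is not the actual _process_slug from Lektor or from the PR. It assumes a single style for the whole path (the proposal allows per-page overrides), omits legacy, and build_path is a hypothetical helper.

```python
def process_slug(slug: str, is_last: bool, style: str = "auto") -> str:
    """Turn one slug into a path component under the given slug style."""
    if style == "auto":
        # "Looks like a file name" is taken literally as "contains a dot".
        style = "always_file" if "." in slug else "always_dir"
    if style == "always_dir":
        return slug + "/index.html" if is_last else slug
    if style == "always_file":
        if is_last:
            return slug
        # Prefix the last part with "_" so the directory cannot collide
        # with the file of the same name.
        head, sep, tail = slug.rpartition("/")
        return head + sep + "_" + tail
    if style == "leaf_html":
        return slug + ".html" if is_last else slug
    raise ValueError(f"unknown slug style: {style!r}")


def build_path(slugs, style="auto"):
    """Join the slugs of a page and its ancestors into an output path."""
    last = len(slugs) - 1
    return "/".join(process_slug(s, i == last, style) for i, s in enumerate(slugs))
```

Under these assumptions, build_path(["projects", "project-a"], "leaf_html") gives projects/project-a.html, while the same slugs under always_dir give projects/project-a/index.html. A child of a page with slug foo.html lands under _foo.html/ in auto mode.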

I think it would be good if there's a project-wide setting that provides a default value for this field, and then individual pages can override it. Maybe also allow a page to set a value for its children, similar to slug_format.

I think if we can fully adopt the auto behavior above, it will also help smooth out some other parts of Lektor's path behavior. For example, it will be possible for pages with foo.html slugs to have child pages, which doesn't seem to work right now? (The dev server doesn't succeed in resolving them because the slug doesn't match, even though they're successfully built and put under the _foo.html directory.) Similarly we can also make it possible to paginate pages with foo.html slugs, as _foo.html/page/1/ and such, which explicitly fails right now. (However, I think I would also like a project-wide setting for slug_style to affect pagination and maybe cause the pages to look like _foo.html/page/1.html?)

betaveros added a commit to betaveros/lektor that referenced this issue Sep 5, 2020
Slug styles specify how slugs are turned into file and directory names
in the URL path. This is roughly based off the design in
lektor#809 (comment).

As is, this PR fixes lektor#344 and deals with the tricky "global" part of
doing lektor#809 properly. It also enables pages with dots in their slugs to
be paginated and fixes some issues with the dev server resolving
descendants of such pages incorrectly. It is still a prototype, though
(I have only done some casual manual testing and have yet to write any
tests).

In addition to a sitewide default setting and per-page setting, we would
likely also want datamodels to be able to specify the slug style of
their pages and/or their child pages. To stay similar to the current way
`slug_format` can be configured, it seems we should also allow
datamodels to specify the slug styles of their children, but as
mentioned in lektor#806 that behavior is a little strange and worth
reconsidering, so this PR doesn't try to implement any such behavior.

@betaveros
Author

Does anybody have thoughts or feedback on the above design and PR?

@xlotlu
Contributor

xlotlu commented Sep 12, 2020

I don't see this as a core feature, that is, part of the slug logic.

To me this is simply "URL faking", something that needs explicit support from the web server. If it lands in lektor core that's all fine, but it should be labelled as such, not mingled with slugs (I think this should also apply to the code as much as possible).

I haven't reviewed the PR yet, but I will, sometime next week I hope.

@betaveros
Author

Cool. Yeah, the PR does not actually do the URL faking part of this issue; it just gives you an option to let a page with slug projects render at /projects.html but with children under /projects/, and also other options to solve #344.

@testbird

testbird commented Jan 8, 2021

For requests, isn't Apache (MultiViews) also able to add /index.html to an extensionless slug if it finds a directory?

So, wouldn't Lektor's folder-per-page approach (on disk) be just fine? And the only thing missing would be a Lektor option that allows it to drop /index.html | .html from internal links in the generated pages?

webserver_supports_clean_URLs = true?
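As a sketch of what such an option might do to each internal link, assuming absolute site-local paths: clean_link and its flag argument are hypothetical names mirroring the option floated above, not an existing Lektor setting.

```python
def clean_link(path: str, webserver_supports_clean_urls: bool = True) -> str:
    """Drop a trailing "/index.html" or ".html" from an internal link path."""
    if not webserver_supports_clean_urls:
        return path
    if path.endswith("/index.html"):
        # Keep the site root as "/" rather than returning an empty string.
        return path[: -len("/index.html")] or "/"
    if path.endswith(".html"):
        return path[: -len(".html")]
    return path
```

So /blog/post/index.html and /blog/post.html would both be linked as /blog/post, and it would be up to the web server to map that back to a file.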
