Allow formatting of pretty URL with trailing slash #679

Open
Krinkle opened this issue Feb 18, 2023 · 8 comments · May be fixed by #692

Comments

@Krinkle

Krinkle commented Feb 18, 2023

When using Apache or Nginx to serve a static site with pretty URLs, the default is to serve directories as directories, and thus with a trailing slash (with redirect enforcement in place). Many other static site generators, e.g. Jekyll, seem to do the same.

The issue I'm running into is that, once the site is published, the links generated by getUrl() for anchor links and for <link rel="canonical"> in my theme all point at URLs that redirect. This makes the site slower due to the additional redirects, and can also make machine-readable metadata harder to correlate, because the URLs are, strictly speaking, not identical.

I've tried a number of different things. For example:

slug: "episode-3-paul-irish"
path: "/2009/12/04/{slug}/"

However, the trailing slash is stripped from the end regardless; the stripping seems to be hardcoded.
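
To make the requested behaviour concrete, here is a minimal sketch in Python. It is not Jigsaw's code (Jigsaw is written in PHP), and `ensure_trailing_slash` is a hypothetical helper invented for illustration. It assumes that a path whose last segment has a file extension names a file and should be left alone.

```python
import posixpath


def ensure_trailing_slash(path: str) -> str:
    """Append '/' to a pretty-URL path unless it already has one or names a file.

    Hypothetical helper for illustration only; not part of Jigsaw.
    """
    if path.endswith("/"):
        return path
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a last segment with an extension (e.g. "feed.xml") is a file.
    if posixpath.splitext(last_segment)[1]:
        return path
    return path + "/"


# The path template from the example above, filled in with the slug.
slug = "episode-3-paul-irish"
generated = "/2009/12/04/{slug}".format(slug=slug)
print(ensure_trailing_slash(generated))
```

With a helper along these lines, the generated link would already match the URL the server ends up serving, so no redirect would be needed.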

I'm willing to submit a patch if it would be a welcome addition. If so, let me know roughly what approach you'd like me to take.

Related issues: #170

@ZachS

ZachS commented Apr 20, 2023

For anyone encountering this issue, I have used the nginx config below to serve all pages without the trailing slash.

    # Strip trailing slash.
    rewrite ^/(.*)/$ /$1 permanent; 

    location / {
        # Make directories resolve to index.html without trailing slash.
        try_files $uri $uri/index.html $uri/ =404;
    }

Without this, every internal link was causing a redirect from no-slash to slash. Not good for SEO.
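
As a rough illustration of what the rewrite rule above matches, here is a small Python sketch that applies the same regular expression to a few paths. It only approximates nginx's behaviour (nginx matches against its own normalized URI), and `strip_trailing_slash` is a hypothetical name used for this example.

```python
import re
from typing import Optional

# Same pattern as in the nginx `rewrite` directive above.
TRAILING_SLASH = re.compile(r"^/(.*)/$")


def strip_trailing_slash(path: str) -> Optional[str]:
    """Return the redirect target for `path`, or None if the rule does not match."""
    match = TRAILING_SLASH.match(path)
    if match is None:
        return None
    return "/" + match.group(1)


for path in ["/kb/", "/kb", "/a/b/", "/"]:
    print(path, "->", strip_trailing_slash(path))
```

Note that the root path `/` does not match the pattern, so the home page is not redirected.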

@bakerkretzmar
Collaborator

Do you specifically need all your pages served with the trailing slash, or can you configure your server to strip it so that none of your URLs have it?

@Krinkle
Author

Krinkle commented Jun 30, 2023

I need it served with a trailing slash, as the intention is to preserve and match the URL structures of other sites that are part of the project. This would also avoid infinite redirects or vendor lock-in when, e.g., migrating to another engine in the future.

@bakerkretzmar bakerkretzmar linked a pull request Jul 28, 2023 that will close this issue
@tomcreasey

Is there any development on this?

My current issue is that I am trying to use Bunny CDN and their Edge Storage to deploy a static website. It is blisteringly fast and perfect for what I need. However, now that I am reaching the point of pushing the site live, I have come across an issue with URL consistency.

At the moment URLs such as:

  • domain.com/kb
  • domain.com/kb/
  • domain.com/kb/index.html

All of the above are accessible, and they all return the same page. However, Google will treat each of them as a separate page, creating duplicate content.

It would be great if there were a setting/option in Jigsaw to ensure URLs are consistent, rather than relying on the host to achieve this.

It is all easily fixed with Nginx and Apache, but when using edge storage and a CDN, especially Bunny, it isn't as easy to force the URLs to be consistent.

I am sure there are other storage solutions from AWS etc. that all handle the URL issue but I might not want to use their offering.

@damiani
Contributor

damiani commented May 17, 2024

Thanks for resurfacing this ... I'll take a look at #692 and see if that addresses your need!

@bakerkretzmar
Collaborator

@tomcreasey the fact that all three of those URLs are accessible is totally out of Jigsaw's control; it varies from host to host, and I guess Bunny decides to serve them all.

This issue and #692 are more about letting you control how Jigsaw generates internal links, so that you can control which URL format you use inside your content. Adding a <link rel="canonical"> tag should avoid Google and other crawlers treating those pages as separate, so the fact that they all exist isn't really an issue for SEO.
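
As a sketch of the canonical-tag idea, the Python snippet below collapses the three URL variants mentioned earlier in the thread into one form and renders the tag. It is illustrative only: `canonical_url` and `canonical_link_tag` are hypothetical helpers, not Jigsaw functions, and the `https://domain.com` base URL is an assumed placeholder.

```python
from html import escape


def canonical_url(base_url: str, path: str, trailing_slash: bool = True) -> str:
    """Collapse '/kb', '/kb/' and '/kb/index.html' into one canonical URL."""
    if path.endswith("/index.html"):
        # Drop the file name, keeping the directory part.
        path = path[: -len("index.html")]
    path = "/" + path.strip("/")
    if trailing_slash and path != "/":
        path += "/"
    return base_url.rstrip("/") + path


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag for an already-canonical URL."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


for variant in ["/kb", "/kb/", "/kb/index.html"]:
    print(canonical_link_tag(canonical_url("https://domain.com", variant)))
```

All three variants produce the same tag, which is what lets crawlers treat them as a single page regardless of which ones the host chooses to serve.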

@damiani
Contributor

damiani commented May 17, 2024

@bakerkretzmar is right, but at least #692 can make internal links behave consistently, both to prevent redirects and to keep things consistent when getUrl() or getPath() is used to build canonical links. (Seems like we should add getPath() to that PR, too?)

@bakerkretzmar
Collaborator

Oh yeah you're probably right, not sure why I avoided doing that in the PR. Might have thought that it had to do with file paths, which I didn't want to change, but it looks like maybe it's always URLs? I don't remember 😅
