Fix trailing slash issues (depend on hosting provider) #3372

Closed
slorber opened this issue Aug 31, 2020 · 34 comments · Fixed by #4908
Labels
proposal This issue is a proposal, usually non-trivial change

Comments

@slorber
Collaborator

slorber commented Aug 31, 2020

Edit: as part of the analysis to solve this issue, I'm writing a guide that factually describes the behavior of various static site generators and static hosting providers: https://github.com/slorber/trailing-slash-guide

💥 Proposal

Fix host trailing slash issues: write /myDoc.html instead of /myDoc/index.html

We have a few issues that are related to docs trailing slashes: #2654, #2394, #2134, #3351, #3380

I suspect these issues are related to the Docusaurus output: for a doc at /docs/myDoc, we write /myDoc/index.html instead of /myDoc.html.

Always creating a subfolder is probably what signals hosting providers (Netlify, GitHub Pages...) to automatically add a trailing slash to URLs, which can be annoying.

It can break relative links, because the trailing slash changes how they resolve, and we end up with broken links in production that are not caught at build time (see #3380).
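To make the failure mode concrete: a browser resolves a relative link against the current page URL, so the same href points somewhere else once a trailing slash is added. A minimal sketch using the WHATWG URL API (a global in Node.js); the origin, paths and the `resolveLink` name are made-up examples:

```javascript
// Sketch: the same relative href resolves to different targets depending on
// whether the current page URL ends with a slash. Origin and paths are examples.
function resolveLink(pagePath, href) {
  return new URL(href, `https://example.com${pagePath}`).pathname;
}

console.log(resolveLink('/docs/myDoc', './otherDoc'));  // "/docs/otherDoc"
console.log(resolveLink('/docs/myDoc/', './otherDoc')); // "/docs/myDoc/otherDoc" (broken link)
```

Without the slash, the last path segment (myDoc) is treated as a "file" and replaced; with it, myDoc becomes a "directory" and the link is appended to it.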

This behavior is provided by a dependency: https://github.com/markdalgleish/static-site-generator-webpack-plugin/blob/master/index.js#L165

We already use a fork of this dependency, so we could also include a way to write /myDoc.html directly when a route path does not end with a trailing slash.

As this is a risky change, I suggest making this optional. We can make it the default behavior (breaking change) later if it works fine for early adopters.

See also #2394

@slorber slorber added the proposal This issue is a proposal, usually non-trivial change label Aug 31, 2020
@slorber
Collaborator Author

slorber commented Aug 31, 2020

@slorber slorber changed the title Proposal: write /myDoc.html instead of /myDoc/index.html Proposal: Fix trailing host slash issues: write /myDoc.html instead of /myDoc/index.html Sep 3, 2020
@lunelson

> Always creating a subfolder is probably a signal for the hosting providers (Netlify, Github Pages...) to add a trailing slash automatically to URLs, which can be annoying.

@slorber fwiw this is fixable on Netlify by changing your post-processing settings. They have a UI bug: the master checkbox to disable post-processing does not disable the pretty URL setting, and that is what causes e.g. https://v2.docusaurus.io/docs/2.0.0-alpha.63/installation to 301-redirect to https://v2.docusaurus.io/docs/2.0.0-alpha.63/installation/. You have to turn them all off individually.

[Screenshot]

@slorber
Collaborator Author

slorber commented Sep 28, 2020

Thanks @lunelson, I didn't know about this bug. Do you have a link to track it?

Also, I tested these options on my own website, but I still have a trailing slash issue 😅 https://sebastienlorber.com/records-and-tuples-for-react

@lunelson

lunelson commented Sep 29, 2020

@slorber If you changed the options and tested your site right away, it might not have worked: the changes seem to take a few minutes to propagate. For me the URL above now works without redirecting to a trailing slash.

And no, I don't know of a Netlify issue that tracks it, but I've seen it come up in trailing slash discussions here and in Gatsby's repo.

@lunelson

@slorber Having pointed that out, however, I must say that for my company's websites (including 3 Docusaurus v2 sites, with potentially more soon), we will probably turn this Netlify feature back on.

This is because we have had the strong recommendation from an SEO agency that—while neither format (trailing-slash or not) is "better"—there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

This creates a challenge for me right now because currently Docusaurus generates all URL patterns in router/anchor links, sitemap and canonical link tags without trailing slashes 😬

@mpsq
Contributor

mpsq commented Sep 29, 2020

The sitemap can be generated with trailing slashes; that's what the sitemap plugin's trailingSlash option is for: https://v2.docusaurus.io/docs/using-plugins#docusaurusplugin-sitemap.

@lunelson

@mpsq well, that's one down at least 😅

@mpsq
Contributor

mpsq commented Sep 29, 2020

Yes :) On your list, "URL patterns in router/anchor links" and "canonical link tags" are actually the same thing, they depend on how the permalink prop is generated. All you need is to enable/allow that prop to have a trailing slash.

EDIT:
Maybe this could be a more general setting, rather than a granular one specific to the sitemap plugin.

@lunelson

> Maybe this could be a more general setting, rather than a granular one specific to the sitemap plugin.

YES. This would be ideal.

@mpsq is there any chance you could give me an idea of where I would need to intervene in my custom theme to modify that permalink property, so that the change affects the router and the generated anchor and link tags all together?

@mpsq
Contributor

mpsq commented Sep 29, 2020

You could do it in your theme by tweaking DocItem / Layout and replacing occurrences of permalink with permalink + "/". This is super hacky though; a configuration property in core Docusaurus would be much nicer.
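A plain permalink + "/" produces a double slash when the permalink already ends with one, and puts the slash after any #hash. A slightly safer helper for swizzled components could look like this; `withTrailingSlash` is a hypothetical name, not a Docusaurus API:

```javascript
// Hypothetical helper for a swizzled DocItem/Layout: appends "/" to the
// path part of a permalink, keeps query strings and #hash fragments after
// it, and never produces a double slash.
function withTrailingSlash(permalink) {
  // Split "/path?query#hash" into the path and everything after it.
  const [, pathname, rest] = permalink.match(/^([^?#]*)(.*)$/);
  return pathname.endsWith('/') ? permalink : `${pathname}/${rest}`;
}

console.log(withTrailingSlash('/docs/intro'));       // "/docs/intro/"
console.log(withTrailingSlash('/docs/intro/'));      // "/docs/intro/"
console.log(withTrailingSlash('/docs/intro#setup')); // "/docs/intro/#setup"
```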

@lunelson

lunelson commented Oct 1, 2020

> a configuration property in core Docusaurus would be much nicer

Again: yes

@slorber
Collaborator Author

slorber commented Oct 2, 2020

Thanks, I hadn't noticed, but it seems my site no longer has trailing slashes :)

But we should still find a portable solution that works even on GitHub Pages, where you don't have many hosting options to tweak. I still believe my initial proposal might be a better solution; I need to find time to test it on multiple hosts.

For workarounds to override the permalink, you may be interested in this issue: #3501 (comment)

I'm not sure, but something like this could work to customize the canonical URL:

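```jsx
import React from 'react';
import OriginalLayout from '@theme-original/Layout';
import Head from '@docusaurus/Head';
import {useLocation} from '@docusaurus/router';
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';

// Wraps the original theme Layout and overrides the canonical URL so that
// it always ends with a trailing slash.
export default function Layout(props) {
  const {siteConfig} = useDocusaurusContext();
  const {pathname} = useLocation();
  // Avoid a double slash when the path already ends with "/".
  const canonicalPath = pathname.endsWith('/') ? pathname : `${pathname}/`;
  const canonicalUrl = `${siteConfig.url}${canonicalPath}`;
  return (
    <>
      <OriginalLayout {...props} />
      {/* Expected to override the tags emitted by the original Layout;
          check the built HTML to confirm which tag wins. */}
      <Head>
        <link rel="canonical" href={canonicalUrl} />
        <meta property="og:url" content={canonicalUrl} />
      </Head>
    </>
  );
}
```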

@lunelson

lunelson commented Oct 2, 2020

@slorber thanks, I'll take a look at that too. One thing about the router and this URL format issue that I haven't tested yet is whether active link class matching will still work if the router paths are also forced to have trailing slashes...

@slorber
Collaborator Author

slorber commented Oct 30, 2020

Notes:

  • v1 generates 2 files: /docs/myDoc.html + /docs/myDoc/index.html
  • v2 generates 1 file: /docs/myDoc/index.html
  • without a /docs/myDoc.html file, the GitHub Pages server redirects from /docs/myDoc to /docs/myDoc/

We should probably add 2 new options:

  • one option to generate the "missing" /docs/myDoc.html file, because some hosting solutions (ahem, GH Pages) can't handle its absence properly.
  • one option to add trailing slashes to all site routes, so that it's safe to host your site on any host

We can't add a trailing / by default: not everybody wants a / at the end of their URLs (legacy/SEO reasons).

Another idea is to resolve all relative links at build time to absolute links, so that the presence or absence of a trailing slash has no impact on the link target.

On Netlify, disabling the Pretty URLs option prevents Netlify from adding the trailing slash, yet if a user visits the page with a trailing slash, it is not removed client-side, and it can still break relative links.
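A minimal sketch of that build-time resolution, assuming authors write relative links as if the page URL had no trailing slash (which matches the current Docusaurus output). `toAbsoluteLink` is a hypothetical helper, not existing Docusaurus code:

```javascript
// Hypothetical build-time helper: rewrite a relative href into a
// root-relative one, resolved against the page's permalink without a
// trailing slash. The result points to the same target whether or not
// the visitor's URL ends with "/".
function toAbsoluteLink(pagePermalink, href) {
  const isAbsoluteUrl = /^[a-z][a-z0-9+.-]*:/i.test(href); // https:, mailto:, ...
  if (isAbsoluteUrl || href.startsWith('/') || href.startsWith('#')) {
    return href; // does not depend on a trailing slash in the current URL
  }
  // Strip trailing slashes so "/docs/myDoc/" resolves like "/docs/myDoc".
  const pagePath = pagePermalink.replace(/\/+$/, '') || '/';
  // The origin is a placeholder; only the path, query and hash are kept.
  const resolved = new URL(href, new URL(pagePath, 'http://placeholder.invalid'));
  return resolved.pathname + resolved.search + resolved.hash;
}

console.log(toAbsoluteLink('/docs/myDoc', './otherDoc'));  // "/docs/otherDoc"
console.log(toAbsoluteLink('/docs/myDoc/', './otherDoc')); // "/docs/otherDoc"
```

Root-relative output (rather than a full https:// URL) avoids having to swap the domain for localhost when serving the build locally.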

@slorber slorber changed the title Proposal: Fix trailing host slash issues: write /myDoc.html instead of /myDoc/index.html Fix trailing slash issues (depend on hosting provider) Oct 30, 2020
@slorber
Collaborator Author

slorber commented Oct 31, 2020

Until I figure out how to integrate this properly in Docusaurus, here's a PR on the React Native website with some ideas to fix trailing slash issues in userland: facebook/react-native-website#2297

@lunelson

lunelson commented Nov 3, 2020

I've been solving this issue mainly by normalizing URLs to the desired format with a function inside the Link component, since all internal links pass through it at some point. One complication: if you use the baseUrl option, you have to treat root-relative URLs that are outside your sub-directory as external, i.e. static anchors rather than router links...
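The Link-level normalization described above might look roughly like this. The function name and parameters are hypothetical; in a swizzled Link component, the result would decide between a router link and a plain `<a>`:

```javascript
// Hypothetical sketch: normalize an href inside a custom Link component.
// Returns the href to render and whether it should go through the router.
function normalizeLink(href, baseUrl, trailingSlash) {
  // URLs with a scheme (https:, mailto:) or protocol-relative URLs are external.
  if (/^([a-z][a-z0-9+.-]*:|\/\/)/i.test(href)) {
    return {href, isInternal: false};
  }
  // Root-relative URLs outside the baseUrl sub-directory belong to another
  // app on the same domain: render them as plain anchors, not router links.
  if (href.startsWith('/') && !href.startsWith(baseUrl)) {
    return {href, isInternal: false};
  }
  // Split "/path?query#hash" so that only the path part is normalized.
  const [, pathname, rest] = href.match(/^([^?#]*)(.*)$/);
  // Leave hash-only links ("#section") and the site root untouched.
  if (pathname === '' || pathname === baseUrl) {
    return {href, isInternal: true};
  }
  const hasSlash = pathname.endsWith('/');
  let normalized = pathname;
  if (trailingSlash && !hasSlash) normalized = `${pathname}/`;
  if (!trailingSlash && hasSlash) normalized = pathname.slice(0, -1);
  return {href: normalized + rest, isInternal: true};
}

console.log(normalizeLink('/docs/intro', '/docs/', true));     // { href: '/docs/intro/', isInternal: true }
console.log(normalizeLink('/other-app/page', '/docs/', true)); // { href: '/other-app/page', isInternal: false }
```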

@ukutaht

ukutaht commented Dec 22, 2020

Having an option to generate /doc.html instead of /doc/index.html would be ideal for us

@Morriz

Morriz commented Jan 18, 2021

> Another idea is to resolve all relative links at build time to absolute links, so that the presence of a trailing slash or not does not impact in any way the link target.

This is probably the easiest and safest option, with the fewest side effects. It is probably also the fastest! Any downsides? It might involve replacing the domain with localhost:$PORT for npm run serve, but that's fine IMO.

@agentofuser

Is there an easy workaround (i.e. one that can be done in the website repo without changing Docusaurus itself or its dependencies) to force a trailing slash on all links and URLs?

@thehappybug

> Please try to add this site plugin and let me know if it works for your case.

@slorber Can you share some instructions for trying this out? So far, I've tried adding sitePlugin.js at src/plugins/sitePlugin.js and adding plugins: ['./src/plugins/sitePlugin'] at the root of the Docusaurus config, but it had no effect. The build remains the same and no folder.html-style files are created.

PS: I'm using Netlify and I'm trying to avoid the duplicate URLs issue. Currently, site crawlers are seeing two URLs: one without slashes (used in the pages built by D2) and one with slashes (which Netlify prefers and redirects to when it sees a URL without a slash).

@thehappybug

thehappybug commented Jun 2, 2021

> This is because we have had the strong recommendation from an SEO agency that, while neither format (trailing-slash or not) is "better", there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

I double down on what @lunelson said. For users on Netlify, we need the final build to use trailing slashes, as that decision is made for us, and turning the "Pretty URLs" setting on or off does not help.

@slorber
Collaborator Author

slorber commented Jun 4, 2021

> Is there an easy workaround (i.e. that can be done on the website repo without changing docusaurus itself or dependencies) to force a trailing slash on all links and URLs?

@agentofuser not available now, but under consideration

@slorber
Collaborator Author

slorber commented Jun 4, 2021

@thehappybug are you even sure the plugin runs in the first place?

Have you tried adding logs so that you know which files are created? Note that you can also run your own script with node after yarn build; it doesn't have to be a Docusaurus plugin if that complicates your life.

> I double-down on what @lunelson said. For users who are using Netlify, we need the final build to be using trailing-slashes as that decision is made for us, and turning on/off the "Pretty URLs" setting does not help.

I'll say it again: this is false. Disabling pretty URLs works reliably on Netlify, and there won't be any redirect. VERY IMPORTANT: the global "disable asset processing" checkbox is broken and does not really disable pretty URLs: you have to uncheck the Pretty URLs option itself.

Proof: as part of my study to solve this issue, I'm writing a trailing slash guide for the community. It includes a live Netlify deployment with pretty URLs disabled, where you can see for yourself that there is no server-side redirect: https://github.com/slorber/trailing-slash-guide#netlify
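To check a deployment yourself, you can request a slash-less URL without following redirects and look at the status code. A sketch assuming Node.js 18+ (global fetch, which returns the actual 3xx response with `redirect: 'manual'`); browsers return an opaque response in that mode, so run it from Node. `checkRedirect` is a hypothetical name, and `fetchImpl` only exists to make it testable:

```javascript
// Sketch: does the host answer this URL with a redirect, and to where?
async function checkRedirect(url, fetchImpl = globalThis.fetch) {
  // redirect: 'manual' stops fetch from following the redirect, so the
  // original 3xx status and Location header stay visible.
  const res = await fetchImpl(url, {redirect: 'manual'});
  const isRedirect = res.status >= 300 && res.status < 400;
  return {
    status: res.status,
    isRedirect,
    location: isRedirect ? res.headers.get('location') : null,
  };
}
```

Usage (the URL is a placeholder): `checkRedirect('https://example.com/docs/myDoc').then(console.log);`. With pretty URLs disabled, you would expect `isRedirect: false`.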

@slorber
Collaborator Author

slorber commented Jun 4, 2021

@lunelson

> This is because we have had the strong recommendation from an SEO agency that, while neither format (trailing-slash or not) is "better", there should be 301 redirects to make sure that only one of the two formats is getting crawled and indexed. And because Netlify treats these URL formats identically, and its redirects system therefore doesn't allow you to 301 from one to the other (it would create an infinite loop), our only alternative is to use this pretty URLs feature (which does create a 301) and to accept the decision which is made for us, namely that trailing-slash will be the final format.

I'm not an SEO expert, but I believe 301 redirects are not mandatory if you have canonical URLs telling crawlers that the 2 URLs are the same page. Docusaurus sites have canonical URLs by default, so crawlers shouldn't penalize your site for publishing duplicate content.

@slorber
Collaborator Author

slorber commented Jun 4, 2021

I've opened a draft PR to propose a solution: #4908

@thehappybug

Thank you for your responses, @slorber. The trailing slash guide was quite useful.

> Have you tried adding logs so that you know which files are created? Note you can also run your own script with node after yarn build, it does not necessarily have to be a Docusaurus plugin if it complicated your life.

I managed to get your solution working. I had previously made a mistake in registering plugins.

For now, I'm not going forward with deploying this solution because:

  1. With Pretty URLs disabled, Netlify serves pages at slug.html, slug/ and slug. Even though the canonical URL should prevent any impact on SEO (I am not an expert, but this is what I gleaned from the comments), I would prefer that Netlify's routing did not serve the pages at multiple URLs. Using Pretty URLs results in cleaner routing that sits better with the web developers on my team (🤷‍♂️).
  2. A previous Hugo-based version of our docs site used trailing slashes in the URLs. This worked well with Netlify's Pretty URLs (we had asset optimization disabled but, as you pointed out, the Pretty URLs feature still kicks in). When we migrated to D2, the sitemap suddenly had URLs without trailing slashes, but the Pretty URLs feature still redirected users to the URLs with trailing slashes. This confused at least the Google crawler. You can see what I mean in the screenshot:
    [Screenshot]
    If you drill down into what happened, you see that each ignored page has this issue:
    [Screenshot]
    While I don't see our search performance going down as of now, my team is quite wary of continuing the site in this state.
  3. The solution you suggested copies the HTML pages outside the directories. With Pretty URLs, these pages would turn from slug.html into just slug. Since I'm stuck with the trailing slashes of a previous version of the site, and Netlify has permanently redirected crawlers to the URLs with trailing slashes, the solution doesn't help me. I'm waiting for an option in D2 that lets me control trailing slashes, so that I can use them consistently in all places, in all pages, and in the sitemap.

I hope this feedback helps. Let me know if I can help you test any future releases related to this issue.

@slorber
Collaborator Author

slorber commented Jun 7, 2021

Thanks for the feedback, I understand why you would want to keep the exact same URLs you had before, and I'll make that possible.

As soon as my PR is merged, you'll be able to try it with the @canary npm dist tag (soon).

@slorber
Collaborator Author

slorber commented Jun 8, 2021

As part of #4908

I'm implementing a new {trailingSlash: boolean | undefined} config.

  • undefined: keeps the existing backward-compatible behavior (paths not modified) and outputs /path/index.html
  • false: removes a potential trailing slash on routes, links and canonical URLs, and outputs /path.html
  • true: forces a trailing slash on routes, links and canonical URLs, and outputs /path/index.html
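The three modes can be summarized as a small pure function. This is a sketch of the behavior listed above, not the code from #4908, and `applyTrailingSlash` / `outputFile` are hypothetical names. In docusaurus.config.js, the option itself is a top-level site config field, e.g. `trailingSlash: false`.

```javascript
// Sketch of the three trailingSlash modes (not the code from #4908).
function applyTrailingSlash(routePath, trailingSlash) {
  // undefined: leave paths exactly as they are; "/" is never modified.
  if (trailingSlash === undefined || routePath === '/') return routePath;
  const hasSlash = routePath.endsWith('/');
  if (trailingSlash && !hasSlash) return `${routePath}/`;
  if (!trailingSlash && hasSlash) return routePath.slice(0, -1);
  return routePath;
}

// File written to the build output for a given route.
function outputFile(routePath, trailingSlash) {
  const normalized = applyTrailingSlash(routePath, trailingSlash);
  // Only `false` produces flat files; undefined and true write <path>/index.html.
  if (trailingSlash === false && normalized !== '/') return `${normalized}.html`;
  return normalized.endsWith('/') ? `${normalized}index.html` : `${normalized}/index.html`;
}

console.log(outputFile('/docs/myDoc', undefined)); // "/docs/myDoc/index.html"
console.log(outputFile('/docs/myDoc/', false));    // "/docs/myDoc.html"
console.log(outputFile('/docs/myDoc', true));      // "/docs/myDoc/index.html"
```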

Refer to https://github.com/slorber/trailing-slash-guide for how the files will be served by your host.

I've created 3 deployments to test this feature:

Note: using true/false instead of undefined lets you use Netlify with pretty URLs on without any annoying redirects. That's not the case for undefined, which will keep redirecting with pretty URLs on (the default behavior): I disabled pretty URLs for the undefined deployment on purpose.

PLEASE: help me review those deployments and let me know if you see any unwanted side-effects: now is a better time to complain than after merging the PR :)

@slorber
Collaborator Author

slorber commented Jun 9, 2021

The PR has been merged, and you can try this right now with the @canary dist tag or wait for beta.1.

https://www.npmjs.com/package/@docusaurus/core?activeTab=versions

2.0.0-beta.df8a900f9

Please let me know ASAP if anything does not work.

Also, in general, can you please let me know:

  • which host you had a problem with
  • which trailingSlash setting fixed the problem

Based on your feedback, we may update the docs to recommend a trailingSlash setting for each host.

Labirintami pushed a commit to DHTMLX/docs-suite that referenced this issue Jun 24, 2021
jan-molak added a commit to serenity-js/serenity-js that referenced this issue Jun 14, 2023
Giscus relies on matching the pathName, so we need to ensure that the trailing slash behaves
consistently

Related tickets: facebook/docusaurus#3372

10 participants