Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import-in-the-middle causes exception when creating ERR_MODULE_NOT_FOUND message #52987

Open
timfish opened this issue May 14, 2024 · 18 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders

Comments

@timfish
Copy link

timfish commented May 14, 2024

Version

v22.1.0

Platform

All

Subsystem

module

What steps will reproduce the bug?

This was triggered by a combination of import-in-the-middle and tsx, but below is a minimal reproduction without either of those.

  • import-in-the-middle sets parentURL to node build-ins (node:*).
  • tsx tries to resolve files that do not exist

The

hook.mjs

export async function resolve(specifier, context, nextResolve) {
    specifier = 'file:///dost-not-exist.mjs';
    context.parentURL = 'node:util';
    return nextResolve(specifier, context);
}

index.mjs

import { register } from "node:module";

register('./hook.js',  import.meta.url);
await import("node:util");

Which gives the following error:

TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at fileURLToPath (node:internal/url:1491:11)
    at finalizeResolution (node:internal/modules/esm/resolve:265:42)
    at moduleResolve (node:internal/modules/esm/resolve:924:10)
    at defaultResolve (node:internal/modules/esm/resolve:1148:11)
    at nextResolve (node:internal/modules/esm/hooks:750:28)
    at resolve (data:application/javascript;base64,CiAgICBleHBvcnQgYXN5bmMgZnVuY3Rpb24gcmVzb2x2ZShzcGVjaWZpZXIsIGNvbnRleHQsIG5leHRSZXNvbHZlKSB7CiAgICAgIHNwZWNpZmllciA9ICdmaWxlOi8vL2Rvc3Qtbm90LWV4aXN0Lm1qcyc7CiAgICAgIGNvbnRleHQucGFyZW50VVJMID0gJ25vZGU6dXRpbCc7CiAgICAgIHJldHVybiBuZXh0UmVzb2x2ZShzcGVjaWZpZXIsIGNvbnRleHQpOwogICAgfQo=:5:14)
    at nextResolve (node:internal/modules/esm/hooks:750:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:238:30)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:199:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:821:20) {
  code: 'ERR_INVALID_URL_SCHEME'
}

This is caused by this code where base is node:util:

throw new ERR_MODULE_NOT_FOUND(
path || resolved.pathname, base && fileURLToPath(base), resolved);

How often does it reproduce? Is there a required condition?

Every time

What is the expected behavior? Why is that the expected behavior?

This should throw a valid ERR_MODULE_NOT_FOUND error

What do you see instead?

It's throwing while creating the message

Additional information

This was made incredibly difficult to debug because you can't debug loader hooks!

Credit to @nwalters512 for the back and forth debugging with me!

@Flarna Flarna added the loaders Issues and PRs related to ES module loaders label May 14, 2024
@RedYetiDev RedYetiDev added esm Issues and PRs related to the ECMAScript Modules implementation. confirmed-bug Issues with confirmed bugs. labels May 14, 2024
@RedYetiDev
Copy link
Member

Thanks for the issue, if you want to submit a patch, feel free to do so, if not, I'm sure a member of the team will be happy to assist

@RedYetiDev
Copy link
Member

@nodejs/loaders correct me if I'm wrong, but wouldn't this behavior also cause errors in the experimental HTTP import system?

@GeoffreyBooth
Copy link
Member

This seems like expected behavior to me. It can't create a URL with that parent. It can't determine that the module doesn't exist if it can't first figure out what URL to try to retrieve.

@RedYetiDev RedYetiDev removed the confirmed-bug Issues with confirmed bugs. label May 15, 2024
@RedYetiDev
Copy link
Member

Ack, didn't mean to add that label, I was looking to see if we had a "bug" label and I must've misclicked it

@nwalters512
Copy link

nwalters512 commented May 15, 2024

Node is already correctly determining that the file does or does not exist. As described above in the original issue, the error we're encountering actually occurs when Node tries to construct the ERR_MODULE_NOT_FOUND error to throw.

I agree that the situation that import-in-the-middle gets us into (parentURL === 'node:util') is a weird one, but Node is so close to handling it in a sensible way. For instance, if fileURLToPath(base) errored out while constructing the ERR_MODULE_NOT_FOUND error, it could instead use base verbatim.

@RedYetiDev
Copy link
Member

RedYetiDev commented May 15, 2024

This seems like expected behavior to me. It can't create a URL with that parent. It can't determine that the module doesn't exist if it can't first figure out what URL to try to retrieve.

Even so, IMO a more descriptive error may be helpful, rather than the one thrown by the conversion utility

node:util is not a valid <...>?

@timfish
Copy link
Author

timfish commented May 15, 2024

It can't determine that the module doesn't exist if it can't first figure out what URL to try to retrieve.

It has already determined that the module doesn't exist, it's just trying to construct the error message when it throws.

I can't see how this is expected behaviour. the code assumes base is a file URL and it is not.

If parentURL can only be a file URI and never a node: URI (or any other URI!) this should be enforced earlier in resolve.

Just be aware that enforcing this in resolve will break import-in-the-middle and everything that relies on it!

@GeoffreyBooth
Copy link
Member

If parentURL can only be a file URI and never a node: URI (or any other URI!) this should be enforced earlier in resolve.

And if we did, what error would be thrown? Probably ERR_INVALID_URL_SCHEME, I’d think. In which case, why does it matter whether it’s thrown here or earlier?

@nwalters512
Copy link

Throwing earlier would ensure consistent behavior. Right now, everything happily works with a node:* parentURL provided that the file being resolved exists. I want Node to either always use the normal resolving behavior with parentURL = 'node:*', or always error out quickly and obviously.

@timfish
Copy link
Author

timfish commented May 15, 2024

Is node: even an invalid scheme for parentURL?

In which case, why does it matter whether it’s thrown here or earlier?

The current error doesn't even tell you which parameter was the wrong scheme. It's totally useless in terms of helping you fix the issue. Because there's no way to debug loader hooks, its difficult to track down.

There's a huge difference in terms of DX between an unintentional exception and one designed to guide users of an API.

The entire node codebase is full of places where guiding users in this way has been prioritised over just letting random exceptions happen anywhere.

@GeoffreyBooth
Copy link
Member

Right now, everything happily works with a node:* parentURL provided that the file being resolved exists.

What do you mean? Resolving a specifier where the URL exists, regardless of parentURL, works even with a node:-scheme parentURL?

The current error doesn’t even tell you which parameter was the wrong scheme. It’s totally useless in terms of helping you fix the issue.

Sure, this could be improved by having the error message specifically refer to parentURL. Many functions have validation steps, like validateString calls; we could validate that parentURL is of file: scheme right off the bat at the top of the function so that it errors early (and informatively, and consistently). I think an error specific to the URL scheme is more correct than an ERR_MODULE_NOT_FOUND, though. The latter sounds like something was just missing from disk, whereas really we couldn’t even look for something on disk because we couldn’t parse the URL.

Because there’s no way to debug loader hooks, it makes it even more obscure.

It is definitely challenging to debug code that’s not on the main thread, but it’s not impossible. fs.writeFileSync(1, 'something') is equivalent to console.log('something').

@nwalters512
Copy link

nwalters512 commented May 15, 2024

What do you mean? Resolving a specifier where the URL exists, regardless of parentURL, works even with a node:-scheme parentURL?

This is exactly what I mean, yes. You can try it for yourself with the example in the original issue; replace file:///dost-not-exist.mjs with an absolute file URL pointing to a file that does exist. For instance, file:///Users/nathan/git/register-hook-playground/index.mjs is a file that exists on my machine, and running the reproduction example does not error with that path as the specifier. Even a slight modification to make the path refer to a file that does not exist, for instance file:///Users/nathan/git/register-hook-playground/indexx.mjs, produces an error.

The latter sounds like something was just missing from disk, whereas really we couldn’t even look for something on disk because we couldn’t parse the URL.

I think there's some confusion here. Again, the error that is currently thrown is not thrown because Node couldn't "look for something on disk". Node does look for the file, determines it does not exist, and is in the process of constructing an ERR_MODULE_NOT_FOUND error when it tries to parse the base URL for inclusion in the error, and that is what fails. The link to the Node source in the issue shows where this is happening:

throw new ERR_MODULE_NOT_FOUND(
path || resolved.pathname, base && fileURLToPath(base), resolved);

@timfish
Copy link
Author

timfish commented May 15, 2024

Throwing earlier would ensure consistent behavior

Throwing in resolve when parentURL is a Node URI would fundamentally break import-in-the-middle which has 4.5m weekly downloads and is relied on heavily in the ecosystem.

To fix this issue, it simply needs to not attempt converting base to a file path if it's not a file path when constructing the error message. It's a one line fix and I can submit a PR.

@GeoffreyBooth
Copy link
Member

Throwing in resolve when parentURL is a Node URI would fundamentally break import-in-the-middle which has 4.5m weekly downloads and is relied on heavily in the ecosystem.

This isn’t really relevant. If we have a bug, and/or if they have a bug, then it needs to be fixed. The question is what the intended behavior of parentURL is. If our documentation says that it’s only used when the specifier is relative, and ignored otherwise, then the current behavior is fine and your one-line fix is fine. But if there’s somewhere where we document that parentURL needs to fulfill certain criteria, like it needs to be a file: URL and not some other type of URL or something completely different like an object, then we have a bug in that our actual behavior doesn’t match our intended behavior. If import-in-the-middle relies on this bug, well, then fixing the bug would be handled carefully, but it would still need to be fixed. But I think the former is more likely and a simple fix would be fine.

@nwalters512
Copy link

The documentation on https://nodejs.org/api/module.html#resolvespecifier-context-nextresolve doesn't say too much about parentURL:

The module importing this one, or undefined if this is the Node.js entry point

I could interpret this to mean that node: module specifiers should be allowed.

@timfish my hope is that you/we could pursue a fix to import-in-the-middle in parallel with this. My guess is that a whole lot of people will hit this very soon with your new version of the Sentry SDK, and I can't imagine that waiting for the Node fix to be released and waiting for all Sentry users to jump to the latest version of Node will work well.

@timfish
Copy link
Author

timfish commented May 15, 2024

Throwing in resolve when parentURL is a Node URI would fundamentally break import-in-the-middle which has 4.5m weekly downloads

This isn’t really relevant. If we have a bug, and/or if they have a bug, then it needs to be fixed. The question is what the intended behavior of parentURL is.

It's highly relevant.

@wesleytodd recently mentioned on Twitter that over the years Node has gone out of their way not to break the ecosystem where possible.

The documentation is ambiguous at best. Node has accepted these parentURLs for years through numerous LTS versions and changing this would be considered a breaking change by many.

After this amount of time the "intended behaviour" is irrelevant and the usage in the ecosystem trumps that. It's far less painful to update the docs to reflect the current usage rather than break longstanding packages because the intended behaviour wasn't documented fully in the first place.

@wesleytodd
Copy link
Member

@wesleytodd recently mentioned on Twitter that over the years Node has gone out of their way not to break the ecosystem where possible.

I am not completely sure which topic is referenced here (probably a sign I need to stop checking twitter so much), but I think generally this references how CITGM is in place to help catch ecosystem breaking changes. That said, I am not sure the same goals apply between changing http in a way that breaks express and this which is still in an experimental status.

I don't know enough to make a call, and don't have the time availability to come up to speed on this at the moment. Sorry I cannot be more help.

@nwalters512
Copy link

I opened a PR to fix this in import-in-the-middle: DataDog/import-in-the-middle#76. While I'm still glad that Node is willing to consider a behavior change here, this will hopefully help folks in the meantime.

bengl pushed a commit to DataDog/import-in-the-middle that referenced this issue May 31, 2024
This PR fixes the issue that's been described at length in the following
issues:

- getsentry/sentry-javascript#12011
- nodejs/node#52987

To summarize, `import-in-the-middle` can cause problems in the following
situation:

- A Node core module (`node:*`) is imported
- Another loader is added before `import-in-the-middle`
- That loader tries to resolve files that don't exist on disk

In practice, this occurs when using `import-in-the-middle` with code
that's being executed with
[`tsx`](https://github.com/privatenumber/tsx). `tsx` will try to resolve
`.js` files by also looking for the same file path but with a `.ts`
extension. In many cases, including the synthetic code that
`import-in-the-middle` generates to import `lib/register.js`, such a
corresponding `.ts` file does not exist.

The actual error arises from Node, which assumes that `parentURL` will
always be a `file://` URL when constructing an `ERR_MODULE_NOT_FOUND`
error; see the linked issue above. In the above scenario, the `.ts` file
that is being resolved does not exist, so such an error is created, and
`parentURL === 'node:*'`, so the failing case is triggered. It seems
like Node is receptive to changing that behavior, but in the meantime, I
was hoping to land this patch so that one doesn't have to wait for and
upgrade to a new version of Node.

The fix works by short-circuiting the resolution of `lib/register.js` so
that the other loader (that tries to resolve non-existent paths) is
never tried.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders
Projects
None yet
Development

No branches or pull requests

6 participants