Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve require/import/export support in unminify #30

Open
0xdevalias opened this issue Dec 18, 2023 · 10 comments · May be fixed by #50
Open

improve require/import/export support in unminify #30

0xdevalias opened this issue Dec 18, 2023 · 10 comments · May be fixed by #50
Labels

Comments

@0xdevalias
Copy link

0xdevalias commented Dec 18, 2023

Was testing the new v2.11.0 web IDE update today and found some areas that unminify could be further improved.

This issue is about the require/import/export syntax after unminification.

You can get the minimised code that I am testing against here (Ref, that repo also has lots of other example code if you want to test further)

Webcrack

Loading the above minimised code in the webcrack web IDE (Ref) with the following config:

image

In the unminified output, you can see an example of the current require/import/export in 180.js:

(Note: I also used this same file as my example for the JSX unminify in #10 (comment), but the specifics in this issue are different, despite it using the same source)

(Sidenote: I love the /*webcrack:missing*/ annotation here!)

180.js

require.d(exports, {
  Z: function () {
    return a;
  }
});
var r = require( /*webcrack:missing*/"./35250.js");
function a(e) {
  var t;
  var n = e.url;
  var a = e.size;
  var i = a === undefined ? 16 : a;
  var s = e.className;
  try {
    t = new URL(n);
  } catch (e) {
    console.error(e);
    return null;
  }
  return (0, r.jsx)("img", {
    src: `https://icons.duckduckgo.com/ip3/${t.hostname}.ico`,
    alt: "Favicon",
    width: i,
    height: i,
    className: s
  });
}

Wakaru

Contrasting this against the unminified output from wakaru's web IDE (Ref) (which separates the 'unpack modules' and 'unminify' steps)

Unpacked module-180.js source:

var r = require(35250);
function a(e) {
  var t;
  var n = e.url;
  var a = e.size;
  var i = void 0 === a ? 16 : a;
  var s = e.className;
  try {
    t = new URL(n);
  } catch (e) {
    return console.error(e), null;
  }
  return (0, r.jsx)("img", {
    src: "https://icons.duckduckgo.com/ip3/".concat(t.hostname, ".ico"),
    alt: "Favicon",
    width: i,
    height: i,
    className: s,
  });
}

module.exports = {
  Z: a,
};

Transformed (unminified) module-180.js source:

const { jsx } = require(35250);

function a(e) {
  let t;

  const { url, size, className } = e;

  const i = size === undefined ? 16 : size;
  try {
    t = new URL(url);
  } catch (e) {
    console.error(e);
    return null;
  }
  return (
    <img
      src={`https://icons.duckduckgo.com/ip3/${t.hostname}.ico`}
      alt="Favicon"
      width={i}
      height={i}
      className={className}
    />
  );
}

export default {
  Z: a,
};
@j4k0xb
Copy link
Owner

j4k0xb commented Dec 18, 2023

Yeah it needs to get reworked at some point... the tricky part is detecting if a module is esm or not if it doesn't use require.d or similar (10604.js).
require should be converted to import { jsx } from "./35250.js"; which will also solve #6 and partly #10

The default export is commonly called Z: https://github.com/swc-project/swc/blob/3550d0be26ebab7bcb7942e1a12a7d5fc3aedb17/crates/swc_ecma_minifier/tests/fixture/projects/next/archive-2/pages/hmr/.counter-c6a48518c4a56e4d9ce5/input.js#L10-L15

But it's still accessed like any other named export: https://github.com/swc-project/swc/blob/3550d0be26ebab7bcb7942e1a12a7d5fc3aedb17/crates/swc_ecma_minifier/tests/fixture/projects/next/archive-2/pages/hmr/.counter-c6a48518c4a56e4d9ce5/input.js#L230
so either export default function a(e) { or export { a as Z }; or export function Z(e) {

@0xdevalias
Copy link
Author

the tricky part is detecting if a module is esm or not if it doesn't use require.d or similar

Forgive my naivety, but would it matter either way? Could it just be a user override setting of like 'always produce ESM', even if the original might not have been? Or would it introduce issues into the code that would make it not work/etc?

I guess based on this note (Ref) it seems there may be incompatibilities if this was done.

@j4k0xb
Copy link
Owner

j4k0xb commented Dec 19, 2023

Some packages require import * as name from 'package' instead of import name from 'package'. We cannot detect this automatically, so you might need to fix it manually.

And because you can't import esm from commonjs but that's impossible to validate with multiple chunks or other code we don't know about

Here's a wip version that converts all top level requires and this export variation: https://deploy-preview-31--webcrack.netlify.app/
Result for 180.js:

import { jsx } from /*webcrack:missing*/"./35250.js";
export function Z(e) {
  var t;
  var n = e.url;
  var a = e.size;
  var i = a === undefined ? 16 : a;
  var s = e.className;
  try {
    t = new URL(n);
  } catch (e) {
    console.error(e);
    return null;
  }
  return <img src={`https://icons.duckduckgo.com/ip3/${t.hostname}.ico`} alt="Favicon" width={i} height={i} className={s} />;
}

now the references will also be detected:
image

@0xdevalias
Copy link
Author

but that's impossible to validate with multiple chunks or other code we don't know about

@j4k0xb That makes sense for code we don't know about, but why is it impossible to validate for multiple chunks? Is that just because webcrack only operates on a single chunk at a time currently? (Ref)

Here's a wip version that converts all top level requires and this export variation

now the references will also be detected:

Nice! That's looking so much better/more useful already!


Using that same example code (Ref), with the new preview version (Ref), looking at 63390.js, there are some 'top of file' imports, but then around line 108 there seem to be some more imports that aren't currently hoisted to the top of the file.

Details

Top of file imports:

// 63390.js, lines 1-24
var r;
import { _ as _2 } from /*webcrack:missing*/"./39324.js";
import { _ as _3 } from /*webcrack:missing*/"./70216.js";
import { _ as _4 } from /*webcrack:missing*/"./4337.js";
import { jsxs, Fragment, jsx } from /*webcrack:missing*/"./35250.js";
import { Z as _Z } from /*webcrack:missing*/"./19841.js";
import { useRouter } from /*webcrack:missing*/"./68555.js";
import { useId, useState, useRef, useEffect, useCallback, forwardRef, Children, useMemo, memo } from /*webcrack:missing*/"./70079.js";
import { Z as _Z2 } from /*webcrack:missing*/"./34303.js";
import { Dd } from "./75179.js";
import { hz } from /*webcrack:missing*/"./64135.js";
import { FZ } from /*webcrack:missing*/"./88038.js";
import { tQ } from "./75527.js";
import { Qd } from "./36716.js";
import { wp } from /*webcrack:missing*/"./59710.js";
import { RR } from "./56244.js";
import { z as _z } from /*webcrack:missing*/"./66958.js";
import { Z as _Z3 } from "./30931.js";
import { Z as _Z4 } from "./87105.js";
import { _ as _5 } from /*webcrack:missing*/"./22830.js";
import { ZP as _ZP } from /*webcrack:missing*/"./2827.js";
import { WS } from /*webcrack:missing*/"./82841.js";
import { s6 } from /*webcrack:missing*/"./36218.js";
import { Os, uU } from "./69403.js";

Part way down, non-hoisted:

// 63390.js, lines 108-115
import { _ } from /*webcrack:missing*/"./21722.js";
import { Jh } from /*webcrack:missing*/"./39889.js";
import { h } from /*webcrack:missing*/"./10642.js";
import { get } from /*webcrack:missing*/"./47635.js";
import { kP } from /*webcrack:missing*/"./32787.js";
import { ZP as _ZP2 } from /*webcrack:missing*/"./24274.js";
import { Z as _Z5 } from "./75515.js";
import { Z as _Z6 } from "./180.js";

With require and code with potential side effects, having them halfway down could change the logic; but if they are truly imports, then hoisting them to the top of the file should be fine I believe.

From memory, this web app is bundled with swc; I wonder if it has specific logic that knows (or is told) when an import/require halfway through a file isn't safe to be hoisted; and if so, whether we could leverage that knowledge to know when not to change a require to an import?

Looking at the same file/output in wakaru (Ref), it looks like it also doesn't hoist them, but it seems that it uses a mixture of both import and require, and these 'part way down' entries seem to be using require. I haven't deeply looked into the logic to validate why that is. (but see the 'Details' below, in the 'unminified' section, it seems that at least some of the requires for non-missing module files were converted to import and hoisted to the top of the file)

Details

Top of File

Source (unpacked)

// module-63390.js, lines 1-24
var r;
var a = require(39324);
var i = require(70216);
var s = require(4337);
var o = require(35250);
var l = require(19841);
var u = require(68555);
var d = require(70079);
var c = require(34303);
var f = require(75179);
var h = require(64135);
var g = require(88038);
var m = require(75527);
var p = require(36716);
var v = require(59710);
var x = require(56244);
var b = require(66958);
var y = require(30931);
var w = require(87105);
var j = require(22830);
var _ = require(2827);
var C = require(82841);
var M = require(36218);
var k = require(69403);

Transformed (unminified)

Note that wakaru doesn't currently explicitly annotate missing modules (Ref), so it could be that these 'top of file' requires that weren't converted to import is just because the modules are missing.

// module-63390.js, lines 1-49
import f from "module-75179.js";
import m from "module-75527.js";
import p, { Qd } from "module-36716.js";
import x, { RR } from "module-56244.js";
import y from "module-30931.js";
import w from "module-87105.js";
import k from "module-69403.js";
import R from "module-75515.js";
import U from "module-180.js";
import Y, { dQ } from "module-77442.js";
import { v as v$0 } from "module-31721.js";
import ec, { Y8, m0 } from "module-5046.js";
import { Iy } from "module-25094.js";
let r;

const { _: _$5 } = require(39324);

const { _: _$4 } = require(70216);

const { _: _$0 } = require(4337);

const o = require(35250);

const { jsxs, jsx } = o;

const { Z: Z$0 } = require(19841);

const { useRouter } = require(68555);

const d = require(70079);

const { useId, useState, useRef, useEffect, useCallback, forwardRef, useMemo } =
  d;

const c = require(34303);

const { hz } = require(64135);

const g = require(88038);
const v = require(59710);
const b = require(66958);

const { _: _$1 } = require(22830);

const _ = require(2827);

const { WS } = require(82841);

const M = require(36218);

Part way down

Source (unpacked)

// module-63390.js, lines 138-146
var S = c.Z.div(N());
var I = require(21722);
var F = require(39889);
var E = require(10642);
var D = require(47635);
var L = require(32787);
var A = require(24274);
var R = require(75515);
var U = require(180);

Transformed (unminified)

// module-63390.js, lines 170-182
var S = c.Z.div(N());

const { _: _$3 } = require(21722);

const { Jh } = require(39889);

const { h: h$0 } = require(10642);

const D = require(47635);

const { kP } = require(32787);

const A = require(24274);

Note that in the 'unpacked' for this section, there are requires that are no longer present in this 'unminified' section (75515, 180); looking closer, it seems that these were converted to imports, and hoisted to the top of the file:

// module-63390.js, lines 8-9
import R from "module-75515.js";
import U from "module-180.js";

@j4k0xb
Copy link
Owner

j4k0xb commented Dec 22, 2023

but why is it impossible to validate for multiple chunks? Is that just because webcrack only operates on a single chunk at a time currently?

Yes, or rather a lot of work to do it "properly". Requires operating on all chunks and building an import/require graph like in pionxzh/wakaru#73.
Maybe there's a better solution

Looking at the same file/output in wakaru (Ref), it looks like it also doesn't hoist them, but it seems that it uses a mixture of both import and require, and these 'part way down' entries seem to be using require

I think it was a mistake to convert all requires, what wakaru does with not touching the 'part way down' ones is more correct.

if they are truly imports, then hoisting them to the top of the file should be fine I believe.

I doubt imports are ever moved down so it's probably a module with only require or both import/require (which webpack can bundle).

@0xdevalias
Copy link
Author

0xdevalias commented Dec 23, 2023

Yes, or rather a lot of work to do it "properly". Requires operating on all chunks and building an import/require graph like in pionxzh/wakaru#73.

@j4k0xb nods, yeah, makes sense.

If you haven't seen them yet, some of these tools/libs I found the other day for visualising dependency graphs could potentially be useful for the module graph concept (Ref)

I wonder if even 'not proper' support would be useful to add, and how hard that would be. I've found wakaru's current approach pretty useful even without the 'module graph', in just being able to unpack all the chunks and explore them more manually (though that's probably a seperate musing/usability thing and not super related to the topic here)

Maybe there's a better solution

I haven't deeply looked into this, and not for ages, but at one stage I remember having a thought that the chunks specified the other chunks they depended on somewhere (as well as the individual module imports within it) (Ref)

In the code I was most exploring, theres the _buildManifest.js (Ref) and webpack.js (Ref) chunks that seemed to detail some of the 'high level' of the chunk loading/dependencies/etc; though there was also the chunks loaded directly in the html as well.

Looking at a fairly small/basic chunk, it seems like it doesn't have anywhere that specifies dependencies on other chunks (Ref)

But then looking at a far larger chunk file (pages/_app.js (Ref), there is this section after all of the normal module definitions that looks like it might handle loading other chunks if they aren't already loaded, and module dependency order or similar:

function (U) {
  var B = function (B) {
    return U((U.s = B));
  };
  U.O(0, [774, 179], function () {
    return B(18992), B(9869), B(76281);
  }),
    (_N_E = U.O());
},

I think it was a mistake to convert all requires, what wakaru does with not touching the 'part way down' ones is more correct.

@j4k0xb nods makes sense. I'm not sure of the heuristics it uses to decide if they can be converted to import (and how 'smart' they are), but at least for some of the ones that weren't converted, I think that may have been due to them being for missing files.


I doubt imports are ever moved down so it's probably a module with only require or both import/require (which webpack can bundle).

@j4k0xb Yeah, the 'import/require' part was what my original theory was.

@pionxzh
Copy link

pionxzh commented Dec 24, 2023

Looking at the same file/output in wakaru (Ref), it looks like it also doesn't hoist them, but it seems that it uses a mixture of both import and require, and these 'part way down' entries seem to be using require. I haven't deeply looked into the logic to validate why that is.

The current behavior of wakaru is to convert all top-level requires to imports, ignoring the risk of potentially changing the behavior. The reason why those requires in the sample are not converted is because they are missing chunks. wakaru also has a hoist flag to convert and hoist all nested requires to top-level imports, but there is currently no way for users to pass flags 😓

I'm not sure if we can tell if a require should be converted to imports even with cross-module analysis.

@0xdevalias
Copy link
Author

0xdevalias commented Dec 30, 2023

I'm not sure if we can tell if a require should be converted to imports even with cross-module analysis.

@pionxzh I haven't looked into this too deeply, but in cases where there is something similar to this, it should be safe to assume it's truly an ESM module (and so would be safe to convert to import) yeah?

Object.defineProperty(exports, "__esModule", { value: true });

Like this sort of thing:

The modules are also converted to esm (more or less) if they're inside of a bundle and it removes require.r, require.d.
What's still left is require.j and module = require.nmd(module) and Object.defineProperty(exports, "__esModule", { value: true });

Originally posted by @j4k0xb in #33 (comment)


@pionxzh Another case that could potentially be checked is whether the imported module is 'pure' or not. Eg. If it runs code in the main body of the file / an IIFE then it probably(?) wouldn't necessarily be safe to convert it to import, and might be safer to leave it as require?

Not sure off the top of my head, but there might also be some other 'heuristics' of things that only appear/only are supported in ESM modules (or vice versa), that could be looked for to determine if it's safe to convert require to import?

@j4k0xb
Copy link
Owner

j4k0xb commented Dec 30, 2023

afaik Object.defineProperty(exports, "__esModule", { value: true }); is inserted by tools like TypeScript and __webpack_require__.r by webpack, and nextjs does its own thing by not including any of this information 🥲

some more heuristics:

  • __webpack_require__.d - define esm exports
  • module = __webpack_require__.hmd(module); - when an esm module accesses the global module variable
  • __webpack_require__.n - default import
  • only top level var <name> = __webpack_require__(<id>);
  • indirect calls ((0, fn)(...args) type of calls #6) coming from such a variable
  • exports.__esModule = true; (most likely not generated by webpack though)

I looked at many of your samples and sometimes esm appears to be commonjs because of variable inlining or other build tools.
e.g.: https://github.com/0xdevalias/chatgpt-source-watch/blob/main/orig/_next/static/chunks/167-121de668c4456907.js module 75527:

var m = __webpack_require__(57311); // import ... from '57311', how webpack normally creates it
__webpack_require__(44675).env.INTERNAL_API_URL; // was probably `var n = __webpack_require__(44675); n.env.INTERNAL_API_URL`

or https://github.com/0xdevalias/chatgpt-source-watch/blob/main/orig/_next/static/chunks/293-defd068c38bd0c8d.js module 12285, temporary variables from typescript enums but it should detect and hoist imports:

__webpack_require__.d(exports, {
  R: function () {
    return V;
  },
});
var r;
var o;
var i;
var a;
var u = __webpack_require__(70079);
var c = __webpack_require__(62530);
var s = __webpack_require__(19430);
var l = __webpack_require__(9335);
var f = __webpack_require__(41800);
function p(t, e) {
  let [n, r] = (0, u.useState)(t);
  let o = (0, f.E)(t);
  (0, l.e)(() => r(o.current), [o, r, ...e]);
  return n;
}
var v = __webpack_require__(84325);
// ...
(r = P || {})[r.Open = 0] = "Open";
r[r.Closed = 1] = "Closed";
var P = r;

And because you can't import esm from commonjs

did some more tests and apparently webpack is able to bundle it, even mixed commonjs/esm in a single module
so it should be fine to try to convert to esm when its obvious (without having to rely on analyzing other modules) and just keep the rest as commonjs

@j4k0xb j4k0xb linked a pull request Jan 13, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants