
Address concerns from package maintainers that sourcemaps are too large #41

justingrant opened this issue May 17, 2023 · 7 comments

justingrant commented May 17, 2023

Package maintainers have plenty of reasons for not wanting to ship sourcemaps, but one common complaint I've heard is that sourcemaps make package downloads and on-disk storage much larger than they'd otherwise be.

For example, the AWS JavaScript SDK's 380 packages reduced download sizes by 21% (more than 160MB total) by removing sourcemaps. Obviously no real user is going to use all 380 packages, but using 20-30 of them is probably not unusual, especially given intra-SDK dependencies.

For AWS in particular, I suspect that they're sensitive to this because the AWS SDK is probably loaded into hundreds of thousands of cloud server images running in AWS, so all those sourcemaps may have a material impact on AWS's overall storage budget.

But the net result is that the AWS SDK is now harder to debug and troubleshoot. An issue to create a separate debug SDK with sourcemaps (aws/aws-sdk-js-v3#2895) has been open for 18 months, but there's been no action yet. And honestly, that seems like a bad solution anyway, because it'd require different prod vs. dev setups that seem easy to break or let drift out of sync.

Obviously AWS is an extreme case, but I've heard similar size complaints from many other maintainers. googleapis/google-cloud-node#2867 is a good trove of examples (read the issues linked to it) of maintainers deciding not to ship sourcemaps.

Can we do something to address maintainers' concerns? For example:

  1. Better compression, including on-disk compression, not just the over-the-wire compression we have now?
  2. Ability to split sourcemaps into separate packages that can be downloaded on demand by debuggers or call-stack-generators? For this to work, ideally the npm registry would be able to publish both packages at the same time, or perhaps the packages could be split by npm itself in the cloud?
  3. Maybe some way to minify sourcemaps, so that actual running code is retained but comments, whitespace, etc. can be downloaded on demand by debuggers and browser dev-tools, without impeding the ability of runtimes to generate call stacks?

I admit I'm not expert enough to know which (if any) of the solutions above are practical. But without addressing the size issue, I fear that we'll never get full adoption of sourcemaps across the ecosystem.
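
For context on where the bytes go: a v3 source map bundles everything into one JSON file, and sourcesContent (the inlined copy of every original source file) is usually the largest part. Here is a sketch of the shape as a TypeScript literal, with all values invented for illustration:

// Illustrative shape of a v3 source map (all values invented).
// "sourcesContent" embeds the full original source files and is
// typically the largest field; "mappings" is the Base64 VLQ string
// discussed later in this thread.
const exampleMap = {
  version: 3,
  file: "index.js",
  sources: ["../src/index.ts"],
  sourcesContent: ["/* entire original TypeScript source, inlined */"],
  names: ["greet", "name"],
  mappings: "AAAA,SAASA,MAAMC", // delta-encoded Base64 VLQ segments
};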

justingrant (Author) commented:

Here's another size-related concern from googleapis/google-cloud-node#2867 (comment), which I admit I don't fully understand, but perhaps it's about browser download size?

> In the attempt to publish our libraries in dual-format for ESM & CJS, we've decided against publishing source maps to reduce the overall package size.

mitsuhiko (Contributor) commented:

One issue I was not too familiar with until recently is that bundle-size concerns are apparently also real for backend services: node_modules ends up in serverless Lambda layers quite a bit, and there are various size concerns with that.

jkrems (Contributor) commented May 17, 2023

/cc @MylesBorins because of npm package distribution

I think the real solution would be separate debug packages. With those in place, I don't think you'd need additional source map features, as far as I can tell. I'm not sure compression would really solve this.

jridgewell (Member) commented:

I wonder how much of this is because sourcesContent is included in the sourcemap? We solved this by using a static-file server to host the actual source files, and referencing those in sources and sourceRoot with an immutable, fingerprinted URL.

After that, the names array is still large, but it's the VLQ encoding that causes the biggest impact. The delta encoding is extremely clever, but requiring each delta to be encoded in Base64 VLQ means we're only encoding 5 bits per byte. Some 80% of the segments encoded into the mappings require multiple bytes to encode at least one of their deltas (the names index jumps around wildly!).
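
To make the "5 bits per byte" point concrete, here is a minimal sketch of the Base64 VLQ encoding used in mappings (just the core loop, not a production implementation):

// Each Base64 character carries 6 bits: 1 continuation flag plus
// 5 payload bits, which is where the "5 bits per byte" cost comes from.
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encodeVlq(delta: number): string {
  // The sign is stored in the lowest bit of the first digit.
  let vlq = delta < 0 ? (-delta << 1) | 1 : delta << 1;
  let out = "";
  do {
    let digit = vlq & 0b11111; // take 5 payload bits
    vlq >>>= 5;
    if (vlq > 0) digit |= 0b100000; // set the continuation bit
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

console.log(encodeVlq(1));    // "C"   - a small delta fits in one character
console.log(encodeVlq(1000)); // "w+B" - a large names-index jump needs three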

justingrant (Author) commented:

> I wonder how much of this is because sourcesContent is included in the sourcemap?

Good question. @trivikr, do you remember approximately what percentage of your AWS SDK packages' size was sourcesContent vs. sourcemap files overall?

> I think the real solution would be separate debug packages.

Do you have a sense of how this would work? Do you envision separate packages, one with code+sourcemaps, and one just code? Or a separate "sourcemap-only" package (no executable code, just sourcesContent or sources files) that would be downloaded in a just-in-time fashion when needed by IDEs, debuggers, browser devtools, Sentry, etc?

And what would be the publishing workflow? Would maintainers need separate build configs and need to run npm publish twice? Or would the splitting be automated as part of an atomic npm publish?

> I'm not sure compression would really solve this.

Agreed, compression alone may not be enough. But unminified sourcesContent, along with the rest of the sourcemap JSON, has a lot of repeated text, so I'd assume it'd be highly compressible. What compression ratio do you think we'd typically get for a sourcemap-only package? In the "split package" idea above, would it make sense to leave the sourcemap package in .tgz form after it's downloaded, rather than chewing up disk space for files that are only used during debugging or other exceptional cases?
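
For what it's worth, the compressibility of any given .map file is easy to measure with Node's built-in zlib; the path below is a placeholder:

// Measure how well a sourcemap compresses on disk.
// "dist/index.js.map" is a placeholder path.
import { readFileSync } from "node:fs";
import { gzipSync, brotliCompressSync } from "node:zlib";

const raw = readFileSync("dist/index.js.map");
const gz = gzipSync(raw, { level: 9 });
const br = brotliCompressSync(raw);

console.log(`raw:    ${raw.length} bytes`);
console.log(`gzip:   ${gz.length} bytes (${((100 * gz.length) / raw.length).toFixed(1)}%)`);
console.log(`brotli: ${br.length} bytes (${((100 * br.length) / raw.length).toFixed(1)}%)`);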

mitsuhiko (Contributor) commented:

I think the correct solution would be to eventually find a way to have something like a source map server protocol. If we get traction going on either debug IDs or source hashes, it would be trivial enough to solve this particular problem. Then the main distribution would not contain source maps, and they could be published separately, to either npm or any publicly available URL.
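
No such protocol exists yet; as a purely hypothetical sketch, a client-side lookup keyed by debug ID might be as simple as:

// Hypothetical source map server lookup keyed by debug ID. The host,
// path, and response shape are invented; only the flow (ID in,
// source map out) reflects the idea above. Requires Node 18+ for fetch.
async function fetchSourceMap(debugId: string): Promise<unknown> {
  const res = await fetch(`https://symbols.example.com/sourcemaps/${debugId}`);
  return res.ok ? await res.json() : null;
}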

llllvvuu commented Aug 30, 2023

With the debugId approach, it would be great if there were a way to pre-download (for local/offline use) the symbol files for all build artifacts in a package - something like npm pull-debuginfo mypkg and/or npm install --with-debuginfo, or debuginfo in devDependencies.

npm and the debugging frontend (or debuginfod equivalent) would have to agree on the location of the local database. Maybe there is utility in having both sourceMappingURL and debugId - e.g. if npm had no support and no symbol server were available, then one could do something like:

//# sourceMappingURL=node://mypkg-sources#dist/index.js.map
//# debugId=...

If npm were to host the symbols, one way npm could enshrine source/debug packages is by having a "sources" key in package.json (.js.map, .d.ts.map, src/, maybe even tsconfig.json and rollup.config.js) - then npm pack would produce two tarballs, and npm publish would upload two tarballs.
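
As a sketch of what that manifest might look like (the "sources" key is hypothetical; nothing here is supported by npm today):

// Hypothetical package.json shape for the "sources" key idea above,
// written as a TypeScript literal so the split can be annotated.
const manifest = {
  name: "mypkg",
  version: "1.0.0",
  files: ["dist/"],  // main tarball: runnable code only
  sources: [         // second, debug-only tarball: maps and original sources
    "dist/**/*.js.map",
    "dist/**/*.d.ts.map",
    "src/",
    "tsconfig.json",
  ],
};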

Then people wouldn't have to sync up two packages like we have to do for @types/. (Actually, now that people have moved toward including .d.ts in the main package, I wonder if there was ever a push for something like npm install --without-types.)


#42 is also relevant here. I assume a symbol server would have the easiest time serving sourcesContent. Say I have a launch.json set up so that I can press play, which starts node --inspect-brk and connects the Debug Adapter Protocol.

How do I set a breakpoint?
