Persist cache to disk? #2182

Open
Rich-Harris opened this issue May 11, 2018 · 32 comments

Comments

@Rich-Harris
Contributor

This Twitter thread got me thinking — perhaps it would make sense for Rollup to (optionally?) write its cache to disk for faster cold builds. At the moment the cache is only used (at least by the CLI) when running in --watch mode, by keeping the results in memory.

After all, if Parcel are going to use our ideas it's only fair we borrow some of theirs as well 😀

@keithamus
Contributor

For a project I'm working on we use rollup, and we memoize all calls to rollup - as well as every call to our transform plugins (mostly babel), and the memoization is serialisable so we serialise it to disk. While we found caching the output of rollup (keyed by the input and file checksums) to be very useful, we also found a huge benefit to caching the output of transforms (keyed by input and file checksums too). If you'd like more details I'm happy to share.

@Rich-Harris
Contributor Author

Interesting — the cache works on the assumption that identical input plus identical set of plugins results in identical output, so Rollup doesn't bother checking the output from transform A before piping it into transform B (unless that changed in my absence, ha!). Are you saying there's a benefit to doing so?

(Side-note: the assumption that the configuration hasn't changed is important here; there would need to be some way of checking that it was in fact unchanged, if work was cached between runs.)
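
One way such a check could work is to store a fingerprint of the configuration next to the cache and discard the cache when the fingerprint changes. The sketch below is only an illustration: `configFingerprint`, `wrapCache` and `reuseCache` are hypothetical names, and hashing functions by their source text is an assumption that will miss state captured in closures.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash everything in the config that can be serialised.
// Functions (for example plugin hooks) are represented by their source text,
// which is an assumption and does not capture state held in closures.
function configFingerprint(config) {
  const json = JSON.stringify(config, (key, value) =>
    typeof value === 'function' ? value.toString() : value
  );
  return crypto.createHash('sha256').update(String(json)).digest('hex');
}

// Store the fingerprint together with the cache that was produced under it.
function wrapCache(cache, config) {
  return { fingerprint: configFingerprint(config), cache };
}

// Hand back the stored cache only if the configuration is unchanged.
function reuseCache(stored, config) {
  if (!stored || stored.fingerprint !== configFingerprint(config)) {
    return undefined;
  }
  return stored.cache;
}
```

A config containing circular references would make `JSON.stringify` throw, so a real implementation would need a more careful serialiser.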

@keithamus
Contributor

We were unable to get Rollup's existing cache to persist to disk, so we rolled our own which works over the top of rollup. For our own cache system - where we have a build of say a hundred files - if one file changes out of that 100 we don't want to have to run babel for 100 files again, so we simply cache the babel transform plugin, passing the cache-wrapped transform to rollup (e.g. plugins: [cache(babel()) ]). This way just the babel transform gets cached, which means even if rollup caches nothing, we still save time (our benchmarks show significant time) by caching these transforms.
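
A wrapper like the `cache(babel())` described above might look roughly like the following. This is a sketch, not the implementation used in that project: the `cacheDir` option and the key layout are assumptions, it only handles a function-style `transform` hook, and it does not account for changes in plugin options or versions.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical wrapper in the spirit of `plugins: [cache(babel())]`.
// `cacheDir` is an assumed option, not part of any real plugin API.
function cache(plugin, { cacheDir = '.transform-cache' } = {}) {
  // Nothing to wrap if the plugin has no function-style transform hook.
  if (typeof plugin.transform !== 'function') return plugin;
  fs.mkdirSync(cacheDir, { recursive: true });

  return {
    ...plugin,
    async transform(code, id) {
      // Key on the plugin name, the module id and the input code.
      const key = crypto
        .createHash('sha256')
        .update(`${plugin.name}\0${id}\0${code}`)
        .digest('hex');
      const file = path.join(cacheDir, `${key}.json`);

      // Cache hit: skip the wrapped transform entirely.
      if (fs.existsSync(file)) {
        return JSON.parse(fs.readFileSync(file, 'utf8'));
      }

      // Preserve `this` so the wrapped hook still sees the plugin context.
      const result = await plugin.transform.call(this, code, id);
      // `undefined` is not valid JSON, so "no change" is stored as null.
      fs.writeFileSync(file, JSON.stringify(result ?? null));
      return result;
    },
  };
}
```

Because the key includes the input code, editing one file out of a hundred only invalidates that file's entry; the other ninety-nine transforms are read back from disk.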

@lukastaegert
Member

I think transforms are definitely the best target for caching. But maybe we should consider adding some way of transforms to signal that they are "pure". As the most important plugins are part of the rollup organization, it would be easy to add such a flag to them.

Then ideally if all transforms are pure and we know the original file has not changed, we could cache the result of applying all transforms + parsing as a stringified object. Not sure if file size is a problem; it might make sense to check whether some form of compression could even improve the speed here.

The actual "rolling-up" is hard to cache but it's easy to hydrate the process with an existing AST.

I guess @guybedford has an opinion on this as well, definitely sounds like something we should do. As for watch mode, there is also still a bit of potential for speed improvement left. Currently, we re-create our AST from acorn's result on each run while the architecture would already support resetting and reusing the existing internal data structures.
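
The "pure" flag and the compressed, stringified cache entries suggested above could be sketched as follows. Note that `pure` is a hypothetical property that Rollup plugins do not define today, and `packEntry`/`unpackEntry` are illustrative names; whether gzip actually helps speed would need measuring.

```javascript
const zlib = require('zlib');

// `pure` is a hypothetical flag; no such plugin property exists in Rollup.
// Plugins without a transform hook do not affect the result and are ignored.
function allTransformsPure(plugins) {
  return plugins
    .filter((plugin) => typeof plugin.transform === 'function')
    .every((plugin) => plugin.pure === true);
}

// Serialise a cache entry (for example transformed code plus its AST)
// as gzip-compressed JSON.
function packEntry(entry) {
  return zlib.gzipSync(JSON.stringify(entry));
}

// Reverse of packEntry: decompress and parse back into a plain object.
function unpackEntry(buffer) {
  return JSON.parse(zlib.gunzipSync(buffer).toString('utf8'));
}
```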

@guybedford
Contributor

Agreed we should definitely move towards doing this. First under an option, then hopefully we can provide it by default for 1.0.

What do people think about the default cache folder? .cache/rollup? What are the current trends here?

@TrySound
Member

node_modules/.cache/rollup was introduced by sindresorhus. It looks good and does not require adding it to .gitignore.

@keithamus
Contributor

I would generally give a 👎 for node_modules/.cache/rollup as it's likely that most CIs will throw away node_modules for each build. Some libraries use npm's cache folder (typically ~/.npm), which can be retrieved by looking at process.env['npm_config_cache'] - I'd be happy if we checked that and perhaps defaulted it to ./node_modules/.cache (i.e. cacheDir = path.resolve(`${process.env['npm_config_cache'] || './node_modules/.cache'}/rollup`))
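
The lookup proposed above can be written as a small function. `resolveCacheDir` is a hypothetical name, and taking `env` and `cwd` as parameters is an assumption made only so the behaviour is easy to check in isolation.

```javascript
const path = require('path');

// Prefer npm's cache folder when npm exposes it in the environment,
// otherwise fall back to node_modules/.cache inside the project.
function resolveCacheDir(env = process.env, cwd = process.cwd()) {
  const base = env.npm_config_cache || path.join(cwd, 'node_modules', '.cache');
  return path.resolve(base, 'rollup');
}
```

With no arguments this reads the real environment, so running the build through an npm script and running it directly may yield different directories.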

@dazinator

dazinator commented Nov 2, 2018

I am currently using rollup via its API, and all my inputs are held in the node process in memory (piped in from .NET over http). I use https://github.com/Permutatrix/rollup-plugin-hypothetical for the in-memory file store.

From reading this, if rollup is going to be creating its own persistent cache, I'd like to be in control of the persistence myself rather than rollup writing directly to the file system somewhere. In my case I'd likely shuffle the cache back over to the .NET process, for persistence somewhere within the asp.net core website itself. Implied in that is the ability to prime the cache myself, in the case of a cold start, rather than rollup looking at the filesystem directly. Essentially I'd like the read and write cache methods to be pluggable.
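
A pluggable read/write layer of this kind could be as small as an object with two async methods. Everything named below (`memoryStore`, `buildWithStore`, the `read`/`write` shape) is hypothetical and not an existing Rollup interface; `build` stands in for a call such as `rollup.rollup`, which accepts a `cache` option and returns a bundle with a `cache` property.

```javascript
// Hypothetical "cache store": any object with async read() and write().
// This in-memory version could be swapped for one that ships the cache to
// another process, a database, or the file system.
function memoryStore(initial) {
  let value = initial;
  return {
    async read() {
      return value;
    },
    async write(next) {
      value = next;
    },
  };
}

// `build` is passed in so that the sketch does not depend on Rollup being
// installed; in real use it would be something like rollup.rollup.
async function buildWithStore(store, build, config) {
  const cache = await store.read(); // undefined on a cold start
  const bundle = await build({ ...config, cache });
  await store.write(bundle.cache); // persist for the next run
  return bundle;
}
```

Priming the cache on a cold start then amounts to constructing the store with a previously saved value, for example `memoryStore(savedCache)`.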

@shellscape
Contributor

Hey folks. This is a saved-form message, but rest assured we mean every word. The Rollup team is attempting to clean up the Issues backlog in the hopes that the active and still-needed, still-relevant issues bubble up to the surface. With that, we're closing issues that have been open for an eon or two, and have gone stale like pirate hard-tack without activity.

We really appreciate that folks have taken the time to open and comment on this issue. Please don't confuse this closure with us not caring or dismissing your issue, feature request, discussion, or report. The issue will still be here, just in a closed state. If the issue pertains to a bug, please re-test for the bug on the latest version of Rollup and if present, please tag @shellscape and request a re-open, and we'll be happy to oblige.

@guybedford
Contributor

I still think a CLI default experience here would be useful, and I even started work along these lines in #2397 before getting pulled in other directions.

Happy to let new contributions pick this one up though.

@lukastaegert
Member

My current ambitions, should time permit, would be to work towards a plugin interface for “cache providers” with the possibility that the Rollup CLI adds a default plugin. This would be a very clean solution that works in any environment and would allow for very powerful implementations without putting the development load on Rollup core alone. But there is already a more recent issue about this so this can remain closed from my side.

@lukastaegert
Member

Much of this is driven by the requirements of StencilJS, a partnership I really appreciate.

@shellscape
Contributor

OK let's reopen and track then.

@frank-dspeed
Contributor

I am thinking about building an enterprise Rollup product, something like rollup-enterprise, that works like an online build chain: it always runs and keeps all state in our fast in-memory data stores with replication. This way we could offer the fastest build and deployment pipelines possible.

@Thrilleratplay

Not a bug, but I am curious why a ham-fisted implementation of this feature would not work.

For some background, the project I am working on is in the process of migrating a legacy Angular.js/Gulp3 code base. For compatibility reasons, most of the gulp tasks are left as is, including file watchers and module injection. Passing a variable between gulp tasks does not seem doable, so, willing to accept the synchronous reads and writes, I set up a gulp task that is essentially this:

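    const rollup = require('rollup');
    const fs = require('fs');

    // A relative path is needed here; a bare specifier is looked up in node_modules.
    const rollupConfig = require('./rollup.build.config.js');
    const rollupCacheFilePath = '.rollupCache';

    // Reuse the cache from the previous run, if one was written.
    if (fs.existsSync(rollupCacheFilePath)) {
      rollupConfig.cache = JSON.parse(fs.readFileSync(rollupCacheFilePath, 'utf8'));
    }

    // Inside the gulp task: build, persist the new cache, then write the bundle.
    return rollup.rollup(rollupConfig).then((bundle) => {
      fs.writeFileSync(rollupCacheFilePath, JSON.stringify(bundle.cache));
      return bundle.write(rollupConfig.output);
    });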

It seems to work: the cache is recognized and only changed files are processed, but then it locks up. No file is written and the promise neither resolves successfully nor rejects with an error. Is there a reason why this would not work, or did I miss a list of known plugins that have issues with the rollup cache?

@lukastaegert
Member

My suspicion would be on rollup-plugin-commonjs as it relies on the transform hook being executed at least once for each module to determine if it is CommonJS. If a cached file is used instead, the promise will never complete: https://github.com/rollup/plugins/blob/07f325de8978ab0f0ff8a2befc23f898ff33eee3/packages/commonjs/src/index.js#L164

But this is a good point because it means:

  • Yes, there can be interesting plugin incompatibilities, and
  • We need to implement adequate caching for plugins first

@Thrilleratplay

@lukastaegert Thank you for the quick response and that makes sense. I am using rollup-plugin-commonjs and that explains the behavior I was seeing.

@frank-dspeed
Contributor

Only my 2 cents: while I love the hook API in general, I think it would be clearer to change it.

We need load and resolve, but transform should be applied in an extra step so that we always know all files are emitted and resolved, and then apply transform in a separate step if needed.

That means we should think about async and parallel hooks; maybe we need an additional serial hook.

To be more clear, we should add an end or final hook, and that should be sync, while we can then code plugins for that final hook to be executed in workers if needed.

@lukastaegert
Member

Things are not that simple, especially for the transform hook. rollup-plugin-commonjs needs to hook into it to determine if something is CJS because it is entirely possible that there are other transformers before it (e.g. Babel, TypeScript) that are needed to make the code actually parseable JavaScript. There is really no advantage in postponing the transform steps except it will make things slower.

BTW final hooks exist for all phases, buildEnd for the build phase and generateBundle/writeBundle/renderError for the generate phase. Not sure what making any of them sync would accomplish except making it impossible to do async things in those hooks. Also note that many hooks are marked as "sequential" for predictability, such as generateBundle.
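
For readers less familiar with these hooks, here is a minimal plugin object that uses them. The plugin itself (`timingPlugin`, its `log` parameter and its messages) is made up for illustration; only the hook names come from Rollup.

```javascript
// A small illustrative plugin that reports on the end of each phase.
function timingPlugin(log = console.log) {
  let started;
  return {
    name: 'timing',
    buildStart() {
      started = Date.now();
    },
    // End of the build phase; receives an error if the build failed.
    async buildEnd(error) {
      log(error ? 'build failed' : `build took ${Date.now() - started}ms`);
    },
    // End of bundle.generate() or bundle.write(), before files hit the disk.
    async generateBundle(outputOptions, bundle) {
      log(`generated ${Object.keys(bundle).length} file(s)`);
    },
    // Only runs after bundle.write() has written all files.
    async writeBundle() {
      log('files written');
    },
    // Runs when generating or writing the bundle fails.
    renderError(error) {
      log(`render failed: ${error.message}`);
    },
  };
}
```

Since the hooks may be async, a plugin can do I/O such as flushing a cache to disk in any of them.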

@frank-dspeed
Contributor

Oh, I was not so deep into this, so we already have all the needed infrastructure. I think I should dive deep into that and create a userland implementation of a streamable, cacheable result.

As you pointed out, the hooks needed for this do exist, so it is clear that we need a plugin that creates a complete, cacheable dependency graph.

Then we need something that catches all resolveId and load calls and only emits the ones related to the changed files.

Then all plugins can run as expected because they don't need to be aware of the outer scope.

Conclusion

The more I think about this, the more I am sure Rollup should be refactored to run as a daemon, and when we emit files it should handle them correctly. I will do a PoC.

@pranaypratyush

New js user here. I had been relying on parcel day and night because of their zero-config feature. I just made the decision to move to rollup. After I am all set I noticed there's no persistent caching. Had just gone out and assumed it would be a standard thing. 😝
Keep up the good work guys. I am planning to use rollup for an SSR web-app targeted at old and cheap phones and possibly piss poor data connections in suburban areas.

@vvo

vvo commented Jul 21, 2020

Hi, is there any workaround to feed rollup with some cache that was synced to disk? Would saving bundle.cache somehow to the disk (writeFile + eval? :D) and then reading it back work? Thanks!

@frank-dspeed
Contributor

@vvo if you find a good way to serialize circular references then maybe.

@lukastaegert
Member

The cache should not contain any circular references as it is explicitly created to be JSON.stringifyable. If you do not use @rollup/plugin-commonjs I would actually expect it to work in most setups (i.e. write JSON.stringify(cache) to disk and feed it back into the system via JSON.parse(...))
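
The JSON round trip described above could be wrapped in two helpers. This is a sketch under assumptions: `readCache` and `writeCache` are hypothetical names, and the write-to-temp-then-rename step is an added precaution against leaving a half-written file, not something Rollup requires.

```javascript
const fs = require('fs');

// Return the parsed cache, or undefined if the file is missing or corrupt,
// in which case the next build simply starts cold.
function readCache(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch {
    return undefined;
  }
}

// Write to a temporary file first, then rename it into place, so an
// interrupted build does not leave a truncated cache behind.
function writeCache(file, cache) {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(cache));
  fs.renameSync(tmp, file);
}
```

In use, the result of `readCache(file)` would be passed as the `cache` option to `rollup.rollup(...)`, and `bundle.cache` handed to `writeCache(file, bundle.cache)` after the build.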

@simonwep
Contributor

Is there any update on this? A cold rebuild takes ~40s whereas with webpack (which I cannot use because of #2933) + file-system caching it only takes ~300ms after changing one file.

@burdiyan

This is my biggest frustration with all the JS build tools. I can't be happy when it takes seconds to build a project that I've already built and haven't changed anything.

@robertknight
Copy link

robertknight commented Nov 1, 2021

I've been looking into disk-based caching options to speed up test runs and dev server startup for the product I work on. I have a functional but immature solution at https://github.com/robertknight/rollup-cache. The aim is to make it easy to drop into an existing project with minimal configuration.

It currently enables caching for the resolveId, load and transform build hooks of several official Rollup plugins where I've found it to work well and provide a significant speed-up: commonjs, node-resolve, babel. There is also a feature that enables easy pre-building of npm dependencies as separate bundles in development. This speeds up rebuilds by reducing the amount of code that Rollup has to parse, analyze and serialize each time the bundle is built, independent of any transforms. Conceptually this is similar to shared libraries/DLLs in native apps or Webpack's DllPlugin plugin.

I did also look at caching the results of acorn's JS parsing, although when using naive JSON serialization of the AST, it didn't offer significant improvement over just re-parsing the input code.

@frank-dspeed
Contributor

frank-dspeed commented Feb 5, 2022

I am confused, and more than that: I ran the pwabuilder from Microsoft, which uses Rollup, and in watch mode it created a ".rollup.cache" on disk, so maybe someone has it working.

The content of the .rollup.cache folder is amazing, it is better than the final bundle! @lukastaegert is that Rollup cache from us or is it from Microsoft?

Nothing in the config looks like that:

    import resolve from "@rollup/plugin-node-resolve";
    import html from "@open-wc/rollup-plugin-html";
    import copy from "rollup-plugin-copy";
    import replace from "@rollup/plugin-replace";
    import typescript from "@rollup/plugin-typescript";

    export default {
      input: "index.html",
      output: {
        dir: "build",
        format: "es",
        sourcemap: true
      },
      plugins: [
        resolve({
          exportConditions: ['development']
        }),
        html(),
        typescript({
          tsconfig: "tsconfig.dev.json",
        }),
        replace({
          "preventAssignment": true,
          "process.env.NODE_ENV": JSON.stringify(
            process.env.NODE_ENV || "production"
          )
        }),
        copy({
          targets: [
            { src: "assets/**/*", dest: "build/assets/" },
            { src: "styles/global.css", dest: "build/styles/" },
            { src: "manifest.json", dest: "build/" },
          ],
          copyOnce: true
        }),
      ],
    };

i used that template https://github.com/pwa-builder/pwa-starter

when you then do npm run dev

the magic happens

@frank-dspeed
Contributor

@robertknight you should not worry about speed in the first implementation.

I am working on parse5, which you may know: a complete DOM parser written in JS. There are tons of small tricks that we can apply once the final implementation is solid.

Simply write the code that is most readable and understandable for you and others; later we can revisit it and implement some low-level tricks like translating the loops and working with strings and other parsing algorithms.

@tigt

tigt commented Feb 5, 2022

node_modules/.cache/rollup-cache/ looks like it comes from robertknight/rollup-cache

@frank-dspeed
Contributor

@tigt no, mine does not come from that; it got created in the project root at ./.rollup.cache/ and contains clean ESM code, not transformed but already correctly split.

For me, as someone who codes only with ESNext targets and only needs to transpile up to ESNext, this is a dream output without the unneeded obfuscations, as my environment is already ESNext.

@EricWu91

EricWu91 commented Apr 8, 2024

Hello! It's been a while since the last comment in this issue. What can we do as a workaround?


17 participants