This repository has been archived by the owner on Nov 26, 2021. It is now read-only.

IDEA: Support cached mode for plugin so metalsmith builds can be faster when we know the content hasn't changed #49

Open
jideshv opened this issue Mar 7, 2017 · 10 comments

Comments

@jideshv
Contributor

jideshv commented Mar 7, 2017

Hi Team, first of all, amazing plugin. As we become more and more dependent on it, one frustrating area has been the time it takes to pull data from Contentful (we are pulling 400+ objects), even when the data hasn't changed. I was wondering how difficult it would be to add a cached mode to the plugin. Thoughts on this? I was considering forking and adding this feature. Would there be interest from others?

@Khaledgarbaya
Contributor

Hi @jideshv,
I think this would be an awesome feature. Maybe we can make use of the sync API?
I would love some more input from @stefanjudis, and PRs are welcome too.

Best,
Khaled

@stefanjudis
Contributor

hey @jideshv,

I'm totally up for that. 👍 It would save requests and speed up the development flow. That's a great idea!

I don't think it would be that difficult. I'm just thinking out loud here.

// from my build.js
  .use( contentful( {
    space_id     : config[ mode ].SPACE_ID,
    access_token : config[ mode ].TOKEN,
    host         : mode === 'PREVIEW' ? 'preview.contentful.com' : 'cdn.contentful.com',
    // we could add a cache property here
    // which defines a caching folder maybe?
    cache        : '.contentful-cache'
  } ) )

And then we could wrap client.getEntries ( https://github.com/contentful/contentful-metalsmith/blob/master/lib/processor.js#L165 ) to read from disk if there is a cache defined in the config. That would be awesome!
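The wrapping idea above could look roughly like the following. This is only a sketch under assumptions: `withDiskCache` is a hypothetical helper name, it is not part of the plugin, and it assumes the wrapped function returns a promise for JSON-serializable data. Real Contentful responses with resolved links can contain circular references, so an actual implementation would likely need a safer serializer than plain `JSON.stringify`.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: wraps any promise-returning, getEntries-style function.
function withDiskCache(getEntries, cacheFolder) {
  // No cache folder configured: behave exactly like the original function.
  if (!cacheFolder) return getEntries;

  return async function cachedGetEntries(query = {}) {
    // One cache file per distinct query, named by a hash of the query.
    // Note: key order matters, so { a, b } and { b, a } get separate files.
    const key = crypto
      .createHash('sha1')
      .update(JSON.stringify(query))
      .digest('hex');
    const file = path.join(cacheFolder, `${key}.json`);

    // Cache hit: skip the network entirely.
    if (fs.existsSync(file)) {
      return JSON.parse(fs.readFileSync(file, 'utf8'));
    }

    // Cache miss: fetch, then store the result for the next build.
    const result = await getEntries(query);
    fs.mkdirSync(cacheFolder, { recursive: true });
    fs.writeFileSync(file, JSON.stringify(result));
    return result;
  };
}
```

Inside the plugin this might be used as `withDiskCache(q => client.getEntries(q), options.cache)`, where `options.cache` is the proposed (not yet existing) setting.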

> I was considering forking and adding this feature. Would there be interest from others?

Totally! 👍

@Khaledgarbaya
Contributor

Khaledgarbaya commented Mar 7, 2017

@stefanjudis I like the approach, but the question now is: when do we invalidate the cache?

@stefanjudis
Contributor

@Khaledgarbaya Hmm good question...

cache : {
  // drops all existing caches and creates new ones
  invalidate : true,
  folder     : '.contentful-cache'
}

Let me just think out loud here.

Production build

Nothing's gonna change here.

.use( contentful( {
  host  : 'cdn.contentful.com',
  // always deal with fresh api data
  cache : null
} ) )

Development flow

This mode would fill up the cache but also serve cached results if available.

.use( contentful( {
  host  : 'cdn.contentful.com',
  // takes data from the cache if available
  // writes new data to disk
  cache : {
    invalidate : false,
    folder     : '.contentful-cache'
  }
} ) )

Development flow invalidation

This mode drops everything initially and then writes all data to the cache.

.use( contentful( {
  host  : 'cdn.contentful.com',
  // takes data always from the network
  // writes new data to disk
  cache : {
    invalidate : true,
    folder     : '.contentful-cache'
  }
} ) )

Does this make sense?
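To make the three modes concrete, here is a minimal sketch of how the proposed `cache` option could be interpreted. Everything in it is an assumption for illustration: `prepareCache` and the `readFromCache` flag are hypothetical names, and the option shape is just the one proposed in this comment.

```javascript
const fs = require('fs');

// Hypothetical helper: interprets the proposed `cache` option.
function prepareCache(cache) {
  // cache: null (or missing) -> production build, always use the network.
  if (!cache) return { folder: null, readFromCache: false };

  if (cache.invalidate) {
    // Invalidation mode: drop everything that was cached so far.
    fs.rmSync(cache.folder, { recursive: true, force: true });
  }

  // Both development modes write fresh data to disk, so the folder must exist.
  fs.mkdirSync(cache.folder, { recursive: true });

  // Only the non-invalidating mode serves cached results.
  return { folder: cache.folder, readFromCache: !cache.invalidate };
}
```

The returned `readFromCache` flag would tell the fetching code whether it may answer from disk or must go to the network and only write.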

@larrybotha

I'm guessing everyone's moved on to other projects perhaps, and feeling less of a burn when rebuilding. Has anyone made any progress on this?

@stefanjudis
Contributor

Hey @larrybotha, unfortunately, I have to say that we won't be able to tackle this in the near future. :( But I could support you in case you want to tackle it. :)

@larrybotha

@stefanjudis thanks! I do need to level up my node skills, so this would be a good way to both get some practise there, and learn more about the Contentful API. Got a bunch of things on my learning road map, and the next couple weeks are nuts, but if I find time between other goals I may just give it a bash.

Don't hold thumbs, future people who come this way! I am known to do what I say, but also to not do what I say, too.

@leviwheatcroft

This is my first look at contentful so I might be way off base.. but I have a few thoughts.

@stefanjudis re: invalidating cache, it would be frustrating if the only way to invalidate the cache was with a specific call like cache: { invalidate: true }. I see entry objects have an updatedAt field.. does that mean the query builder could add a query like updatedAt > lastRun ?

This kind of approach means you still have a single api request per build, but for most use cases that's a reasonable trade off.
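A rough sketch of what such a query could look like, assuming the Content Delivery API's range operators (for example `[gte]`) can be applied to `sys.updatedAt`. `changedSince` is a hypothetical helper name, and `lastRun` is assumed to be a timestamp the build stored after its previous run. It uses "greater than or equal" rather than strictly "greater than" so that an entry updated exactly at `lastRun` is not missed.

```javascript
// Hypothetical helper: narrows a query to entries changed since the last build.
function changedSince(query, lastRun) {
  // First build (no timestamp stored yet): fetch everything.
  if (!lastRun) return { ...query };

  // Otherwise only ask for entries updated at or after the last run.
  return {
    ...query,
    'sys.updatedAt[gte]': new Date(lastRun).toISOString()
  };
}
```

For example, `client.getEntries(changedSince({ content_type: 'post' }, lastRun))` would still be one request, but with a much smaller response when little has changed. Note that a filter like this would not report deleted entries.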

@Khaledgarbaya re: sync api, looking at the docs about the sync api, it seems like it would be quite difficult to implement in this plugin. That said, the subsequent calls to the sync api get "delta updates", and while I understand the concept, not knowing what those deltas might look like means IDK what applying them would mean on a practical level.
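As a hedged illustration of what applying deltas might mean in practice, the sketch below keeps a local map of entries in step with sync responses. It assumes a client whose `sync()` method accepts `{ initial: true }` or `{ nextSyncToken }` and resolves to an object with `entries`, `deletedEntries` and `nextSyncToken`, which is my understanding of the contentful.js client. `syncEntries` and the `state` shape are hypothetical, and assets are ignored for brevity.

```javascript
// Hypothetical helper: keeps a local id -> entry map in step with sync deltas.
async function syncEntries(client, state = { entries: {}, nextSyncToken: null }) {
  // First call does a full sync, later calls only ask for changes.
  const response = await client.sync(
    state.nextSyncToken
      ? { nextSyncToken: state.nextSyncToken }
      : { initial: true }
  );

  const entries = { ...state.entries };

  // New and updated entries replace whatever was stored under the same id.
  for (const entry of response.entries || []) {
    entries[entry.sys.id] = entry;
  }

  // Deleted entries are removed from the local copy.
  for (const deleted of response.deletedEntries || []) {
    delete entries[deleted.sys.id];
  }

  // The returned state would be persisted and passed in on the next build.
  return { entries, nextSyncToken: response.nextSyncToken };
}
```

The hard part noted above remains: the plugin works with per-file queries, so a local copy like this would still need a way to answer those queries.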

metalsmith-cache (self plug, sorry) might be a good fit for actually storing and retrieving cached files. It's pretty much just a wrapper around lokijs to make it play well with metalsmith. Storing and retrieving files is straightforward (per the readme). Merging from the cache into your metalsmith files structure is as easy as files.concat(fileCache.all()).

@stefanjudis
Contributor

> re: invalidating cache, it would be frustrating if the only way to invalidate the cache was with a specific call like cache: { invalidate: true }. I see entry objects have an updatedAt field.. does that mean the query builder could add a query like updatedAt > lastRun ?
> This kind of approach means you still have a single api request per build, but for most use cases that's a reasonable trade off.

If I understand your approach correctly, this means that, depending on how many resources you include, you'd have to make one API call per resource to get the updatedAt information. So in the end there would be no benefit in comparing these fields: the build time stays the same, because it would still make all the calls. I believe there is no way around the fact that a human has to decide when to use cached data and when to invalidate the cache.

And usually (at least in my case) this is exactly what I need. I want to develop a site and I know that the data didn't change (or I don't care). I only care about fast build times and independence from the API in development.

I'd totally be open for metalsmith-cache if it does the job. ;)

@leviwheatcroft

I take your point. Thinking through this a little more it's starting to get complicated.

I guess there are two types of changes contentful-metalsmith makes: firstly, the files which are created by this plugin with entry_templates, and secondly, the files which are affected by having data injected.

The first case, created files, is easy: you write every file created to a cache, and just retrieve all the cached files every build. It's this type of query which filtering by > updatedAt would speed up significantly.

The second case, affected files, is much more difficult. You can't just cache the state of a file after this plugin has affected it, because when you return it later on it will be unaffected by any other plugins earlier in the build process which have changed that file in some way. So for this case you need to cache queries rather than files, which is more complex.
