Improve caching strategy #56

Open
mco-gh opened this issue Jun 30, 2016 · 6 comments
Comments

@mco-gh

mco-gh commented Jun 30, 2016

I love this app, thanks for building it! But it's missing the mark in an important way. Let me explain...

For me, HN links have a binary utilization profile, by which I mean that I generally click a link exactly zero or one times. Caching on access is useless for me because I don't tend to revisit previously read articles. And the lack of pre-caching means I get the dreaded dinosaur whenever I click on a link I've never visited.

What I'd really like is the following strategy: for every link presented, pre-fetch and cache the article for that link, so that when I'm on the train later and I'm reviewing the front page downloaded earlier, I can click a link and have a true offline "magic moment", where I'm presented with content I've never visited before.

I realize this will cause a high level of cache utilization, but I think it's ok to have a fairly short time to live for these pre-cached items. The HN front page changes frequently enough that the links for the top articles will most likely be gone within 24-48 hours so these pre-cached items can be purged fairly aggressively.
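
One way the short time-to-live could work is sketched below. It assumes the app keeps a record of when each article was pre-cached; `selectExpired`, `purgeExpired`, the `{ url, cachedAt }` entry shape and the 24-hour default are all hypothetical, and `cache` stands for anything with a Cache-API-style `delete(url)` method.

```javascript
// Hypothetical default: the low end of the 24-48 hour window mentioned above.
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000;

// Return the URLs of entries older than the TTL.
// `entries` is an assumed shape: [{ url: string, cachedAt: epochMillis }].
function selectExpired(entries, now, ttlMs = DEFAULT_TTL_MS) {
  return entries
    .filter((entry) => now - entry.cachedAt > ttlMs)
    .map((entry) => entry.url);
}

// Delete expired articles from a Cache-like object and report what was purged.
async function purgeExpired(cache, entries, now = Date.now(), ttlMs) {
  const expired = selectExpired(entries, now, ttlMs);
  await Promise.all(expired.map((url) => cache.delete(url)));
  return expired;
}
```

In a service worker this might run on `activate` or after each front-page refresh, with `cache` obtained from `caches.open(...)`.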

This will also burn a lot of network utilization, which might be impolite for people with limited and/or expensive data plans, but that could be addressed by having a config setting to selectively enable this pre-caching policy.
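
A minimal sketch of such an opt-in check follows. The settings shape (`precacheEnabled`, `wifiOnly`) and the function name are hypothetical; the `connection` argument is assumed to mirror `navigator.connection` (Network Information API), which not every browser exposes, so the check degrades to the plain setting when it is missing.

```javascript
// Hypothetical settings shape: { precacheEnabled: boolean, wifiOnly: boolean }.
// `connection` mirrors navigator.connection where the browser provides it.
function shouldPrecache(settings, connection) {
  // Off unless the user has explicitly opted in.
  if (!settings || settings.precacheEnabled !== true) return false;
  // Respect the browser's data-saver preference when it is reported.
  if (connection && connection.saveData === true) return false;
  // Optional stricter mode: skip pre-caching on cellular connections.
  if (settings.wifiOnly && connection && connection.type === 'cellular') return false;
  return true;
}
```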

@mco-gh
Author

mco-gh commented Jun 30, 2016

p.s. I'm interested in working on a PR for this feature if it's considered a worthwhile thing to add.

@addyosmani
Collaborator

addyosmani commented Jul 3, 2016

I talked to @marcacohen about this feature a little offline. In short, the suggestion here is an opt-in mode for pre-fetching the first N external articles from the top stories page for offline reading.
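
As a rough illustration of "the first N external articles", the selection step might look like the sketch below. It assumes story items shaped like those of the public Hacker News API (an optional `url` field, absent on self posts); `pickExternalStories` and the default of 10 are hypothetical, not part of the app.

```javascript
// Hypothetical helper: choose which top stories are worth pre-fetching.
// Assumes each item has an optional `url` (self posts have none).
function pickExternalStories(items, n = 10) {
  return items
    .filter((item) => typeof item.url === 'string')
    .filter((item) => {
      try {
        // Skip links that point back at Hacker News itself.
        return new URL(item.url).hostname !== 'news.ycombinator.com';
      } catch (e) {
        return false; // malformed URL: nothing to pre-fetch
      }
    })
    .slice(0, n)
    .map((item) => item.url);
}
```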

This is an interesting problem that could be tackled in a few different ways:

I) Server-side 'readability' proxy

We would need to self-host a service that takes a URL (a top story item), fetches the article, and returns only a stripped-down version of its content. The user experience would be similar to what you get with Pocket. We would pre-fetch/offline these articles either on user trigger or via requestIdleCallback or the Background Sync API, and delegate the runtime caching side of offlining to SW Toolbox. Self-hosting is necessary due to the fun that is CORS. If I were starting with option I, I might use https://www.npmjs.com/package/node-readability (or similar), which looks like:

var read = require('node-readability');

read('http://www.nature.com/news/rats-free-each-other-from-cages-1.9603', function(err, article, meta) {
  // Bail out on fetch or parse errors before touching the article
  if (err) {
    console.error(err);
    return;
  }
  // Main Article
  console.log(article.content);
  // Title
  console.log(article.title);
  // Close article to clean up jsdom and prevent leaks
  article.close();
});
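
To make option I a little more concrete, here is a hedged sketch of the request-handling core such a proxy might have. `handleProxyRequest`, the `?url=` query parameter and the JSON response shape are all hypothetical; `extract` is assumed to follow the same `(url, callback(err, article))` signature as the `read` call above, and is passed in so the logic can be exercised without the network.

```javascript
// Hypothetical core of a readability proxy.
// `extract(url, cb)` is assumed to behave like node-readability's `read`.
// `respond(status, body)` is supplied by whatever HTTP layer wraps this.
function handleProxyRequest(requestUrl, extract, respond) {
  // Parse "?url=..." from a path such as "/article?url=http%3A%2F%2F...".
  const target = new URL(requestUrl, 'http://localhost').searchParams.get('url');
  if (!target || !/^https?:\/\//.test(target)) {
    respond(400, { error: 'expected ?url=<http(s) article URL>' });
    return;
  }
  extract(target, (err, article) => {
    if (err) {
      respond(502, { error: 'could not fetch or parse article' });
      return;
    }
    // Copy what we need, then release the article's resources.
    const payload = { title: article.title, content: article.content };
    article.close();
    respond(200, payload);
  });
}

// Possible wiring (not run here), assuming Node's http module and node-readability:
//   http.createServer((req, res) =>
//     handleProxyRequest(req.url, read, (status, body) => {
//       res.writeHead(status, {
//         'Content-Type': 'application/json',
//         'Access-Control-Allow-Origin': '*', // lets the app's origin read the response
//       });
//       res.end(JSON.stringify(body));
//     })
//   ).listen(8080);
```

The CORS header in the wiring comment is the reason for self-hosting: the app can read responses from its own proxy, which it cannot do for arbitrary article origins.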

II) Fetch via Service Worker

We might be able to accomplish this entirely client-side by intercepting URLs from the top stories page via Service Worker. However, depending on how complex this ends up being in practice, we may need to wait for foreign fetch (FF) support to land. Service Worker can opaquely cache resources for offline use, but this is somewhat limited. Fully offline functionality for intercepting requests from anywhere to resources in our scope would require FF, afaik. We may end up being even more limited if FF is restricted to secure contexts.
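
For option II, the opaque-caching half could look roughly like the sketch below. It assumes a Cache-API-style `cache` object and a `fetch`-compatible function, both passed in; `precacheArticles` and the limit of 10 are hypothetical. `fetch` plus `cache.put` is used rather than `cache.addAll`, since `addAll` rejects responses whose status is outside the 200 range, which includes opaque ones. This covers only storing the responses; serving them for navigations to other origins is the part that may need foreign fetch.

```javascript
// Hypothetical helper: fetch article URLs in no-cors mode and store the
// (opaque) responses. One failed link does not abort the rest.
async function precacheArticles(urls, cache, fetchFn, limit = 10) {
  const results = [];
  for (const url of urls.slice(0, limit)) {
    try {
      // no-cors yields an opaque response for cross-origin URLs:
      // it can be cached, but its body and status cannot be inspected.
      const response = await fetchFn(url, { mode: 'no-cors' });
      await cache.put(url, response);
      results.push({ url, ok: true });
    } catch (err) {
      results.push({ url, ok: false });
    }
  }
  return results;
}

// Possible use inside a service worker (cache name is an assumption):
//   caches.open('hn-articles-v1')
//     .then((cache) => precacheArticles(urls, cache, fetch));
```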

I've tried playing around with link rel=preload + SW for this use case but wasn't able to get far. @marcacohen, perhaps you could pick an approach you're keen on exploring and see how far you can get? As mentioned in person, my main concern isn't whether we can find a way to get the feature in place, but the server-side costs: if this ends up having to be done on GAE, it might get expensive as the project grows.

@addyosmani
Collaborator

Did some research and found https://github.com/n1k0/readable-proxy/. Looks like we could fork and add SW support to test out your idea.

@mco-gh
Author

mco-gh commented Jul 4, 2016

Jake Archibald also pointed me to https://github.com/premii/hn as another example to learn from.

@addyosmani
Collaborator

@marcacohen After working on some more proof-of-concepts here, I think we could probably knock out this feature in under a week (same time if we were to guide you through it) 🎀 Are you still interested in working on it?

@mco-gh
Author

mco-gh commented Jul 22, 2016

Hi Addy,

Sorry for the delay. I'm up for taking a crack at this. Do you have any refs you can point me to, or should we have a VC to cover your guidance verbally?

Marc
