disable browser/webkit caching ? #203

Open
nwohaibi opened this issue Apr 12, 2015 · 3 comments · May be fixed by #339
Comments

@nwohaibi

Hi,
Thanks for the wonderful work on Splash.

I just wanted to know if there is any way to disable browser caching of files?
Or maybe return all HTTP requests made in har/log/entries, not just the ones with 200 HTTP status?

Thanks in advance.

@kmike
Member

kmike commented Apr 12, 2015

Hi @nwohaibi,

Thanks!

> I just wanted to know if there is any way to disable browser caching of files?

There is a way to do it in QWebKit (see http://doc.qt.io/qt-4.8/qnetworkrequest.html#CacheLoadControl-enum), but currently this option is not exposed by Splash. It is a good feature to have, but we need to design a public API for it and implement it.

> Or maybe return all HTTP requests made in har/log/entries, not just the ones with 200 HTTP status?

HAR entries already contain all HTTP requests, not just the ones with a 200 HTTP status code. When a resource is served from cache, some records may be missing because the resource is never requested over the network. It should be possible to add them to the output as well, but I haven't checked the details; the implementation may not be straightforward.
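To check which requests actually made it into the HAR output, something like the sketch below could help. It assumes the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`); the helper names `count_statuses` and `missing_urls` and the sample data are made up for illustration and are not part of Splash.

```python
from collections import Counter


def count_statuses(har):
    """Count HAR entries per HTTP status code."""
    entries = har.get("log", {}).get("entries", [])
    return Counter(entry["response"]["status"] for entry in entries)


def missing_urls(har, expected_urls):
    """Return the expected URLs that have no entry in the HAR log.

    A URL can be missing when the resource was served from a cache
    and therefore never requested over the network.
    """
    entries = har.get("log", {}).get("entries", [])
    seen = {entry["request"]["url"] for entry in entries}
    return [url for url in expected_urls if url not in seen]


# Hypothetical HAR fragment, trimmed to the fields used above.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/"},
             "response": {"status": 200}},
            {"request": {"url": "http://example.com/missing.js"},
             "response": {"status": 404}},
        ]
    }
}

statuses = count_statuses(sample_har)
absent = missing_urls(
    sample_har,
    ["http://example.com/", "http://example.com/style.css"],
)
```

Here `absent` lists `style.css`, which is the kind of record that would be missing if the stylesheet came from a cache.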

@nwohaibi
Author

Thanks for taking the time to clarify :)
Since I already have Splash in production, I might work around the issue by modifying the Cache-Control headers in HTTP responses. That way, WebKit would treat every resource as non-cacheable.
Let me know if I can be of any help,
and thanks again.
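
A minimal sketch of that header rewrite, assuming the responses pass through something (for example a proxy in front of Splash) that exposes headers as a list of `(name, value)` pairs. The function name `make_uncacheable` is hypothetical, and whether WebKit's in-memory cache honours these headers for every resource type is not confirmed here.

```python
# Response headers that allow a client to store or revalidate a resource.
CACHE_RELATED = {"cache-control", "expires", "etag", "last-modified", "pragma"}


def make_uncacheable(headers):
    """Return a copy of (name, value) headers that forbids caching.

    Existing cache-related headers are dropped, then explicit
    "do not cache" directives are appended.
    """
    kept = [(name, value) for name, value in headers
            if name.lower() not in CACHE_RELATED]
    kept.append(("Cache-Control", "no-store, no-cache, must-revalidate, max-age=0"))
    kept.append(("Pragma", "no-cache"))  # for HTTP/1.0 clients
    kept.append(("Expires", "0"))        # an invalid date means already expired
    return kept


original = [
    ("Content-Type", "text/css"),
    ("Cache-Control", "max-age=3600"),
    ("ETag", '"abc123"'),
]
rewritten = make_uncacheable(original)
```

Dropping `ETag` and `Last-Modified` also removes the validators a client would otherwise use for conditional requests.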

starrify added a commit to starrify/splash that referenced this issue Nov 19, 2015
@starrify
Member

Hi @kmike

> There is a way to do it in QWebKit (see http://doc.qt.io/qt-4.8/qnetworkrequest.html#CacheLoadControl-enum), but currently this option is not exposed by Splash.

I used to believe that, and I even tried to make a PR that way. However, I later realized that this is not the case (confirmed by local testing).

The QNetworkRequest::CacheLoadControl attribute is set on individual request instances, and it is Qt's network manager that decides whether to use a disk cache. However, in the current implementation of Splash, caching in the network managers is not enabled at all (please check https://github.com/scrapinghub/splash/blob/master/splash/network_manager.py#L42).

WebKit also has its own in-memory cache (for scripts, stylesheets, images, etc.), and I believe that is the real cause. Some scenarios require strictly disabling every kind of caching, so I made PR #339 for this.
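
For illustration only, one way to switch off the in-memory cache from Python is sketched below. It is not necessarily what PR #339 does. It assumes PyQt5 with the QtWebKit module installed (the same `QWebSettings` static methods exist in PyQt4's `QtWebKit`); the wrapper name `disable_webkit_memory_cache` is made up, and the import is kept inside the function so the snippet loads even without Qt.

```python
def disable_webkit_memory_cache():
    """Drop QtWebKit's in-memory object cache and keep it empty.

    Assumption: PyQt5 with QtWebKit is available. Call this after a
    QApplication has been created.
    """
    from PyQt5.QtWebKit import QWebSettings  # assumption: PyQt5 + QtWebKit

    # Evict scripts, stylesheets and images that are already cached.
    QWebSettings.clearMemoryCaches()
    # Arguments are (min dead capacity, max dead capacity, total capacity)
    # in bytes; all zeros disable the object cache.
    QWebSettings.setObjectCacheCapacities(0, 0, 0)
```

These settings are process-wide, so they affect every page rendered by the same process.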
