Block certain cookies or cookie values #447

Open
JustAnotherArchivist opened this issue Jun 6, 2020 · 1 comment
Comments

@JustAnotherArchivist
Contributor

Some cookies or cookie values adversely affect archival. For example, many classic forum software packages let the user choose between different view modes (linear, threaded, hybrid), styles, or languages, but to get a representative archive, we'd only want the default presentation. These preferences are usually stored in cookies (not in the session information, but in actual separate cookies). There should be a way to block certain cookies entirely (i.e. they're never stored or sent back on later requests) or to prevent setting certain cookie values (i.e. if a server tries to set the cookie to a disallowed value, the attempt is ignored).

The most flexible solution would be to have pairs of a name pattern and a value pattern; if both match a cookie sent by the server, it gets ignored. For cookies we want to ignore entirely, the value pattern could then just be ^ or an empty pattern (which could also be optimised and bypass the regex engine entirely, of course), but it would also allow for pretty much any restriction on the values.
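A minimal sketch of the name/value pattern pairs described above, in Python. None of this is existing code: the names `compile_rules` and `is_blocked` and the `lang` cookie rule are hypothetical and only illustrate the matching logic. An empty value pattern is treated as "block entirely" and never reaches the regex engine.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical rule type: (name pattern, value pattern or None).
# None stands for "block every value" and skips the regex engine.
CookieRule = Tuple[Pattern[str], Optional[Pattern[str]]]


def compile_rules(raw_rules: List[Tuple[str, str]]) -> List[CookieRule]:
    """Compile (name_pattern, value_pattern) string pairs."""
    compiled = []
    for name_pat, value_pat in raw_rules:
        # An empty value pattern means the cookie is blocked regardless of value.
        value_re = re.compile(value_pat) if value_pat else None
        compiled.append((re.compile(name_pat), value_re))
    return compiled


def is_blocked(name: str, value: str, rules: List[CookieRule]) -> bool:
    """Return True if a cookie sent by the server should be ignored."""
    for name_re, value_re in rules:
        if name_re.search(name) and (value_re is None or value_re.search(value)):
            return True
    return False


rules = compile_rules([
    (r"^bb_threadedmode$", ""),   # block this cookie entirely
    (r"^lang$", r"^(?!en$)"),     # hypothetical: ignore any value other than "en"
])
```

With these rules, `is_blocked("bb_threadedmode", "1", rules)` is true for any value, while a `lang` cookie is only ignored when the server tries to set it to something other than `en`.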

The block list would be stored on the control node and retrieved by or pushed to the pipeline on launching a job, similar to URL ignores but without changes while the job is running.
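One way the hand-off at job launch could look, as a rough sketch only. The JSON payload shape and the function names `serialize_block_list` and `load_block_list` are assumptions; the issue does not specify a transport or format. The pipeline decodes the list once into an immutable tuple, which reflects the "no changes while the job is running" constraint.

```python
import json
from typing import List, Tuple


def serialize_block_list(rules: List[Tuple[str, str]]) -> str:
    """Control-node side: encode (name_pattern, value_pattern) pairs for transfer."""
    return json.dumps([{"name": name, "value": value} for name, value in rules])


def load_block_list(payload: str) -> Tuple[Tuple[str, str], ...]:
    """Pipeline side: decode once at launch into an immutable tuple."""
    return tuple((rule["name"], rule["value"]) for rule in json.loads(payload))


# Hypothetical round trip at job launch.
payload = serialize_block_list([(r"^bb_threadedmode$", "")])
snapshot = load_block_list(payload)
```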

Example: bb_threadedmode (e.g. job f0i5kb7nl4ltumlaj2wrnptrk)

@manu-cyber

Another example: vBulletin (e.g. job 86ox20zlr2p59av0w6sau3zzu)
