Paywall sites to add #3439

Open
15 of 21 tasks
gnppn opened this issue Nov 26, 2017 · 64 comments

Comments

@gnppn

gnppn commented Nov 26, 2017

Here's a quick list of sites I think are worth adding to wallabag's paywall function:

  • contexte.com
  • electronlibre.info (Wordpress)
  • canardpc.com
  • gamekult.fr
  • lequatreheures.com
  • liberation.fr
  • limprevu.fr
  • politis.fr
  • reflets.info
  • lequipe.fr
  • lwn.net
  • economist.com
  • nytimes.com
  • ft.com
  • lepoint.fr
  • theathletic.com
  • alternatives-economiques.fr
  • prospectmagazine.co.uk
  • ihned.cz
  • aoc.media
  • courrierinternational.com

@nicosomb
Member

About reflets.info, I sent them an email because they added some security on their login page.

I updated your post with some websites that I received in my mailbox.

@er-vin

er-vin commented Dec 17, 2017

I would definitely welcome support for a few of those. Especially reflets.info: it's my number 1 missing site now that the Diplo is supported. :-)

Thanks a lot for the paywall support by the way, it's definitely a welcome addition.

@apiontek

For lwn.net, I added the following to the file at vendor/j0k3r/graby-site-config/lwn.net.txt and it's been working for me:

requires_login: yes

login_uri: https://lwn.net/login
login_username_field: Username
login_password_field: Password

not_logged_in_xpath: /html/body/div[3]/div[1]/form[@class="loginform"]
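
For readers new to the format: the directives above are plain `key: value` lines. Here is a minimal Python sketch of reading them into a dictionary, assuming one directive per line and `#` for comment lines. The helper name `parse_site_config` is hypothetical, and this is not graby's actual parser.

```python
def parse_site_config(text):
    """Parse 'key: value' directive lines into a dict (illustrative only)."""
    directives = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive line
        directives[key.strip()] = value.strip()
    return directives


lwn_config = """
requires_login: yes

login_uri: https://lwn.net/login
login_username_field: Username
login_password_field: Password

not_logged_in_xpath: /html/body/div[3]/div[1]/form[@class="loginform"]
"""

config = parse_site_config(lwn_config)
print(config["login_uri"])
```

Real site configs may repeat a directive on several lines; this sketch keeps only the last value for each key.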

@apiontek

I was trying to make nytimes work, and I think the correct info would be:

requires_login: yes

login_uri: https://myaccount.nytimes.com/svc/account/auth/v1/login
login_username_field: username
login_password_field: password

not_logged_in_xpath: //a[contains(@href, '/auth/login')]

...so I added it to the bottom of vendor/j0k3r/graby-site-config/nytimes.com.txt -- but it doesn't work. I get an article in wallabag with content: "wallabag can't retrieve contents for this article. Please troubleshoot this issue." -- and the article URL is the login_uri

I'm not sure about this, but I'm guessing the issue is one of the following:

  1. The POST payload is supposed to include "auth_token" and "remember_me"; maybe it doesn't work without them.
  2. Maybe I have the not_logged_in_xpath wrong.
  3. Maybe after it submits the login, wallabag expects the returned page to contain the full article, but there's no returned page or something?
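
To make the first guess concrete: a form login is typically a URL-encoded POST body built from the configured field names. The Python sketch below shows what such a body looks like with and without extra fields. The credentials and the `auth_token` / `remember_me` values are placeholders, and `build_login_body` is a hypothetical helper, not wallabag or graby code.

```python
from urllib.parse import urlencode


def build_login_body(username_field, password_field, username, password, extra_fields=None):
    """Build a URL-encoded login form body (illustrative only)."""
    fields = {username_field: username, password_field: password}
    # Extra fields (e.g. a token or a "remember me" flag) follow the credentials.
    fields.update(extra_fields or {})
    return urlencode(fields)


basic = build_login_body("username", "password", "alice@example.com", "s3cret")
with_extras = build_login_body(
    "username", "password", "alice@example.com", "s3cret",
    extra_fields={"auth_token": "PLACEHOLDER", "remember_me": "Y"},
)
print(basic)
print(with_extras)
```

If the site generates `auth_token` per session, a static value in the payload would presumably not be accepted, which would be consistent with the failure described above.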

@apiontek

For economist.com, the following is working for me:

requires_login: yes

login_uri: https://www.economist.com/user/login
login_username_field: name
login_password_field: pass

not_logged_in_xpath: //*[@id="user-login-masthead"]/div[@class='login-form']

@MonsieurPoutounours

Is anyone working on Gamekult? I can't manage to get it to work...

@thibaultamartin

thibaultamartin commented Aug 17, 2018

@nicosomb did you manage to get an answer from reflets.info?

Edit: Does anyone have leads we can dig into to help support those sites?

@Simounet
Member

@thibaultamartin reflets.info is working right now.
@apiontek Thanks for your help. I did a PR on the ftr-site-config repository with your configurations: fivefilters/ftr-site-config#524
@MonsieurPoutounours I'm trying to make GK work, but I'm running into an issue getting my account validated.

@er-vin

er-vin commented Aug 28, 2018

When you say reflets.info is working, you don't mean in the released version yet, right? Because it's still not working for me with 2.3.2. Or did I miss something?

@Simounet
Member

@er-vin You should have that in vendor/j0k3r/graby-site-config/reflets.info.txt.

@MonsieurPoutounours

@Simounet I committed the fix for Gamekult to the graby-site-config repository.

@Simounet
Member

@MonsieurPoutounours

@Simounet I just wrote the pull request :-)

@Simounet
Member

Can't wait to check that!

@MonsieurPoutounours

It changed my life ;-)

@Simounet
Member

Can you link your PR to this issue please?

@Kdecherf
Member

Kdecherf commented Sep 1, 2018

Done 👍

@MonsieurPoutounours

@Kdecherf Thank you. I forgot to do it.

@thibaultamartin

@Simounet Reflets.info does not seem to work here, although I have my credentials registered.

I'm using wallabag.it (I could self-host, but I want to support the team for making this great software). Are those ftr files deployed regularly on wallabag.it?

@Simounet
Member

I'm not sure. Ping @nicosomb .

@j0k3r
Member

j0k3r commented Sep 12, 2018

@thibaultamartin no they are not.
wallabag.it is still running on wallabag 2.3.2 (we need to update it), which means it uses version 1.0.45 of the site configs.
The reflets.info credentials support is in 1.0.47.

@Kdecherf
Member

Hello world,

Quick update on this issue and its checklist. I think some unchecked paywalls are now supported by graby-site-config, can someone test them?

  • contexte.com: login directives are present since 1.0.58
  • canardpc.com: login directives are present since 1.0.37
  • lequatreheures.com: login directives are present since 1.0.58
  • politis.fr: login directives are present since 1.0.56
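
Since wallabag.it was reported above as running site configs 1.0.45, a quick way to see which of these login directives an instance already ships is to compare version numbers. This is a small Python sketch using only the versions quoted in this thread; the function names are hypothetical.

```python
def parse_version(version):
    """Turn '1.0.45' into (1, 0, 45) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Site-config release in which login directives appeared, per this thread.
LOGIN_SUPPORT_SINCE = {
    "canardpc.com": "1.0.37",
    "reflets.info": "1.0.47",
    "politis.fr": "1.0.56",
    "contexte.com": "1.0.58",
    "lequatreheures.com": "1.0.58",
}


def supported_sites(installed_version):
    """Return the sites whose login directives ship with the installed version."""
    installed = parse_version(installed_version)
    return sorted(
        site for site, since in LOGIN_SUPPORT_SINCE.items()
        if parse_version(since) <= installed
    )


print(supported_sites("1.0.45"))  # ['canardpc.com']
```

Comparing tuples of integers avoids the string-comparison trap where "1.0.9" would sort after "1.0.47".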

@Simounet
Member

I can testify that canardpc.com is working.

@techexo
Contributor

techexo commented Dec 19, 2018

It's not in the starting list, but I added support for lepoint.fr with PR fivefilters/ftr-site-config#581. Support will probably be effective in the next release.

I don't have credentials to test other websites, however.

@biva
Contributor

biva commented Jan 2, 2019

Is it possible to add https://www.alternatives-economiques.fr/ to this list? Or should I create a dedicated issue at https://github.com/j0k3r/graby-site-config/issues?
I have credentials to test, but I'm not able to code :(

@MonsieurPoutounours

@pakman Paywall support is done for prospectmagazine.co.uk. I created a pull request. As @j0k3r told us, this change may be reflected in the wallabag.it instance quickly.

@j0k3r
Member

j0k3r commented Jan 14, 2019

As soon as a new release of wallabag is done, not of site-config :)

@pakman

pakman commented Jan 14, 2019

@MonsieurPoutounours @j0k3r Many thanks to you both! I'll be patient and wait until wallabag.it picks up the change.

@regagain

I hope it's ok to use this issue to suggest more sites.

It would be nice to have http://dn.se/ (one of the biggest Swedish newspapers).

@jummo

jummo commented Mar 9, 2019

Heise.de (c't) got a new online service (Heise+), and with the following adjustments to heise.de.txt I could save articles that require login.

requires_login: yes

login_uri: https://www.heise.de/sso/login/login
login_username_field: username
login_password_field: password

not_logged_in_xpath: //body[@class="a-login__link a-login__link--sso"]

Thanks for this great product!

@Kdecherf
Member

@jummo great, you can send a PR to the following repository to suggest your change: https://github.com/fivefilters/ftr-site-config

@jummo

jummo commented Mar 10, 2019

@Kdecherf Thanks for the hint => fivefilters/ftr-site-config#630

@thornick

I'm not able to fetch articles from https://www.thetimes.co.uk with login.
Unfortunately, I can't figure out how to make this work.

body: //article[@id='article-main']

requires_login: yes

login_uri: https://login.thetimes.co.uk
login_username_field: username
login_password_field: password

not_logged_in_xpath:  ???

@techexo
Contributor

techexo commented Mar 18, 2019

The not_logged_in_xpath is an XPath expression that matches only when you're not logged in and/or don't have access to the article. Most of the time it points to a div containing the form to register or purchase a subscription, or something of the kind. To see how XPath expressions are formed, have a look at the doc: https://doc.wallabag.org/en/user/errors_during_fetching.html#basics-of-xpath-10
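
As an illustration of that idea, the Python sketch below checks whether an XPath matches anything in a page: a match means "not logged in". The two pages are made-up, well-formed snippets, and `looks_logged_out` is a hypothetical helper. The standard library's ElementTree only supports a subset of XPath, so this is a simplification rather than what graby does internally.

```python
import xml.etree.ElementTree as ET

LOGGED_OUT_PAGE = """
<html><body>
  <div id="paywall"><form class="loginform"><input name="Username"/></form></div>
  <article>Teaser only...</article>
</body></html>
"""

LOGGED_IN_PAGE = """
<html><body>
  <article>Full article text.</article>
</body></html>
"""


def looks_logged_out(page, xpath):
    """Return True when the XPath matches at least one node in the page."""
    root = ET.fromstring(page)
    return len(root.findall(xpath)) > 0


# ElementTree needs a relative path, hence the leading '.' here;
# a site config would use '//form[@class="loginform"]' instead.
xpath = ".//form[@class='loginform']"
print(looks_logged_out(LOGGED_OUT_PAGE, xpath))  # True
print(looks_logged_out(LOGGED_IN_PAGE, xpath))   # False
```

A good not_logged_in_xpath targets an element that disappears once you are logged in, such as the login or subscription form, rather than an element present in both states.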

@TheNomad11

Is it possible to add morgenbladet.no? It's Norway's leading weekly. Thanks.

@j0k3r
Member

j0k3r commented Apr 17, 2019

I've added theathletic.com (see fivefilters/ftr-site-config@8947d38).
Thanks @SixthStreet for the login info.

@j0k3r
Member

j0k3r commented Apr 17, 2019

Also, if someone wants a new paywall website to be added, drop me:

  • your login / password
  • a link to a restricted page
  • a link to the login page

@j0k3r
Member

j0k3r commented Apr 23, 2019

Added ihned.cz (see fivefilters/ftr-site-config#639)

@bdunogier
Contributor

bdunogier commented May 12, 2019

You may wanna add https://aoc.media to the list. You can subscribe for free to get access to 3 articles/month.

@j0k3r
Member

j0k3r commented May 13, 2019

@bdunogier fivefilters/ftr-site-config@94fad82 done

@X-dark
Contributor

X-dark commented Oct 11, 2019

I'm trying to add inpact-hardware.com, but it seems the login is not submitted as a form but as a JSON payload.

@and0uille

Same here: I'm trying to update the arretsurimages.net paywall config, but its login is also submitted as JSON, to api.arretsurimages.net/oauth/v2/token
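
The difference between the two login styles comes down to how the request body is encoded. A short Python sketch follows, with placeholder credentials and hypothetical field names (the real APIs of these sites may expect different keys):

```python
import json
from urllib.parse import urlencode

credentials = {"username": "alice@example.com", "password": "s3cret"}

# What a classic HTML form login sends
# (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(credentials)

# What a JSON API login expects instead (Content-Type: application/json).
json_body = json.dumps(credentials)

print(form_body)
print(json_body)
```

The login directives shown in this thread only name a URI and two form fields, which is why a site expecting the second kind of body is hard to support with them alone.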

@Pofilo

Pofilo commented Feb 17, 2020

@X-dark did you manage to add inpact-hardware.com?

@Pofilo

Pofilo commented Feb 18, 2020

By the way, what is the process to ask for a new paywall? Is it here, or there: https://github.com/j0k3r/graby-site-config?

And I don't really understand how to create it. I found this doc but don't really understand how to make it work for inpact-hardware.com.

@X-dark
Contributor

X-dark commented Feb 18, 2020

> @X-dark did you manage to add inpact-hardware.com ?

No, I did not spend much time on it. Sorry.

@shtrom
Contributor

shtrom commented Mar 8, 2020

I'm trying to add cacm.acm.org, and got a site-config that looks like it's doing something

http_header(referer): https://cacm.acm.org
requires_login: true
not_logged_in_xpath: //input[@id='userNameInPage']
login_uri: https://cacm.acm.org/login
login_username_field: current_member[user]
login_password_field: current_member[passwd]

Without credentials configured, I get this in the graby debug logs (https://doc.wallabag.org/en/user/errors_during_fetching.html#enabling-debug-logs-self-hosting), which is encouraging.

[2020-03-08 03:43:35] graby.DEBUG: Auth: no credentials available for host. {"host":"cacm.acm.org"} []

However, after actually adding my credentials (and auth enabled in the config), all I get is

error code: 1020

in the fetched data, with nothing much that is helpful in the log as to what happened.

[2020-03-08 04:00:07] graby.DEBUG: Auth: add parameters. {"host":"cacm.acm.org","parameters":{"host":"cacm.acm.org","requiresLogin":true,"loginUri":"https://cacm.acm.org/login","usernameField":"current_member[user]","passwordField":"current_member[passwd]","extraFields":[],"notLoggedInXpath":"//input[@id='userNameInPage']","username":"**masked**","password":"**masked**"}} []
[2020-03-08 04:00:08] graby.WARNING: Request throw exception (with a response): Client error response [url] https://cacm.acm.org/login [status code] 403 [reason phrase] Forbidden {"error_message":"Client error response [url] https://cacm.acm.org/login [status code] 403 [reason phrase] Forbidden"} []

I have checked that I can use the credentials to log in manually to the site. One thing is that, once entered, the credentials are still not visible in the UI, but there are some seemingly encrypted strings in the DB.

@shtrom
Contributor

shtrom commented Sep 16, 2020

OK, the 1xxx errors have nothing to do with wallabag or graby. They are Cloudflare errors: https://support.cloudflare.com/hc/en-us/articles/360029779472-Troubleshooting-Cloudflare-1XXX-errors#error100610071008
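
When debugging a fetch, it can help to tell this case apart from a genuine site-config problem. Here is a small Python sketch that recognises a body consisting only of such an error line; the exact body format is assumed from the "error code: 1020" output quoted earlier, and `cloudflare_error_code` is a hypothetical helper.

```python
import re

# Matches bodies such as "error code: 1020" (Cloudflare 1XXX errors).
CLOUDFLARE_ERROR = re.compile(r"^\s*error code:\s*(1\d{3})\s*$", re.IGNORECASE)


def cloudflare_error_code(body):
    """Return the 1XXX code if the body is a bare Cloudflare error, else None."""
    match = CLOUDFLARE_ERROR.match(body)
    return int(match.group(1)) if match else None


print(cloudflare_error_code("error code: 1020"))  # 1020
print(cloudflare_error_code("<html><body>Full article</body></html>"))  # None
```

A non-None result suggests the request was blocked before reaching the site, so tweaking the login directives is unlikely to help.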

@maniseb

maniseb commented Nov 4, 2020

Can someone help me with https://clubigen.fr?
I've tried without any success.

Of course, I can provide credentials if needed.

@Aquan1412

Hello, I'm trying to get sueddeutsche.de to accept my login credentials.
I tried adjusting the Graby config file, but I can't get it to work...

Here's my current adjustment to the config file:

requires_login: yes
login_uri: https://id.sueddeutsche.de/login
login_username_field: login
login_password_field: password
not_logged_in_xpath: /html/body/header/div[2]/ul[2]/li[1]/a

I guess the not_logged_in_xpath is wrong. Maybe somebody could help with getting it right? I could also provide credentials for testing.

@Simounet
Member

Hi,
You can try //form[@id="login-form"]//span[@class="help-block"]. Let us know if it worked.

@Aquan1412

Hi, thanks for the quick reply, unfortunately it didn't help.

@j0k3r
Member

j0k3r commented Nov 28, 2020

I'll lock this conversation. If anyone wants to help, or wants to notify us about a new site config which handles a paywall, please create a new issue.

@wallabag wallabag locked and limited conversation to collaborators Nov 28, 2020