Is it possible to avoid "403 Client Error: Forbidden for url" with url function? #765

Open
Jorman opened this issue Aug 20, 2023 · 2 comments


@Jorman

Jorman commented Aug 20, 2023

Hi, I'm trying to watch a very simple URL: a site listing used items, with no login required and nothing special needed. A plain wget of the URL downloads the page, but when I use the same URL in the urlwatch configuration I get this error:
403 Client Error: Forbidden for url
The only way I've found to avoid the error is to configure urlwatch with "navigate" instead of "url", but of course that is much slower. Is there any way to find out why "url" mode doesn't work with this site?

If you want to try it, this is my configuration:

```yaml
name: "Test"
navigate: "https://www.subito.it/annunci-italia/vendita/auto/suzuki/?q=suzuki+vitara"
filter:
  - css:
      selector: 'div.ItemListContainer_container__SjEc1 > p'
diff_filter:
  - grep: '^[@+]'
```
Any ideas?

@thp
Owner

thp commented Aug 22, 2023

It depends on what the server does, e.g. maybe it checks user-agent or some other headers (it's probably not your IP address if wget works). You can override the user-agent header.
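
A minimal sketch of such an override, using the `headers` option of a `url` job. The User-Agent string is only an illustrative browser-like value, and it is an assumption that this server's check is on that header at all; if the 403 persists, other headers (e.g. `Accept` or `Accept-Language`) may need the same treatment.

```yaml
name: "Test"
url: "https://www.subito.it/annunci-italia/vendita/auto/suzuki/?q=suzuki+vitara"
# Assumption: the server rejects the default user agent.
# The value below is illustrative, not a known-good string for this site.
headers:
  User-Agent: "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
filter:
  - css:
      selector: 'div.ItemListContainer_container__SjEc1 > p'
diff_filter:
  - grep: '^[@+]'
```

Everything except `url` and `headers` is copied unchanged from the configuration in the question.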

@trevorshannon
Contributor

Sometimes you can also check the Google cache instead of the URL directly. You can't do this very often (a few times per day seems to be OK), and the cached copy will not always be as up to date as the direct URL, but it can help.

https://webcache.googleusercontent.com/search?q=cache:https://www.subito.it/annunci-italia/vendita/auto/suzuki/


3 participants