Reddit 403 Forbidden #3802

Open
adegans opened this issue Nov 14, 2023 · 18 comments

Comments

@adegans

adegans commented Nov 14, 2023

On and off, my 5 Reddit feeds fail to refresh. I've seen this 3 or 4 times this week.
They worked fine for the past few months, but now they fail intermittently. Not sure what causes this. Nothing has changed (that I know of) on my end.

Some quota limit? Or fuckery from Reddit with API?
Does anyone else have this? Thoughts?

For example:

HttpException: https://www.reddit.com/search.json?q=%20author%3ARevolutionaryYam85&sort=new&include_over_18=on resulted in 403 Forbidden in lib/contents.php line 106

index.php(11): RssBridge->main()
lib/RssBridge.php(113): DisplayAction->execute()
actions/DisplayAction.php(71): DisplayAction->createResponse()
actions/DisplayAction.php(106): RedditBridge->collectData()
bridges/RedditBridge.php(83): RedditBridge->collectDataInternal()
bridges/RedditBridge.php(142): getContents()
lib/contents.php(106)

Query string: action=display&context=user&u=RevolutionaryYam85&bridge=RedditBridge&format=Atom&d=new
Version: 2023-09-24
Os: Linux
PHP version: 8.1.25
@dvikan
Contributor

dvikan commented Dec 13, 2023

I'm seeing the same on rss-bridge.org. I think my server IP has been blocked; see also the threads on /r/rss.

@adegans
Author

adegans commented Dec 13, 2023

I did change the check interval to every 12 hours, instead of the default 2 hours that other feeds use.
But since yesterday all feeds failed again.

Maybe we need a setting in RSS-Bridge/FreshRSS to set the User-Agent and other headers, so we can better pretend to be a browser. That way Reddit can't single out automated systems for a blocklist as easily.
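For illustration, such a setting mostly boils down to sending an explicit User-Agent (and other headers) with each request. This is a minimal Python sketch using the standard `urllib.request`; the `build_request` helper and the UA string are hypothetical, not an existing RSS-Bridge or FreshRSS option. Reddit's own block page (quoted further down in this thread) asks scripts for a unique, descriptive User-Agent rather than a browser imitation, so the sketch uses that style.

```python
import urllib.request

# Hypothetical UA string: unique and descriptive, as Reddit's block page requests.
USER_AGENT = "my-feed-fetcher/0.1 (by /u/your_username)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent and Accept header."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("https://old.reddit.com/r/php/new.json")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
# Actually sending it would be: urllib.request.urlopen(req, timeout=30)
```

A different header set alone won't help if the server's IP is already blocked.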

@virtadpt

Reddit's native /.rss feeds were messed up yesterday, too. It's not rss-bridge.

@adegans
Author

adegans commented Dec 14, 2023

@virtadpt But I reported this a month ago... So whatever happened in the last few days is probably not as relevant ;)
Before setting up the Reddit bridge, I found that Reddit killed off its RSS feeds years ago; otherwise we wouldn't need the Reddit bridge, right? Or did they re-add them?

Anyway, the link the bridge uses works fine in a browser, so the more likely explanation is that Reddit has a quota for requests like this, or has a way to profile these requests and block them. A 403 means "Forbidden", after all: the server understood the request and refuses to serve it.

To work around that, the User-Agent could be randomized, or more browser-like headers could be used.
The load time/interval could even be randomized, so it's not exactly every 2, 6, or 12 (or whatever) hours.
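The randomized interval idea could look roughly like this. It is a Python sketch that assumes the client schedules its own refreshes; `jittered_interval` is a hypothetical helper, not something RSS-Bridge or FreshRSS is known to provide.

```python
import random
from typing import Optional


def jittered_interval(base_hours: float, jitter_fraction: float = 0.25,
                      rng: Optional[random.Random] = None) -> float:
    """Return base_hours scaled by a random factor in [1 - jitter, 1 + jitter]."""
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    rng = rng or random.Random()
    factor = rng.uniform(1 - jitter_fraction, 1 + jitter_fraction)
    return base_hours * factor


# With a 12-hour base, each refresh lands somewhere between 9 and 15 hours later.
rng = random.Random(42)
for _ in range(3):
    print(f"next refresh in {jittered_interval(12, rng=rng):.2f} h")
```

Spreading refreshes this way avoids hitting the endpoint at fixed offsets, though it won't help once the server IP itself is blocked.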

But RSS-Bridge doesn't do that, unfortunately.

@virtadpt

@adegans I didn't realize that - my bad. :)

I didn't know Reddit got rid of them anywhere on their system. I've got a bunch of bots pulling RSS feeds for subreddits that've been running for the last couple of years, the only hiccoughs being the odd 5XX error. Check this out:

https://www.reddit.com/r/Cyberpunk/.rss

Pick a subreddit, put a /.rss at the end of the URL.
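A trivial Python sketch of that URL pattern; `subreddit_rss_url` is a hypothetical helper that only builds the string and does not check that the feed exists or is reachable.

```python
def subreddit_rss_url(subreddit: str) -> str:
    """Build https://www.reddit.com/r/<name>/.rss from 'name', 'r/name' or '/r/name/'."""
    name = subreddit.strip().strip("/")
    if name.startswith("r/"):
        name = name[2:]
    if not name:
        raise ValueError("empty subreddit name")
    return f"https://www.reddit.com/r/{name}/.rss"


print(subreddit_rss_url("Cyberpunk"))  # https://www.reddit.com/r/Cyberpunk/.rss
```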

TBH, for that reason I don't know why RSS-Bridge supports Reddit at all, unless there's a use case for proxying the existing feeds that I'm not aware of.

The way things are going, now that I've posted it somewhere, maybe they really will kill off RSS. Time will tell.

Anyway, I agree with you that a randomized user-agent header would be a good thing. I tend to think it's a useful feature to have in general.

@adegans
Author

adegans commented Dec 15, 2023

After your previous reply, I tinkered with the real Reddit RSS feeds a bit. Why use a bridge if there is real RSS, right?
But it appears my server (FreshRSS) is blocked completely for now.

Also, when loading the feeds locally in my RSS reader (NetNewsWire), they worked fine, but the formatting of their RSS was all wonky. Images not embedding, small thumbnails for some posts, larger ones for others. Weird borders around text...
So the reddit bridge, for me at least, has the advantage of producing a nicer looking feed.

I also tried a new User-Agent for RSS-Bridge, but that made no difference, probably because my server is blocked by IP or something.

sigh and such...

@virtadpt

I just stumbled across this: it seems Reddit is being weird about RSS, which suggests that it might be going away soon.

https://www.reddit.com/r/bugs/comments/18gv6yh/newsblur_not_getting_reddit_rss_feeds/

@virtadpt

virtadpt commented Dec 16, 2023

Sigh. Just like lighting a cigarette to make the bus arrive.

@Tone866
Contributor

Tone866 commented Dec 16, 2023

Looks like replacing www.reddit.com with old.reddit.com works for now:
https://www.reddit.com/r/bugs/comments/18gv6yh/comment/kdkg3dn/?utm_source=share&utm_medium=web2x&context=3
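The workaround is just a host swap. A minimal Python sketch of that rewrite, assuming only the `www.reddit.com` and bare `reddit.com` hosts need changing; `to_old_reddit` is a hypothetical helper, not the code from the actual fix.

```python
from urllib.parse import urlsplit, urlunsplit

REDDIT_HOSTS = {"www.reddit.com", "reddit.com"}


def to_old_reddit(url: str) -> str:
    """Point a www.reddit.com (or reddit.com) URL at old.reddit.com; leave others alone."""
    parts = urlsplit(url)
    if parts.hostname in REDDIT_HOSTS:
        # Replace only the host; path, query string and fragment are kept verbatim.
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


print(to_old_reddit("https://www.reddit.com/search.json?q=subreddit%3Aphp&sort=new"))
```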

@dvikan changed the title from "Reddit intermittent error 403" to "Reddit 403 Forbidden" on Dec 19, 2023
@dvikan pinned this issue on Dec 19, 2023
@dvikan
Contributor

dvikan commented Dec 19, 2023

Can confirm using old.reddit.com works right now. Famous last words.

Fixed in #3848

@dvikan
Contributor

dvikan commented Dec 30, 2023

Still working.

@Rjvs

Rjvs commented Mar 31, 2024

Stopped working again, by the looks of it. I just tried creating a feed on bridge01 and several other hosts and got a 403 on all of them. It seems to have started on Mar 28th, using https://rssbridge.bus-hit.me/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=Json

@dvikan
Contributor

dvikan commented Mar 31, 2024

Are you comfortable sharing the URL you are getting a 403 for?

@Rjvs

Rjvs commented Mar 31, 2024

Sorry, I was editing my comment to add details while you were asking; https://rssbridge.bus-hit.me/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=Json is the original feed that broke for me. I have since tried creating replacement feeds on several instances, so I suspect it's intentional.

@dvikan
Copy link
Contributor

dvikan commented Mar 31, 2024

That URL works fine on my dev PC (using 127.0.0.1),

but fails on https://rss-bridge.org/bridge01/?action=display&bridge=RedditBridge&context=single&r=LocalLlama&f=&score=&d=hot&search=&frontend=https%3A%2F%2Fold.reddit.com&format=html

The RedditBridge is programmed so that if Reddit responds with 403 Forbidden, RSS-Bridge caches that response for 60 minutes.

It might seem excessive, but it's an attempt to avoid getting IP banned.
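The behaviour described above (remember a 403 for 60 minutes and don't retry until it expires) can be sketched in Python like this. RSS-Bridge itself is PHP and uses its own cache layer; `ForbiddenCache` and `fetch_with_backoff` are hypothetical names used only for illustration, and the clock is injectable so the expiry can be tested without waiting an hour.

```python
import time
from typing import Callable, Dict, Tuple

FORBIDDEN_TTL_SECONDS = 60 * 60  # remember a 403 for 60 minutes


class ForbiddenCache:
    """Remembers which URLs recently returned 403 so they are not re-requested."""

    def __init__(self, ttl: float = FORBIDDEN_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def record(self, url: str, status: int) -> None:
        # Only 403 responses are cached; any other status is ignored here.
        if status == 403:
            self._expires_at[url] = self.clock() + self.ttl

    def blocked(self, url: str) -> bool:
        expires_at = self._expires_at.get(url)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self._expires_at[url]  # expired: allow a fresh attempt
            return False
        return True


def fetch_with_backoff(cache: ForbiddenCache, url: str,
                       do_request: Callable[[str], Tuple[int, str]]) -> Tuple[int, str]:
    """Skip the network call while a cached 403 for this URL is still fresh."""
    if cache.blocked(url):
        raise RuntimeError(f"skipping {url}: got 403 less than an hour ago")
    status, body = do_request(url)
    cache.record(url, status)
    return status, body
```

A real caller would pass its HTTP function as `do_request`; the point is that during the TTL window no request reaches Reddit at all.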

@Rjvs

Rjvs commented Mar 31, 2024

Thanks for looking into it straight away! I had tried it on several public instances but had not been exhaustive. I have now tested most of them and found one that works, so I can confirm your result. However, the majority of the public instances are returning 403 for me, so they might need even tighter rate limiting.

@dvikan
Contributor

dvikan commented Mar 31, 2024

curl 'https://old.reddit.com/search.json?q=subreddit%3Aphp&sort=hot&include_over_18=on'
<!doctype html>
     <html>
  <head>
    <title>Blocked</title>
    <style>
      body {
          font: small verdana, arial, helvetica, sans-serif;
          width: 600px;
          margin: 0 auto;
      }

      h1 {
          height: 40px;
          background: transparent url(//www.redditstatic.com/reddit.com.header.png) no-repeat scroll top right;
      }
    </style>
  </head>
  <body>
    <h1>whoa there, pardner!</h1>

<p>Your request has been blocked due to a network policy.</p>

<p>Try logging in or creating an account <a href=https://www.reddit.com/login/>here</a> to get back to browsing.</p>

<p>If you're running a script or application, please register or sign in with your developer credentials <a href=https://www.reddit.com/wiki/api/>here</a>. Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again. if you're supplying an alternate User-Agent string,
try changing back to default as that can sometimes result in a block.</p>

<p>You can read Reddit's Terms of Service <a href=https://www.reddit.com/wiki/api/>here</a>.</p>

<p>if you think that we've incorrectly blocked you or you would like to discuss
easier ways to get the data you want, please file a ticket <a href=https://support.reddithelp.com/hc/en-us/requests/new?ticket_form_id=21879292693140>here</a>.</p>

<p>when contacting us, please include your ip address which is: <strong>68.183.7.72</strong> and reddit account</p>
  </body>
</html>
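A client can at least recognize this page instead of failing on a JSON parse error. A Python sketch, assuming the block page arrives with status 403 (as reported earlier in the thread) and keeps the `<title>Blocked</title>` and "whoa there, pardner!" markers shown above; those markers are not a documented contract and may change. `parse_listing`, `is_block_page` and `RedditBlocked` are hypothetical names.

```python
import json


class RedditBlocked(Exception):
    """Raised when Reddit returns its HTML 'Blocked' page instead of JSON."""


def is_block_page(body: str) -> bool:
    # Heuristic based on the markers in the response above; not an official API.
    lowered = body.lower()
    return "<title>blocked</title>" in lowered or "whoa there, pardner!" in lowered


def parse_listing(status: int, body: str) -> dict:
    """Parse a search.json response, or raise RedditBlocked for the block page."""
    if status == 403 or is_block_page(body):
        raise RedditBlocked(f"HTTP {status}: request blocked by Reddit's network policy")
    if status != 200:
        raise RuntimeError(f"unexpected HTTP {status}")
    return json.loads(body)
```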

@corenting
Contributor

It's possible to bypass the new limits by pretending to be the Android client: you have to log in with the Android OAuth client ID and add some headers. See https://github.com/redlib-org/redlib. You can then query the JSON endpoints on the oauth.reddit.com domain.
I tried it for a project of mine and it seems to work well for now.


6 participants