
Moving toward our next release #836

Open
6 of 7 tasks
sigaloid opened this issue Jul 13, 2023 · 43 comments

@sigaloid
Member

sigaloid commented Jul 13, 2023

It is done. https://github.com/redlib-org/redlib

Hello all,

Regretfully, I haven't had the free time to keep up with the changes from a few days ago: Reddit's rate limiting is now in effect. I want to get a new release out ASAP (hopefully this weekend) that will do a few things. I would be grateful if someone could help with any of these changes.

  • Modify my draft PR. Use it as a base, since it has some needed changes, but tear out all of the OAuth spoofing. Currently it auto-generates an API token by spoofing the mobile app. We cannot distribute code that does this. Instead, add an optional config option for a token. DO NOT return it in the info+config page! The changes to keep are switching the base URL to oauth (if the config option is set) and attaching any relevant headers (not fake mobile ones). The whole oauth module should be thrown out.

  • See exactly how the rate limits work. I know they said 10/min for "anonymous" "OAuth" requests (I take this to mean oauth.reddit.com requests without ANY auth). Does this work as described? Is it better or worse than plain anonymous access? Basically, does oauth.reddit.com have different limits than www.reddit.com for the same endpoints (the .json ones, not the fully OAuth-specced ones)?

  • Also, they said 100/min for "authenticated free OAuth requests", meaning requests to oauth.reddit.com with a free token. Can these be generated live? This is exactly what reveddit did client-side to check the "live" version of the app: these tokens were free to generate with a single HTTP request and had no validation. Is this still possible, or is an account now required? To check, take a look at my draft PR, look at the oauth.rs initialization routine, and see what request it makes. Translate it to curl and make sure it still works (it should ;) ). (Alternatively, someone posted Python code in the reverse-engineering Reddit API issue that did something similar; see if it still works.) Then strip out the spoofed mobile headers: does it still return a token? If so, we can plausibly use it to generate a token at launch. 100 requests/min might give us more breathing room. If it's possible and safe, we could generate tokens dynamically based on demand and distribute requests across a few of them.

  • We neeeeeeed to rename. I can't emphasize this enough. Doubly so because, if we make these changes, we're implicitly agreeing to the API terms, which specify exactly which words we can and can't use to describe our project. We need to replace every instance of "reddit" in the codebase, templates, etc.

  • Also, what happens once we agree to the API terms? This will have ramifications: until now we were just web scraping, but the implicit API terms change that.

  • Merge the recent PR about the de-anonymizing headers. Critical priority. The only reason I haven't merged it is that it needs to ship together with these other changes in the next release: there's no point pushing another release while the behavior is as broken as it currently is for any popular instance.
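A minimal sketch of the first task, assuming a hypothetical environment variable `LIBREDDIT_OAUTH_TOKEN` (not an existing Libreddit option). The token is optional, it switches the base URL to oauth.reddit.com only when set, it adds nothing beyond an `Authorization` header (no spoofed mobile headers), and its `Debug` output is redacted so it can't leak onto the info+config page.

```rust
use std::env;
use std::fmt;

/// Hypothetical name for the optional token setting (an assumption, not an existing option).
pub const TOKEN_ENV_VAR: &str = "LIBREDDIT_OAUTH_TOKEN";

/// Upstream settings derived from an optional, operator-supplied token.
pub struct UpstreamConfig {
    token: Option<String>,
}

impl UpstreamConfig {
    /// Build from an explicit value; blank strings count as "no token".
    pub fn new(token: Option<String>) -> Self {
        let token = token.filter(|t| !t.trim().is_empty());
        UpstreamConfig { token }
    }

    /// Read the token from the environment, if it is set.
    pub fn from_env() -> Self {
        Self::new(env::var(TOKEN_ENV_VAR).ok())
    }

    /// Use oauth.reddit.com only when a token is configured.
    pub fn base_url(&self) -> &'static str {
        if self.token.is_some() {
            "https://oauth.reddit.com"
        } else {
            "https://www.reddit.com"
        }
    }

    /// The only extra header attached upstream, if any.
    pub fn auth_header(&self) -> Option<(&'static str, String)> {
        self.token
            .as_ref()
            .map(|t| ("Authorization", format!("Bearer {}", t)))
    }
}

/// Manual Debug impl so the token never appears in logs or on an info page.
impl fmt::Debug for UpstreamConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("UpstreamConfig")
            .field("token", &self.token.as_ref().map(|_| "<redacted>"))
            .finish()
    }
}
```

Putting the redaction in `Debug` means an accidental `{:?}` in a template or log line still won't print the secret.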

All of these are top priority for the project and I want to get a release out soon.

To be clear, operators have asked me whether this would require them to create a token (possibly linked to an account) to run an instance, and said they likely wouldn't run one if they had to. My goal is to avoid this if at all possible.

If your traffic is below the rate limits right now, you can already run an instance without a token. Even if it's higher, it may be possible to either 1. have the application generate one of these "anonymous tokens" on the fly, without account cookies, logging in, or anything, or 2. specify one of your own (which can come from anywhere; it doesn't have to be one you made). I know several apps have been given Reddit's blessing to escape rate limits, and someone will surely extract their token. We can't redistribute it, but we also can't stop you from using it.
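If anonymous tokens can still be generated, the idea of distributing requests across a few tokens could look roughly like this. It is a hypothetical sketch: `TokenPool` is not existing Libreddit code, tokens are plain strings obtained elsewhere, and each is assumed to allow a fixed number of requests per window (for example, the 100/min figure quoted for free OAuth tokens). The pool round-robins across tokens and skips any whose budget is spent in the current fixed window.

```rust
use std::time::{Duration, Instant};

/// One token plus a fixed-window request counter.
struct Slot {
    token: String,
    window_start: Instant,
    used: u32,
}

/// Hypothetical pool spreading requests across several tokens, each assumed
/// to allow `per_window` requests per `window`.
pub struct TokenPool {
    slots: Vec<Slot>,
    per_window: u32,
    window: Duration,
    next: usize,
}

impl TokenPool {
    pub fn new(tokens: Vec<String>, per_window: u32, window: Duration) -> Self {
        let now = Instant::now();
        let slots = tokens
            .into_iter()
            .map(|token| Slot { token, window_start: now, used: 0 })
            .collect();
        TokenPool { slots, per_window, window, next: 0 }
    }

    /// Round-robin over the tokens, skipping any whose budget for the current
    /// window is spent. Returns None when every token is exhausted.
    pub fn acquire_at(&mut self, now: Instant) -> Option<&str> {
        let n = self.slots.len();
        for offset in 0..n {
            let i = (self.next + offset) % n;
            let slot = &mut self.slots[i];
            // Start a fresh window once the old one has elapsed.
            if now.saturating_duration_since(slot.window_start) >= self.window {
                slot.window_start = now;
                slot.used = 0;
            }
            if slot.used < self.per_window {
                slot.used += 1;
                self.next = (i + 1) % n;
                return Some(self.slots[i].token.as_str());
            }
        }
        None
    }

    /// Convenience wrapper using the current time.
    pub fn acquire(&mut self) -> Option<&str> {
        self.acquire_at(Instant::now())
    }
}
```

A fixed window is the simplest choice. A sliding window, or reading the budget Reddit reports back on each response, would track the real limit more closely.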

Sorry for the megapost. I’m going to try to make some progress on these in the coming days (me posting this is also lighting a fire under me to do exactly that) but if anyone’s looking to contribute it would be a huge help.

Truth be told, I'm losing some hope for this genre of open source front-end software, with all of the changes happening. The Twitter API change threw an ugly wrench into a lot of my work and research, and this throws a medium-ugly wrench into Libreddit, but I trust that it'll work out fine (and I know this kind of software is needed even more because of these changes).

Anyway, peace, and if you’re interested in picking up any of these (I think 2, 3, or 4 would be the most concrete) and have questions, let me know. I’ll keep my progress updated here.

PS: If you want to help but don't want to take on any of these tasks, link this issue in the bug reports about rate limiting. We're working on it, folks! Also, instance operators will have to upgrade once we stamp out a new release, so there'll be more waiting.

  • PPS: Another easy task: there are a couple of outstanding requests to add instances in our instances repo. Turn them into PRs!!
@AyoungDukie

For the name, at least, it may be worth considering the former fork's name: ferritreader/ferrit.

@0xEsky

0xEsky commented Jul 14, 2023

Thank you so much for this. It seems a very sound and reasonable approach.

@CONIGUERO

CONIGUERO commented Jul 16, 2023

@sigaloid Why not just spoof the mobile API? There's no need to agree to any terms. https://news.ycombinator.com/item?id=36086240

There's work being done under #818 too!

EDIT: Just saw item #1 on the checklist. Why can't you distribute the code that does that? It's nothing illegal and, as you said earlier, the project has not agreed to any API terms.

@sigaloid
Member Author

Yes, I was part of that effort to reverse engineer, but we can’t distribute code that emulates or spoofs an official client. It opens us up to legal concerns.

@CONIGUERO

CONIGUERO commented Jul 16, 2023

> Yes, I was part of that effort to reverse engineer, but we can't distribute code that emulates or spoofs an official client. It opens us up to legal concerns.

It does not. It's what clients for scrape-hostile sites like YouTube do. NewPipe, for example, impersonates Android TV and web clients to get YouTube data and video streams. It's also what Nitter does (using the Twitter web client's API keys), which is a project very similar in both nature and spirit to this one.

@htsmi

htsmi commented Jul 16, 2023

It may be time to consider spinning this off from Libreddit as a new project. It sounds like it needs to be renamed anyhow and will have to take a considerably different approach. Whatever that approach is will have to contend with a lot more hostility from Reddit, and some people will be willing to go further than others in that regard. It would likely be good to have some separation from the current project and those who have contributed to it, because they may not want to be associated with the approaches that are now required.

@sigaloid
Member Author

> It does not.

Just because others do it doesn't mean it's not grounds for a takedown or whatever legal action they deem necessary. What many seem to misunderstand about this kind of law is that even if these actions are okay to perform, that won't stop an army of lawyers from making whatever threats they see fit. This isn't some unknown thing: it has happened to maintainers of similar software where the only basis for the threat was "you're breaking the TOS", yet the legal threats were enough to force a maintainer's hand into giving up. youtube-dl, Invidious, Barinsta... all of them have received these threatening letters. I'm not willing to put my full legal name on what may very well be an infringement. In another universe where my contributions weren't in my name and this wasn't the stance of the whole Libreddit team, things would be different. The best defense against a notice is the lack of a destination address.

I hope you don't take this as me not supporting that kind of reverse-engineering change (I literally wrote half the code for it). My opposition is primarily because it's the stance of the whole team, and secondly because I believe it won't be necessary. It's not that I don't want to be associated with these issues out of some moral obligation.

If it does become necessary and we can't get by with the changes outlined, Libreddit (the current team and repo) will shut down anyway, in which case I would be a supportive user of a fork that keeps it alive. I only ask that it wait until I have the time to confirm this. I'm juggling a lot of things right now, but I intend to work on this soon.

Why I don't believe a fork will be necessary: in the worst case, we will add a token setting, and someone will pluck the token out of another app and share it. This change on our end can be done in a lunch break if I cannot find another avenue.

@CONIGUERO

CONIGUERO commented Jul 16, 2023

> youtube-dl, Invidious, Barinsta... all of them have received these threatening letters.

I appreciate the detailed response and respect the choice. However, please don't implicate other projects that didn't do that. Except for Barinsta, all the other examples you gave refused to shut down even under threat of legal action, Invidious very recently: iv-org/invidious#3872 (comment)

@htsmi

htsmi commented Jul 16, 2023

Thanks for the clarification on your position and situation, that helps. I am encouraged that you still see a route forward on this. Thank you for all your hard work!

@libreddit libreddit deleted a comment Jul 21, 2023
@avincent98144

avincent98144 commented Jul 21, 2023 via email

@sigaloid
Member Author

Just to give an update, the team is still deliberating on how we’re going to move forward. I believe we can do so while keeping Libreddit easy to use and functional even under high traffic. I know the waiting period is frustrating!

@avincent98144

avincent98144 commented Jul 22, 2023 via email

@ghost

ghost commented Jul 23, 2023

Most Libreddit instances do work, at least until they start returning 'Too Many Requests'.
A workaround is to use a userscript that lets you hop quickly (and randomly) from one Libreddit instance to another.
I personally use the Reddit to Libreddit Redirect script.
For those who (like me) already have automatic Reddit-to-Libreddit redirection (e.g. with the Redirector browser extension), just remove these lines from the above-mentioned script:

// @match        *://www.reddit.com/*
// @match        *://old.reddit.com/*

Also, of course, add or remove Libreddit instances in the script as needed.
It's only a workaround, but it at least lets you use Libreddit instances without the hassle of finding one that isn't returning 'Too Many Requests'.
The script preserves the current page's path when switching instances, so it's just a matter of clicking the script's button to jump to another instance.
Works fine here.
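The core of that userscript trick, keeping the path and query while swapping the instance, can be sketched as below. This is an illustrative Rust re-creation, not the script's actual code; the `hop` function and the instance URLs are hypothetical, and randomness comes from std's randomly seeded `RandomState` to avoid extra crates.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Split a URL into ("scheme://host", "/path?query").
fn split_base(url: &str) -> Option<(&str, &str)> {
    let start = url.find("://")? + 3;
    match url[start..].find('/') {
        Some(i) => Some((&url[..start + i], &url[start + i..])),
        None => Some((url, "/")),
    }
}

/// Jump to a random instance other than the current one, keeping the same page.
pub fn hop(current: &str, instances: &[&str]) -> Option<String> {
    let (base, path) = split_base(current)?;
    let others: Vec<&str> = instances
        .iter()
        .map(|i| i.trim_end_matches('/'))
        .filter(|i| *i != base)
        .collect();
    if others.is_empty() {
        return None;
    }
    // RandomState has randomly seeded keys, so finishing an empty hasher gives an unpredictable u64.
    let r = RandomState::new().build_hasher().finish();
    let target = others[(r % others.len() as u64) as usize];
    Some(format!("{}{}", target, path))
}
```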

I've been reading and following your comments. Of course, I hope a fix will be found; thanks to those working on it. Unfortunately, I lack the skills to help in that area.

@avincent98144

avincent98144 commented Jul 23, 2023 via email

@seychelles111

Refer to my idea of using a random IPv6 address per user: #845
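Issue #845 isn't reproduced here, so this is only a guess at the general idea: give each user (or session) its own source address inside a /64 prefix the instance controls, derived deterministically from a hypothetical per-user key. Actually sending from those addresses needs host support (a routed prefix the OS can bind to), which this sketch doesn't cover, and if Reddit applies limits per /64 rather than per address it would not help.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::Ipv6Addr;

/// Keep the upper 64 bits of `prefix` (the network) and fill the lower 64 bits
/// (the interface ID) with a hash of `user_key`.
pub fn address_for_user(prefix: Ipv6Addr, user_key: &str) -> Ipv6Addr {
    // DefaultHasher::new() uses fixed keys, so a given key maps to the same
    // address within one build (not guaranteed across Rust releases).
    let mut hasher = DefaultHasher::new();
    user_key.hash(&mut hasher);
    let interface_id = u128::from(hasher.finish());
    let network = u128::from(prefix) & !u128::from(u64::MAX);
    Ipv6Addr::from(network | interface_id)
}
```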


@Clozent

Clozent commented Jul 27, 2023

I'm sorry if my understanding is lacking, but why not just scrape Reddit for the content?

@sigaloid
Member Author

Scraping Reddit would incur a lot more traffic and require more code to parse the HTML. Also, IIRC, their HTML endpoints (the web pages) are more heavily rate limited than the JSON ones.

@obj-obj

obj-obj commented Jul 30, 2023

How about using the API that the desktop website uses? Or is that also rate limited?

@AdiPathak97

AdiPathak97 commented Jul 31, 2023

https://farside.link/libreddit automatically redirects to a working instance (credit: Farside). It's the easiest workaround I have found so far.

You can use this userscript to automatically switch instances if you do run into rate limiting while viewing a particular post and want to pick up where you left off.

@RUGMJ

RUGMJ commented Jul 31, 2023

Sadly, that's also rate limited.

@Infinitytreacher

> Scraping Reddit would incur a lot more traffic and require more code to parse the HTML. Also, IIRC, their HTML endpoints (the web pages) are more heavily rate limited than the JSON ones.

Isn't scraping also what yt-dlp does? Is scraping completely out of the question, or could it at least be a last resort?

@SpomJ

SpomJ commented Aug 9, 2023

> Also, IIRC, their HTML endpoints (the web pages) are more heavily rate limited than the JSON ones.

What do you mean by more heavily rate limited? From what I understand, the only limit is the page load speed...

@obj-obj

obj-obj commented Aug 13, 2023

Another possible solution is to use the API and fall back to scraping once the API rate limit is reached.
Even if scraping is also rate limited, the total limit would be higher. Maybe this could be taken further by using the official API, the desktop site API, and scraping, in that order?
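The proposed order amounts to a simple fallback chain: try each source in turn and move on only when the current one reports a rate limit. A sketch with the fetchers left abstract (the `Attempt` type and the closures are hypothetical stand-ins for real HTTP calls). Note that sigaloid later says all of these calls count toward the same rate limit, in which case the chain would not raise the total.

```rust
/// Result of one attempt against a single upstream source.
#[derive(Debug, PartialEq)]
pub enum Attempt {
    Fetched(String),
    RateLimited,
    Failed(String),
}

/// A named source paired with a hypothetical fetch function.
pub type Source<'a> = (&'a str, &'a dyn Fn(&str) -> Attempt);

/// Try sources in order; only a rate limit moves on to the next one.
/// Returns the name of the source that answered plus its body.
pub fn fetch_with_fallback(sources: &[Source<'_>], path: &str) -> Result<(String, String), String> {
    for (name, fetch) in sources {
        match fetch(path) {
            Attempt::Fetched(body) => return Ok((name.to_string(), body)),
            Attempt::RateLimited => continue,
            // A real error is surfaced instead of being masked by the next source.
            Attempt::Failed(err) => return Err(format!("{}: {}", name, err)),
        }
    }
    Err("every source is rate limited".to_string())
}
```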

@obj-obj

obj-obj commented Aug 13, 2023

There's also https://reddit.com/r/all.json and https://www.reddit.com/r/facepalm/comments/15pbo3k/irresponsible_parenting.json, but I don't know if those are just aliases to the regular Reddit API

@zipfile6209

Sorry if this is nonsense, but maybe RSS would be another good source, along with scraping, to increase the limit? Some of the info would be lost, though, like votes.

@obj-obj

obj-obj commented Aug 14, 2023

> Sorry if this is nonsense, but maybe RSS would be another good source, along with scraping, to increase the limit? Some of the info would be lost, though, like votes.

The problem is that RSS only includes a couple of posts, and IIRC there's no pagination.

@PaperOrb

Forgive my ignorance, but why not bypass the API entirely and create a tool that scrapes Reddit through proxy servers, then presents the content using Libreddit's frontend?

@seychelles111

> There's also https://reddit.com/r/all.json and https://www.reddit.com/r/facepalm/comments/15pbo3k/irresponsible_parenting.json, but I don't know if those are just aliases to the regular Reddit API

You're confused; it won't change the fact that it's rate limited.

@obj-obj

obj-obj commented Aug 27, 2023

> > There's also https://reddit.com/r/all.json and https://www.reddit.com/r/facepalm/comments/15pbo3k/irresponsible_parenting.json, but I don't know if those are just aliases to the regular Reddit API
>
> You're confused; it won't change the fact that it's rate limited.

Yes, but assuming the rate limit applies to each of those resources separately rather than system-wide, you'd be able to send 4x as many requests without being rate limited.

@avidseeker

Geddit is an Android app that doesn't depend on the Reddit API, relying on the RSS/JSON feeds instead.

@r7l

r7l commented Oct 27, 2023

Running a personal instance of Libreddit works as well as ever. I don't use public instances, but is the limiting actually that bad? I also use Teddit, which seems to work without issues as well.

@sigaloid
Member Author

sigaloid commented Nov 1, 2023

There are many misunderstandings going on here: any calls to the JSON endpoints, the API, the desktop site, or whatever else all count toward the rate limit.

There are ways around it, and r7l is correct: personal instances will still work (including ones shared with just your friends). Also, many of the instances are already below the limit, and you can use them as normal (you might need to bounce around to find one).

Unfortunately, I have become incredibly overloaded with everything going on in my life. But if anyone here runs a popular public instance listed on the instance list and wants to get it working, ping me on Element: @sigaloid:matrix.org

@SpomJ

SpomJ commented Dec 4, 2023

> There are ways around it, and r7l is correct: personal instances will still work (including ones shared with just your friends).

So theoretically, the code responsible for requests could just be executed client-side and everything would work just fine?

@sigaloid
Member Author

sigaloid commented Dec 4, 2023

Yes, but this would remove many of the privacy protections the project is meant to provide.

@SpomJ

SpomJ commented Dec 4, 2023

> Yes, but this would remove many of the privacy protections the project is meant to provide.

Why did all the third-party apps shut down in the first place, then? I'm sure they made their calls client-side...

Also, I think a reshaped project is better than a dead project, so if there's really no other answer, I think privacy protections could be made optional...

If all the calls are limited, the only solution I can think of would be some sort of global routing system, so that the servers on the instance list evenly distribute requests among themselves.
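One coordination-free way to approach that "global routing" idea, offered as a sketch rather than anything the project has planned, is rendezvous (highest-random-weight) hashing. Every client or redirector with the same instance list computes the same winner for a given key (say, a subreddit path), so load spreads across instances, and an instance known to be rate limited is simply skipped, which moves only its keys elsewhere. The instance URLs and the `is_limited` check are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Rendezvous (highest-random-weight) hashing: score every non-limited instance
/// against the key and pick the highest score.
/// Note: DefaultHasher::new() is only guaranteed stable within one build; real
/// cross-server use would need a hash function with a fixed, documented algorithm.
pub fn pick_instance<'a>(
    instances: &[&'a str],
    key: &str,
    is_limited: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    instances
        .iter()
        .copied()
        .filter(|&inst| !is_limited(inst))
        .max_by_key(|&inst| {
            let mut h = DefaultHasher::new();
            (inst, key).hash(&mut h);
            h.finish()
        })
}
```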

@sigaloid
Member Author

sigaloid commented Dec 4, 2023

This project specifically does it server-side so that many users can use it anonymously. The other third-party apps shut down because they had to: the API they used became paid.

The reason we still work is that we use the read-only API, and even then we're subject to usage limits.
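Because even the read-only API has usage limits, an instance can track its remaining budget from the `X-Ratelimit-Used`, `X-Ratelimit-Remaining` and `X-Ratelimit-Reset` response headers that Reddit documents for its API. Below is a dependency-free sketch. It assumes the headers have already been collected into a map with lowercased names, parses values as floats to tolerate forms like `99.0`, and uses a `reserve` threshold that is a made-up tuning knob.

```rust
use std::collections::HashMap;

/// Values from Reddit's rate-limit response headers.
#[derive(Debug, PartialEq)]
pub struct RateLimit {
    pub used: f64,
    pub remaining: f64,
    pub reset_secs: f64,
}

/// Parse the three headers from a map keyed by lowercase header name.
/// Returns None if any header is missing or not a number.
pub fn parse_rate_limit(headers: &HashMap<String, String>) -> Option<RateLimit> {
    let num = |name: &str| -> Option<f64> { headers.get(name)?.trim().parse().ok() };
    Some(RateLimit {
        used: num("x-ratelimit-used")?,
        remaining: num("x-ratelimit-remaining")?,
        reset_secs: num("x-ratelimit-reset")?,
    })
}

/// Seconds to pause before the next request, or None if there is budget left.
/// `reserve` keeps a few requests back as headroom (a hypothetical knob).
pub fn backoff_secs(rl: &RateLimit, reserve: f64) -> Option<f64> {
    if rl.remaining <= reserve {
        Some(rl.reset_secs)
    } else {
        None
    }
}
```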

@Trit34

Trit34 commented Dec 5, 2023

> I think privacy protections could be made optional...

Exercising a fundamental right, optional? 🤨

@obj-obj

obj-obj commented Dec 5, 2023

> So theoretically, the code responsible for requests could just be executed client-side and everything would work just fine?

If you want to do that, you can just run Libreddit locally and access it via localhost.

@sigaloid
Member Author

Hi all, just to update: I've forked it.

https://github.com/redlib-org/redlib

It should be as easy as replacing the image in your docker pull commands with quay.io/redlib/redlib. I'd love to hear about your experience with it. It should fix all of the rate-limiting issues, so I would appreciate it if anyone running a public instance tried out the new fork. Thanks!

@bayazidbh

@sigaloid Cheers. It still needs some cleanup, at least in the README, but it's nice to see. Any public instances yet, btw?

@sigaloid
Member Author

https://redlib.matthew.science/ is up and running, though it's by no means official (and probably temporary).

I'll work on an instance list repo soon.

@Tokarak
Contributor

Tokarak commented Jan 6, 2024

> So theoretically, the code responsible for requests could just be executed client-side and everything would work just fine?

I felt like this was the ideal solution (I don't have sympathy for the "privacy concerns"), but I couldn't find a library that allows this architecture. It's possible that browsers simply won't allow cross-origin content loading like this.

As someone said, running a local instance of Redlib works. I've been self-hosting like this for maybe a year; resource usage is very low, and I've never hit rate limits.
