
nitter over IPFS or torrent #1188

Open
devgaucho opened this issue Mar 1, 2024 · 19 comments

Comments

@devgaucho

devgaucho commented Mar 1, 2024

With the current limitations of private instances, it could be interesting to think about sharing data between instances.

It doesn't make sense for each instance to make dozens of requests for the same profile at all times.

Files with the latest tweets could be created and shared via IPFS, with a centralized index that would function as an official tracker.

The name used in the index for these files could be `<username>/<unix epoch>/tweets.json`, similar to web.archive.org.

This official tracker could have official mirrors protected by a hash system, like the secure APT mirrors on Debian/Ubuntu.

Then, if a file created within the last 60 seconds exists on IPFS, the instance could get the data from IPFS instead of Twitter.

Doing it this way:

  1. Nitter becomes web-scraping friendly
  2. Cloudflare could be used to share the JSON via IPFS without CAPTCHAs
  3. Nitter remains decentralized
  4. the load is much lighter for the instances, as images can be shared between multiple instances

The main problem with IPFS is the lack of seeds, but this doesn't seem to be an issue when we are dealing with an already consolidated network of decentralized instances.

To increase the number of seeds, torrents could be used so that any visitor with a BitTorrent client can download the latest tweets and contribute seeds to keep the network healthy (archive.org has been doing this for years).

This would certainly help a lot to reduce the number of requests instances make to Twitter.
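To illustrate, the 60-second freshness rule could look something like this. This is a minimal Python sketch: the in-memory `index` dict stands in for the proposed IPFS index server and the placeholder CID stands in for a real scrape-and-pin step; neither exists in Nitter today.

```python
import time

# Hypothetical stand-in for the shared index: maps an index key like
# "<username>/<unix epoch>/tweets.json" to an IPFS content hash (CID).
index = {}

MAX_AGE_SECONDS = 60  # serve from IPFS if a snapshot is newer than this

def latest_snapshot(username):
    """Return (epoch, cid) of the newest indexed snapshot for a user, or None."""
    entries = [
        (int(key.split("/")[1]), cid)
        for key, cid in index.items()
        if key.startswith(username + "/")
    ]
    return max(entries) if entries else None

def get_tweets(username, now=None):
    """Prefer a fresh IPFS snapshot; fall back to scraping Twitter directly."""
    now = now if now is not None else time.time()
    snapshot = latest_snapshot(username)
    if snapshot and now - snapshot[0] <= MAX_AGE_SECONDS:
        return ("ipfs", snapshot[1])  # fetch the JSON from IPFS by its CID
    # Placeholder: scrape Twitter, pin the JSON to IPFS, register the new CID.
    cid = "Qm-placeholder"
    index[f"{username}/{int(now)}/tweets.json"] = cid
    return ("twitter", cid)
```

The key point is that only one instance per minute ever hits Twitter for a given profile; everyone else reads the pinned snapshot.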

devgaucho changed the title from "IPFS" to "nitter over IPFS or torrent" on Mar 1, 2024
@ArchivingToolsForWBM

ArchivingToolsForWBM commented Mar 1, 2024

That is very clever; hopefully the "Instance has been rate limited." issue goes away once this is implemented. It's been over a month of being unable to save posts through Nitter. Even the instances that 403 bots kept returning the aforementioned error after just 2-3 page visits by a human being.

Looking at it, it looks like a cache system shared across Nitter instances.

@devgaucho (Author)

It's exactly a shared cache system, like a Wayback Machine just for tweets, running on IPFS.

Yes, displaying outdated tweets is much better than displaying no tweets at all.

If, instead of just reading the RSS feeds, a torrent file of the latest tweets were created, this could certainly alleviate the load on the RSS system, which demands a lot from the instances, and speed up viewing through seeds/peers.

Few people tend to seed IPFS, but torrenting is easy and makes it easier to create new instances. The instances could be divided into 4 groups:

  1. IPFS and torrent file indexes
  2. index mirrors (only replicate the indexes)
  3. miners (download tweets and share them with the indexes)
  4. IPFS/torrent clients (just display and seed files with messages)

Using IPFS and torrents, it would no longer be necessary for all instances to have valid Twitter accounts to display messages.

The more instances and users seed messages, the faster and stronger Nitter will become 💪
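For illustration, the four groups above could be modeled as roles with different requirements. The role names and the assumption that only miners need Twitter accounts follow this comment's proposal; nothing here is an existing Nitter API.

```python
from enum import Enum, auto

class Role(Enum):
    INDEX = auto()         # publishes IPFS/torrent file indexes
    INDEX_MIRROR = auto()  # only replicates the indexes
    MINER = auto()         # downloads tweets and shares them with indexes
    CLIENT = auto()        # just displays and seeds files with messages

def needs_twitter_account(role: Role) -> bool:
    """Under this split, only miners ever talk to Twitter directly."""
    return role is Role.MINER
```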

@ArchivingToolsForWBM

That means when the user/WBM loads a Nitter page, this happens:

  1. the Nitter instance searches other instances/peers for the data (instead of repeatedly sending requests to Twitter too rapidly)
  2. if the tweet/profile is found on another instance, that instance passes the data over to the server the user/WBM requested
  3. the user's browser / WBM's "browser" receives that data.

Nothing changes negatively on the client side; it's just that the system no longer re-requests and triggers the god-awful rate limit, and it's faster.
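A minimal sketch of that lookup order: `peers` can be any list of mapping-like caches (a plain dict works here), and `fetch_from_twitter` stands in for the instance's normal scraping path; neither is a real Nitter API.

```python
def load_profile(username, peers, fetch_from_twitter):
    # 1. Ask other instances/peers for a cached copy first.
    for peer in peers:
        data = peer.get(username)  # cached JSON, or None if the peer lacks it
        if data is not None:
            # 2.-3. Relay the peer's copy straight to the requesting client.
            return data
    # Fall back to Twitter only when no peer has the profile cached.
    return fetch_from_twitter(username)
```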

Hopefully in the future we also have such an archivable frontend system for bsky, notion, and misskey; the WBM's playback (viewing an archived page) isn't capable of saving complex-JS pages and results in blank pages.

@BANKA2017

Using IPFS/torrent is a good idea, but:

  • How should we verify that the timeline is without tampering?
  • Is it possible to have a "tampered timeline" attack if verification is not performed?
  • If we want to verify that the timeline is correct, does it mean setting up a "trusted" instance? Such an instance would require sending a large number of requests to Twitter

The current dilemma of nitter is that instances cannot handle a large number of requests.

@ofifoto

ofifoto commented Mar 2, 2024

some sort of eventual consensus of x nitter instances maybe?

@devgaucho (Author)

> How should we verify that the timeline is without tampering?

Through certified instances. For example:

  1. the user accesses a profile on nitter.example.com
  2. nitter.example.com queries the IPFS index server (nitter.net?) to see if there are recent tweets for that profile
  3. if they don't exist, nitter.example.com downloads the latest tweets and saves them as a JSON file
  4. it then shares this JSON file's IPFS hash with an IPFS index server
  5. the index server makes the hash of these tweets available to the other instances

If the index server detects that one of the instances has shared corrupted tweets, its IPFS hashes are hidden.

> Is it possible to have a "tampered timeline" attack if verification is not performed?

Yes, but in that case the instance will be marked as untrusted by the IPFS index server.

> If we want to verify that the timeline is correct, does it mean setting up a "trusted" instance? Such an instance would require sending a large number of requests to Twitter

Just check tweets individually (like on fxtwitter, vxtwitter, etc.) if any suspicion of tampering arises.
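The "hide hashes from misbehaving instances" behaviour could be sketched as a toy index server. All names here are hypothetical; a real deployment would map keys like `<username>/<unix epoch>/tweets.json` to IPFS CIDs.

```python
class IndexServer:
    def __init__(self):
        self.entries = {}       # index key -> (publishing instance, CID)
        self.untrusted = set()  # instances caught sharing corrupted tweets

    def publish(self, instance, key, cid):
        """Step 4: an instance registers the CID of the JSON it pinned."""
        self.entries[key] = (instance, cid)

    def resolve(self, key):
        """Step 5: hand out the CID, unless its publisher has been flagged."""
        entry = self.entries.get(key)
        if entry is None or entry[0] in self.untrusted:
            return None
        return entry[1]

    def flag(self, instance):
        """Hide every hash published by an instance that shared bad data."""
        self.untrusted.add(instance)
```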

@devgaucho (Author)

> some sort of eventual consensus of x nitter instances maybe?

Yes! The index servers would act like a network of DNS servers for tweets, converting queries like `<username>/<unix epoch>/tweets.json` into the IPFS hash that contains the JSON with the tweet metadata.

@ln2max

ln2max commented Mar 3, 2024

I like this. This would effectively pool each Nitter instance's accounts and request budgets, without disadvantaging any instance operator or requiring them to sacrifice request budget to help others.

@I-I-IT

I-I-IT commented Mar 3, 2024

Yes, it's a good idea. The tampering issue isn't really a problem; besides, it's not the biggest concern right now.

@ArchivingToolsForWBM

ArchivingToolsForWBM commented Mar 3, 2024

@libreddit, @iv-org, @Booteille Alternative frontend services in general should be doing this. I'd expect platforms to go even harder on rate limits: further reducing the number of requests per time period, increasing the length of the cooldown (most rate limits, like GitHub's, are 1-4 minutes; Twitter's is a whopping full day), placing more stuff behind a login wall, and/or otherwise hindering users from, well, using the site in any way (it's not just scrapers who are affected, but also ordinary users viewing content). If AFEs (alternative front ends) don't adapt, they'll become useless.

As long as enshittification continues on free sites, AFEs will become increasingly important.

The act of making sites easier to use despite designed problems/annoyances started with browser extensions, since the internet is an open web and the browser is a software agent acting on behalf of the user rather than the website. Site owners wanted their websites to be like television, where they decide how you experience them, especially when it comes to intrusive advertisements and disabling right-click/text selection/inspect element. And now, in the modern era where the internet is mostly used from mobile devices, a lot of our ability to use the web is hindered. Seriously, we shouldn't need to be pestered or nudged into installing yet another browser (with even fewer features) just to see text and images.

I'm very thankful to whoever invented alternative frontend sites, as well as for @ipfs's decentralized nature. This restores our ability to browse the net annoyance-free, which the mobile web especially has taken away.

@xaur

xaur commented Mar 3, 2024

Great question. Welcome to the world of decentralized consensus. This is a hard problem to solve, but if solved, the potential gains are huge: Nitter working again with pretty much no rate limiting, while sending little traffic to Twitter (and not getting banned).

The tampering problem has always existed for Nitter and similar front-ends. You never know whether the Nitter/Invidious/Libreddit/etc. server is showing you unmodified data from the source or fake/tampered data. We are just lucky to have so many honest and altruistic front-end operators who provide us with a great service for free.

> The tampering issue isn't really a problem, plus it's not the biggest preoccupation right now.

It is really a good question. Creating a decentralized anti-tampering system is a lot of R&D work, and we should be asking what a reasonable amount of time/energy to invest in it is (compared to other efforts to keep Nitter running in some capacity).

I don't see a good solution yet, but one thing is immediately clear. If Nitter instances start sharing data (to solve the problem of making too many requests to Twitter), a malicious instance will be able to do more damage than a single malicious instance can in the current architecture.

In the current architecture, if one instance starts serving fake data, people can just stop using it, switch to an honest instance, and notify each other to avoid the bad instance. The damage is limited to that one instance, and "poisoning" the data on other well-known and trusted instances requires hacking them. In a distributed/decentralized architecture, one instance could poison the whole network if there is no good anti-tampering system in place. Anti-tampering is therefore more important here than in the current architecture of independent individual instances.

@AlexGuo1998

The authenticity problem has been discussed in #919 (comment) (starting from this comment), and there is also an individual issue for it (#931).

Quoting @12joan (#919 (comment)):

> I took a look at that link, and it looks like they actually managed to make this work. https://github.com/tlsnotary/PageSigner

It seems possible to cryptographically verify that some data originated from Twitter, untampered. But someone has to research the inner details of TLS for it.

@devgaucho (Author)

Yes, it is important to have network IPFS file indexes that function as anti-tamper systems.

To illustrate, I used as an example the DNS system, which translates website names into IPs (the addresses of servers on the internet).

Without origin checking, a DNS server is extremely dangerous, which is why so many people use the same services (Google, Cloudflare, OpenDNS, etc.).

Trust in the servers is essential in both cases; in DNS, for example, there are 13 root servers that feed thousands of other servers.

https://upload.wikimedia.org/wikipedia/commons/thumb/e/ee/Root-current.svg/1405px-Root-current.svg.png

Currently, the listing of public Nitter instances already works in a more or less structured and centralized way: instance indexes already exist, and their quality is assessed according to availability, resources, and response time.

Simply add a system for evaluating the integrity of tweets, and this list becomes a decentralized anti-tamper system.

This system could be automatic (through sampling verification scripts) or collaborative (a public API), as it is not possible to do it manually.

Only instances linked to this integrity-assessment network would be shown in the index; instances with tampered tweets could simply be left out.

Thus the quality of a considerable group of instances can be guaranteed.
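A sampling verification script might look roughly like this: pick a few tweets from an instance's shared snapshot and compare them against direct single-tweet lookups (the kind fxtwitter/vxtwitter-style services offer). All names are hypothetical; `fetch_tweet_directly` is an illustrative stand-in.

```python
import random

def spot_check(snapshot_tweets, fetch_tweet_directly, sample_size=3):
    """Return True only if every sampled tweet matches its directly fetched copy."""
    sample = random.sample(snapshot_tweets, min(sample_size, len(snapshot_tweets)))
    for tweet in sample:
        reference = fetch_tweet_directly(tweet["id"])
        if reference is None or reference["text"] != tweet["text"]:
            return False  # tampered or missing: drop the instance from the index
    return True
```

Sampling keeps the verification traffic tiny compared to re-fetching whole timelines, which is the whole point of the shared cache.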

@ArchivingToolsForWBM

ArchivingToolsForWBM commented Mar 4, 2024

@devgaucho Good comparison of indexing with DNS. Both map data to valid, untampered information. We don't want a Nitter page containing tweets "by users" saying things they didn't actually say on Twitter. Worse, as an archivist, saving tweets that are false could perpetuate the false information.

@xaur

xaur commented Mar 8, 2024

> simply add a system for evaluating the integrity of tweets and this list becomes a decentralized anti-tamper system

The problem is that the list is centralized and could go down or get poisoned.

But yeah, if we had a decentralized system for data integrity verification, that would be huge. Ideally tweets could be verified at the end user's client.

> I took a look at that link, and it looks like they actually managed to make this work. https://github.com/tlsnotary/PageSigner

It says it was deprecated in 2023 due to a vulnerability that allowed creating fake proofs.

But other projects in that organization are still active, and the website looks promising: https://tlsnotary.org/

@devgaucho (Author)

devgaucho commented Mar 8, 2024

@xaur we already have a decentralized system for verifying data integrity. The challenge is not to verify the integrity of individual messages but to index the message lists on users' pages only once and share the tweet IDs with all other instances through IPFS.

@ln2max

ln2max commented Mar 12, 2024

Verification of tweets should probably happen on a different layer. All we need is for each individual instance to mark/sign its contributions to the collective database with an instance-specific identifier. Then instances that read the database can analyze contributions for correctness and blacklist as necessary.

There's not really any case where we want contributions from an instance that sometimes manipulates tweets. The current system of "if an instance supplies fake data, use another instance" is good enough. We can replicate this model by allowing people to ignore tweets contributed to the DB by one instance or another.

For the case where someone spams the DB with fake tweets, each pretending to come from a different (nonexistent) instance, it's easy enough for the retrieving instance to keep a whitelist of "real" instances whose signature or hash is verified the same way Let's Encrypt verifies you control a given web service: in order to claim that contribution X originated from nitter.net using key 0xd3adb33f, the public key with fingerprint 0xd3adb33f must be retrievable at nitter.net/key.

Once the list of public keys has been retrieved, they can be cached at the consumer.

Rate-limiting the creation of spam instances is easy enough, since the consumer can see that nitter.net has contributed a lot of data over a long period of time, while nitter.national.shitposting.agency appeared yesterday and has only made 3 contributions. Ergo, contributions from nitter.net are probably fine, while contributions from nitter.national.shitposting.agency should be ignored until it has more of a track record.
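A rough sketch of that verification flow: fingerprints are truncated SHA-256 here; `fetch_key` stands in for retrieving `<instance>/key` over HTTPS, and `verify_signature` for a real public-key signature check. All names are illustrative, not existing Nitter code.

```python
import hashlib

def fingerprint(public_key: bytes) -> str:
    """Short hex fingerprint of a public key (truncated SHA-256)."""
    return hashlib.sha256(public_key).hexdigest()[:16]

def verify_contribution(contribution, fetch_key, verify_signature):
    """contribution: dict with "instance", "fingerprint", "data", "sig" keys."""
    # 1. Retrieve the key actually served by the claimed instance.
    key = fetch_key(contribution["instance"])
    if key is None:
        return False
    # 2. The claimed fingerprint must match the key the instance serves.
    if fingerprint(key) != contribution["fingerprint"]:
        return False
    # 3. Only then check the signature over the contributed data.
    return verify_signature(key, contribution["data"], contribution["sig"])
```

In practice the fetched keys would be cached at the consumer, as noted above, so the HTTPS round-trip happens once per instance rather than once per contribution.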

@eownerdead

OrbitDB may be suitable for this.

@ErikUden

ErikUden commented Apr 7, 2024

Why not use ActivityPub? I think that protocol is perfect for Nitter's use case.
