looks like X/twitter(?) broke something again #983

Open
animegrafmays opened this issue Aug 15, 2023 · 666 comments

Comments

@animegrafmays


8493b396fd05f26fe681a6abe9a849dc983d091fd4472a53dc3aa72547b030c4

BANKA2017 added a commit to BANKA2017/twitter-monitor-assets that referenced this issue Aug 15, 2023
@ghost

ghost commented Aug 15, 2023

Also, the syndication api for showReplies=true does not work anymore:
https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=true
but showReplies=false still works, showing the tweets ordered by like count...
https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=false

@beingnajib

Yes, it is not working now. I hope the Nitter people fix this soon.

@paulamei

paulamei commented Aug 15, 2023

Is there an online or CLI tool that converts
https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk
to an RSS feed? Then we could each download the HTML from a logged-in profile and do the conversion in a second step.

@nerra0pos

Also, the syndication api for showReplies=true does not work anymore: https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=true but showReplies=false still works, showing the tweets ordered by like count... https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=false

Not really. showReplies=false shows years-old content when not logged in.

@iceFbr

iceFbr commented Aug 15, 2023

Down again...

@Dheatly23

Also, the syndication api for showReplies=true does not work anymore: https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=true but showReplies=false still works, showing the tweets ordered by like count... https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=false

Not really. showReplies=false shows years-old content when not logged in.

That's because in that specific example, those tweets were years ago. Look again at the like count, notice anything?

@ghost

ghost commented Aug 15, 2023

Is there a online/CLI tool converting https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk to RSS feed? Then we could individually download the HTML from a logged in profile and do the conversion in a second step.

We can just search for the first { from the beginning and the last } from the end, and then parse everything in between as JSON.
If a user doesn't have that many tweets (500-1000), the chances are quite good that newer tweets are also among the 100 most popular.
But of course it's not a very good solution. At the least, Nitter should keep this method as a backup for when nothing else works.
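The "first { to last }" idea above could be sketched roughly as follows. This is only an illustration: `extract_json_object` and the miniature `sample_page` are made-up names and data, and a real page may contain a stray `{` in inline CSS or JavaScript before the JSON starts, in which case parsing would fail and a stricter match on the script tag would be needed.

```python
import json


def extract_json_object(page: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in page."""
    start = page.find("{")
    end = page.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in page")
    return json.loads(page[start:end + 1])


# Hypothetical miniature page standing in for the real syndication HTML.
sample_page = (
    '<script type="application/json">'
    '{"props": {"entries": [{"type": "tweet"}]}}'
    "</script>"
)

data = extract_json_object(sample_page)
print(data["props"]["entries"][0]["type"])
```

The key names in `sample_page` are placeholders; the layout of the real payload has to be checked against an actual response.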

@Write

Write commented Aug 15, 2023

Is there a online/CLI tool converting https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk to RSS feed? Then we could individually download the HTML from a logged in profile and do the conversion in a second step.

We can just search for the first { from begin and first } from end and then parse as json. If a user has not so many tweets (500-1000) then the chance is quite good that also newer tweets are within the most popular 100. But of course it's not a very good solution. At least, nitter should have it as a backup when nothing else works, this method can be used.

Indeed

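#!/usr/bin/python3

import re
import urllib.request  # "import urllib" alone does not load the request submodule

url = "https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk"

with urllib.request.urlopen(url) as response:
    # Fall back to UTF-8 when the response does not declare a charset.
    encoding = response.info().get_param('charset', 'utf8')
    html = response.read().decode(encoding)

# The timeline is embedded as JSON inside the __NEXT_DATA__ script tag.
# A lazy match is used because the JSON itself may contain '>' characters.
match = re.search(r'script id="__NEXT_DATA__" type="application/json">(.*?)</script>', html, re.DOTALL)
if match is None:
    raise SystemExit("__NEXT_DATA__ script tag not found")

print(match[1])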

@paulamei

paulamei commented Aug 15, 2023

Indeed

Interesting, but this doesn't return RSS with 'item', 'pubDate' etc. tags. Maybe a script using https://github.com/lkiesow/python-feedgen would do the job?

@Write

Write commented Aug 15, 2023

Indeed

Interesting, but this doesn't return RSS with 'item', 'pubDate' etc. tags. Maybe a script using https://github.com/lkiesow/python-feedgen would do the job?

Not sure I understand? It exposes far more information than needed, including the date and everything else.

Here's an example for a single tweet:

 {
            "type": "tweet",
            "entry_id": "tweet-1519480761749016577",
            "sort_index": "1691455400412446720",
            "content": {
              "tweet": {
                "id": 0,
                "location": "",
                "conversation_id_str": "1519480761749016577",
                "created_at": "Thu Apr 28 00:56:58 +0000 2022",
                "display_text_range": [
                  0,
                  52
                ],
                "entities": {
                  "user_mentions": [],
                  "urls": [],
                  "hashtags": [],
                  "symbols": [],
                  "media": []
                },
                "favorite_count": 4600599,
                "favorited": false,
                "full_text": "Next I’m buying Coca-Cola to put the cocaine back in",
                "id_str": "1519480761749016577",
                "lang": "en",
                "permalink": "/elonmusk/status/1519480761749016577",
                "possibly_sensitive": false,
                "quote_count": 171975,
                "reply_count": 187438,
                "retweet_count": 649833,
                "retweeted": false,
                "text": "Next I’m buying Coca-Cola to put the cocaine back in",
                "user": {
                  "blocking": false,
                  "created_at": "Tue Jun 02 20:12:29 +0000 2009",
                  "default_profile": false,
                  "default_profile_image": false,
                  "description": "Blades of Glory",
                  "entities": {
                    "description": {
                      "urls": []
                    },
                    "url": {}
                  },
                  "fast_followers_count": 0,
                  "favourites_count": 30569,
                  "follow_request_sent": false,
                  "followed_by": false,
                  "followers_count": 153112066,
                  "following": false,
                  "friends_count": 410,
                  "has_custom_timelines": false,
                  "highlightedLabel": {
                    "badge": {
                      "url": "https://pbs.twimg.com/profile_images/1683899100922511378/5lY42eHs_bigger.jpg"
                    },
                    "description": "X",
                    "userLabelType": "BusinessLabel",
                    "userLabelDisplayType": "Badge"
                  },
                  "id": 0,
                  "id_str": "44196397",
                  "is_translator": false,
                  "listed_count": 126597,
                  "location": "𝕏Ð",
                  "media_count": 1659,
                  "name": "Elon Musk",
                  "normal_followers_count": 153112066,
                  "notifications": false,
                  "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1690621312",
                  "profile_image_url_https": "https://pbs.twimg.com/profile_images/1683325380441128960/yRsRRjGO_normal.jpg",
                  "protected": false,
                  "screen_name": "elonmusk",
                  "show_all_inline_media": false,
                  "statuses_count": 29441,
                  "time_zone": "",
                  "translator_type": "none",
                  "url": "",
                  "utc_offset": 0,
                  "verified": false,
                  "withheld_in_countries": [],
                  "withheld_scope": "",
                  "is_blue_verified": true
                }
              }
            }
          },

EDIT: Maybe you meant a directly usable solution for an end user. Of course it's not one; the snippet needs to be adapted by a dev.
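For the RSS question, one possible adaptation is sketched below using only the standard library instead of python-feedgen. It assumes the entries look like the example above (`type`, `content.tweet.created_at`, `full_text`, `permalink`, `id_str`); the function name `entries_to_rss`, the channel fields, and the sample text are invented for illustration. Note that `%a` and `%b` in `strptime` depend on an English locale.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def entries_to_rss(entries, screen_name):
    """Build a minimal RSS 2.0 document from syndication timeline entries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"@{screen_name}"
    ET.SubElement(channel, "link").text = f"https://twitter.com/{screen_name}"
    ET.SubElement(channel, "description").text = f"Tweets by @{screen_name}"
    for entry in entries:
        if entry.get("type") != "tweet":
            continue  # skip anything that is not a tweet entry
        tweet = entry["content"]["tweet"]
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = tweet["full_text"]
        ET.SubElement(item, "link").text = "https://twitter.com" + tweet["permalink"]
        ET.SubElement(item, "guid", isPermaLink="false").text = tweet["id_str"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(created)
    return ET.tostring(rss, encoding="unicode")


# Sample entry reusing the field layout shown above; the text is a placeholder.
sample_entries = [{
    "type": "tweet",
    "content": {"tweet": {
        "created_at": "Thu Apr 28 00:56:58 +0000 2022",
        "full_text": "placeholder tweet text",
        "id_str": "1519480761749016577",
        "permalink": "/elonmusk/status/1519480761749016577",
    }},
}]

print(entries_to_rss(sample_entries, "elonmusk"))
```

Where the entry list sits inside the extracted JSON is not shown in this thread, so locating it is left to the reader.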

@paulamei

paulamei commented Aug 15, 2023

Indeed

Interesting, but this doesn't return RSS with 'item', 'pubDate' etc. tags. Maybe a script using https://github.com/lkiesow/python-feedgen would do the job?

Not sure I understand ? It expose far more informations than needed and it does expose the date and all

Ok, thank you, I'll try this.

@jcmag

jcmag commented Aug 15, 2023

Is there a online/CLI tool converting https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk to RSS feed? Then we could individually download the HTML from a logged in profile and do the conversion in a second step.

Calling the syndication URL without being logged in to Twitter doesn't retrieve the most recent tweets. If I call this URL in Postman, I get 100 tweets from 10/19/2018 to 07/31/2023; no tweets from August...

@null-routed

Is there a online/CLI tool converting https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk to RSS feed? Then we could individually download the HTML from a logged in profile and do the conversion in a second step.

calling the syndication URL without being logged in twitter doesn't retrieve the most recent tweets. If I call this url in postman, I retrieve 100 tweets from 10/19/2018 to 07/31/2023; no tweets from august...

It retrieves the tweets with the highest like count from that user, which doesn't sound good if your goal is retrieving the most recent tweets, as there's no guarantee new tweets will make it into that user's top 100. And even if they did, it might take a considerable amount of time.
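To see which date range a response actually covers, the entries can be re-sorted by `created_at` locally. The sketch below assumes the entry layout from the JSON example earlier in the thread; the helper names and the second sample entry are hypothetical. Sorting only reorders what was returned, it cannot recover recent tweets the endpoint left out.

```python
from datetime import datetime

CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_at(entry):
    """Parse the created_at timestamp of a tweet entry."""
    return datetime.strptime(entry["content"]["tweet"]["created_at"], CREATED_AT_FORMAT)


def newest_first(entries):
    """Return only the tweet entries, ordered from newest to oldest."""
    tweets = [e for e in entries if e.get("type") == "tweet"]
    return sorted(tweets, key=created_at, reverse=True)


sample = [
    {"type": "tweet", "content": {"tweet": {
        "created_at": "Thu Apr 28 00:56:58 +0000 2022",
        "id_str": "1519480761749016577"}}},
    # Hypothetical second entry, used only to demonstrate the ordering.
    {"type": "tweet", "content": {"tweet": {
        "created_at": "Mon Jul 31 12:00:00 +0000 2023",
        "id_str": "hypothetical-1"}}},
]

ordered = newest_first(sample)
print("newest:", created_at(ordered[0]).date(), "oldest:", created_at(ordered[-1]).date())
```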

@yuv418

yuv418 commented Aug 15, 2023

I've noticed that for smaller accounts with fewer than 100 tweets, that syndication URL does not load any tweets.

@nerra0pos

Also, the syndication api for showReplies=true does not work anymore: https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=true but showReplies=false still works, showing the tweets ordered by like count... https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk?showReplies=false

Not really. showReplies=false shows years-old content when not logged in.

That's because in that specific example, those tweets were years ago. Look again at the like count, notice anything?

No. That is the case for all big accounts. I am interested in the most recent Tweets and this approach will lead to nothing.

@kpopdev

kpopdev commented Aug 15, 2023

Indeed

Interesting, but this doesn't return RSS with 'item', 'pubDate' etc. tags. Maybe a script using https://github.com/lkiesow/python-feedgen would do the job?

Not sure I understand ? It expose far more informations than needed and it does expose the date and all

Here's an example for one tweet only :

 {
            "type": "tweet",
            "entry_id": "tweet-1519480761749016577",
            "sort_index": "1691455400412446720",
            "content": {
              "tweet": {
                "id": 0,
                "location": "",
                "conversation_id_str": "1519480761749016577",
                "created_at": "Thu Apr 28 00:56:58 +0000 2022",
                "display_text_range": [
                  0,
                  52
                ],
                "entities": {
                  "user_mentions": [],
                  "urls": [],
                  "hashtags": [],
                  "symbols": [],
                  "media": []
                },
                "favorite_count": 4600599,
                "favorited": false,
                "full_text": "Next I’m buying Coca-Cola to put the cocaine back in",
                "id_str": "1519480761749016577",
                "lang": "en",
                "permalink": "/elonmusk/status/1519480761749016577",
                "possibly_sensitive": false,
                "quote_count": 171975,
                "reply_count": 187438,
                "retweet_count": 649833,
                "retweeted": false,
                "text": "Next I’m buying Coca-Cola to put the cocaine back in",
                "user": {
                  "blocking": false,
                  "created_at": "Tue Jun 02 20:12:29 +0000 2009",
                  "default_profile": false,
                  "default_profile_image": false,
                  "description": "Blades of Glory",
                  "entities": {
                    "description": {
                      "urls": []
                    },
                    "url": {}
                  },
                  "fast_followers_count": 0,
                  "favourites_count": 30569,
                  "follow_request_sent": false,
                  "followed_by": false,
                  "followers_count": 153112066,
                  "following": false,
                  "friends_count": 410,
                  "has_custom_timelines": false,
                  "highlightedLabel": {
                    "badge": {
                      "url": "https://pbs.twimg.com/profile_images/1683899100922511378/5lY42eHs_bigger.jpg"
                    },
                    "description": "X",
                    "userLabelType": "BusinessLabel",
                    "userLabelDisplayType": "Badge"
                  },
                  "id": 0,
                  "id_str": "44196397",
                  "is_translator": false,
                  "listed_count": 126597,
                  "location": "𝕏Ð",
                  "media_count": 1659,
                  "name": "Elon Musk",
                  "normal_followers_count": 153112066,
                  "notifications": false,
                  "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1690621312",
                  "profile_image_url_https": "https://pbs.twimg.com/profile_images/1683325380441128960/yRsRRjGO_normal.jpg",
                  "protected": false,
                  "screen_name": "elonmusk",
                  "show_all_inline_media": false,
                  "statuses_count": 29441,
                  "time_zone": "",
                  "translator_type": "none",
                  "url": "",
                  "utc_offset": 0,
                  "verified": false,
                  "withheld_in_countries": [],
                  "withheld_scope": "",
                  "is_blue_verified": true
                }
              }
            }
          },

EDIT : Maybe you meant a directly usable solution for an end user, and of course it's not, the snippet need to be adapted by a dev.

I'm using this for my bot and it's working fine with cookies and headers.

@intuser

intuser commented Aug 15, 2023

Is there a online/CLI tool converting https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk to RSS feed? Then we could individually download the HTML from a logged in profile and do the conversion in a second step.

calling the syndication URL without being logged in twitter doesn't retrieve the most recent tweets. If I call this url in postman, I retrieve 100 tweets from 10/19/2018 to 07/31/2023; no tweets from august...

Yes. And there is at least one tweet from August with more likes (>807K) than some older tweets which are included (e.g. <680K).

@Mr-Freewan

is there any forecast for solving this problem?

@ghost

ghost commented Aug 15, 2023

Looks like https://nitter.privacydev.net/ is working

@zedeus
Owner

zedeus commented Aug 15, 2023

That one is a fork which uses account credentials. See #830

@ghost

ghost commented Aug 15, 2023

I am aware but couldn't nitter implement a system that aurora uses with lots of accounts that rotate per user?

@ghost

ghost commented Aug 15, 2023

I am aware but couldn't nitter implement a system that aurora uses with lots of accounts that rotate per user?

That's hard to maintain and simple for twitter to ban by just filtering "if number of accounts per IP > SOME_CONSTANT: ban all of them"
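The kind of filter described above might look something like this sketch. It is purely hypothetical: nothing here reflects how Twitter actually detects accounts, and `accounts_to_ban`, the threshold value, and the sample IPs (documentation ranges) are all invented for illustration.

```python
from collections import defaultdict

SOME_CONSTANT = 3  # hypothetical threshold, not a known Twitter value


def accounts_to_ban(logins, limit=SOME_CONSTANT):
    """Given (ip, account) pairs, return every account seen on an IP
    that hosts more than `limit` distinct accounts."""
    accounts_by_ip = defaultdict(set)
    for ip, account in logins:
        accounts_by_ip[ip].add(account)
    banned = set()
    for accounts in accounts_by_ip.values():
        if len(accounts) > limit:
            banned |= accounts
    return banned


logins = [
    ("203.0.113.5", "a"), ("203.0.113.5", "b"),
    ("203.0.113.5", "c"), ("203.0.113.5", "d"),
    ("198.51.100.7", "e"),
]
print(sorted(accounts_to_ban(logins)))
```

With a rule this simple, a single instance rotating many accounts from one address would lose all of them at once, which is the maintenance problem the comment points at.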

@ghost

ghost commented Aug 15, 2023

Looks like https://nitter.privacydev.net/ is working

User feeds not working on this

@dawnerd

dawnerd commented Aug 15, 2023

I switched to the privacydevel fork and put credentials in, but it's still 404ing on the same endpoint that upstream is having problems with.

@intuser

intuser commented Aug 15, 2023

Strange. privacydev (without credentials) works more or less for @ElonMusk, but not for other users like for instance @BarackObama.

@anibalburdo

Twstalker is doing funny things: when someone quotes a tweet, the quoted tweet repeats what the quoter says.

@robindz

robindz commented Mar 11, 2024

@animegrafmays Hey, how/where can I contact you to whitelist my IP for nitter.poast.org :) ?

@animegrafmays
Author

im not in the market of 'whitelisting' IPs. you can email me at graf @ poast.org and I can see why yours is blocked, but I'm not whitelisting people

@Trit34

Trit34 commented Mar 12, 2024

im not in the market of 'whitelisting' IPs. you can email me at graf @ poast.org and I can see why yours is blocked, but I'm not whitelisting people

@animegrafmays Mine is blocked too, maybe because I use the RSS feeds for some accounts (~ 10). I set them to fetch new posts once per hour on my browsers, instead of every 30 minutes. Before I e-mail you, could you confirm that it could have been a cause of “ERR_CONNECTION_REFUSED”?

@animegrafmays
Author

this github issue is not an issue tracker for your connection issues to my nitter instance. please stop shitting up this github issue by triggering an email every single time you respond and instead use the contact methods i've presented in this thread already

@ExperiencersInternational

im not in the market of 'whitelisting' IPs. you can email me at graf @ poast.org and I can see why yours is blocked, but I'm not whitelisting people

@animegrafmays Mine is blocked too, maybe because I use the RSS feeds for some accounts (~ 10). I set them to fetch new posts once per hour on my browsers, instead of every 30 minutes. Before I e-mail you, could you confirm that it could have been a cause of “ERR_CONNECTION_REFUSED”?

I have never had an issue with the RSS feeds lately, I get them fine with Power Automate and Element Feeds.

Fully agree with the fact we are clogging up but it's not like I can move this to fediverse (the servers I am on block poa.st because of allowing some undesirable people that were on certain platforms that's all).

@Yetangitu

@animegrafmays Mine is blocked too ...

the servers I am on block poa.st ...

The solution to all your problems is to host these services yourself. One of the common SBC or an old laptop - screen not required - is more than enough to run an instance of Pleroma (for your fediverse needs), Nitter, Libreddit, Invidious (Youtube proxy) and even Peertube if you go easy on the transcodes. That way you are neither a burden to others nor beholden to them. You may get blocked by the 'purple-hair crowd' on Mastodon but that is more or less unavoidable unless you want to become part of that crowd yourself. As long as you behave sensibly - and why wouldn't you? - you should be able to federate with the world and its dog, minus the aforementioned block-happy group.

Of course all this assumes you either have a stable internet connection at home to which you can attach said SBC or laptop or thin client or what have you or can splurge a few ${currency_unit} for a lightweight VPS somewhere. It does not take much in the way of technical prowess to get these things to run, you can even go the way of e.g. Yunohost or CapRover although you'll probably soon get tired of their restrictions and lack of flexibility.

@lukefromdc

lukefromdc commented Mar 12, 2024 via email

@ExperiencersInternational

@animegrafmays Mine is blocked too ...

the servers I am on block poa.st ...

The solution to all your problems is to host these services yourself. One of the common SBC or an old laptop - screen not required - is more than enough to run an instance of Pleroma (for your fediverse needs), Nitter, Libreddit, Invidious (Youtube proxy) and even Peertube if you go easy on the transcodes. That way you are neither a burden to others nor beholden to them. You may get blocked by the 'purple-hair crowd' on Mastodon but that is more or less unavoidable unless you want to become part of that crowd yourself. As long as you behave sensibly - and why wouldn't you? - you should be able to federate with the world and its dog, minus the aforementioned block-happy group.

Of course all this assumes you either have a stable internet connection at home to which you can attach said SBC or laptop or thin client or what have you or can splurge a few ${currency_unit} for a lightweight VPS somewhere. It does not take much in the way of technical prowess to get these things to run, you can even go the way of e.g. Yunohost or CapRover although you'll probably soon get tired of their restrictions and lack of flexibility.

Wow, amazing information. If I was able to, I would have done so already. I have no experience with hosting stuff and there's no chance of me running up a cable to connect a device.

Using a VPS seems expensive too, a local company wants £3 for their lowest tier a month.

@lukefromdc how is a landline related to this?

@Yetangitu

For those of us with no landline, a need to self-host a service simply to access Twitter content is the same as Twitter not being usable.

If you only would use the machine for TwiXXer->Nitter, yes that would be overkill. The idea would be to use your own host for much more than that both to avoid the mentioned blocking problems as well as to host your own services. Host your own photos instead of giving them to Google, host your own music instead of being dependent on streaming services, host your own XMPP server instead of being dependent on Telegram/Whatsapp/Signal/Matrix or (${deity} forbid) Metafacebook or Apple or Microsoft et al.

@lukefromdc

lukefromdc commented Mar 13, 2024

The relevance of a landline is that much more bandwidth is available. I have de facto (not admitted to) capped cellular bandwidth via tethering only, and when I am on the road I would be out of range of any home server. Forget a static IP address of course.

Also they don't like tethering but blocking sophisticated ways to tether brings lots of false positives so they don't bother. If they do I get a new carrier and a new phone number. No stable IP address, no continuously on connection in any single location, not going to carry a server everywhere I go and we are not discussing the presumably lighter load of distributed hosting (e.g. diaspora) here.

I do not however use Google or Meta for any purpose as I oppose their business models. I publish videos on Mastodon and on archive.org (latter good for linking especially as they want copies of all online content anyway). Twitter used to be widely used in my community so I had to be able to keyword search it and pull up known useful timelines, now neither of those work on the Twitter webapp without an account. I refuse to make an account with any ad supported corporate social media site, but the importance of Twitter is plunging as people stop posting there.

Unfortunately this is fueling a migration to Instagram which is even worse than Twitter but at least still has a partially working 3rd-party backend. You must know at least the username whose content you want to see to use it as far as I can tell. On the plus side, the migration to Mastodon has greatly boosted the reach of my content, but some events I would cover now go unseen as they are organized in places I am now blind to.

@ExperiencersInternational

The relevance of a landline is that much more bandwidth is available. I have de facto (not admitted to) capped cellular bandwidth via tethering only, and when I am on the road I would be out of range of any home server. Forget a static IP address of course.

Also they don't like tethering but blocking sophisticated ways to tether brings lots of false positives so they don't bother. If they do I get a new carrier and a new phone number. No stable IP address, no continuously on connection in any single location, not going to carry a server everywhere I go and we are not discussing the presumably lighter load of distributed hosting (e.g. diaspora) here.

I do not however use Google or Meta for any purpose as I oppose their business models. I publish videos on Mastodon and on archive.org (latter good for linking especially as they want copies of all online content anyway). Twitter used to be widely used in my community so I had to be able to keyword search it and pull up known useful timelines, now neither of those work on the Twitter webapp without an account. I refuse to make an account with any ad supported corporate social media site, but the importance of Twitter is plunging as people stop posting there.

Unfortunately this is fueling a migration to Instagram which is even worse than Twitter but at least still has a partially working 3rd-party backend. You must know at least the username whose content you want to see to use it as far as I can tell. On the plus side, the migration to Mastodon has greatly boosted the reach of my content, but some events I would cover now go unseen as they are organized in places I am now blind to.

Ah that makes sense.

Providers here generally don't require a landline but one provider still only does ADSL service and the other is HFC.

@animegrafmays
Author

so people stop generating multiple notifications via here and email for stuff that isn't related to nitter development, if it pertains to my nitter instance please email graf at poast dot org. do not post it here. want to be critical of me? email it. have a problem with what im doing? email it. leave this thread for dev.

thanks

@patchhg

patchhg commented May 3, 2024

@TempUser13 Glad you asked :) I was working on it exactly and had it here https://github.com/sekai-soft/freebird
I was able to run a private Nitter and miniflux instance locally on my NAS and had been using it for a while without trouble.

this doesn't help anybody at all, it's just a way for you to self flagellate

@TempUser13 there are guides here but utilizing your own account is super easy using a python script in here (sorry to the author i dont care to look for the source now it's literally buried now)

basically:

  1. checkout the guest accounts branch of nitter
  2. make a file in the root directory named guest_accounts.jsonl
  3. use one of these scripts (#983, #issuecomment-1944095658) to generate the contents of guest_accounts.jsonl
  4. do not EVER give people like @KTachibanaM anything including money

Cheers for all the work that you do on Poast bud @animegrafmays

Apologies in advance for the plethora of questions but I would appreciate some further tips before I start smashing my head against a wall trying to get a private instance working well enough for my needs. Given your experience running the most reliable Nitter instance as of late I would be very grateful for any advice around accounts usage and I'm sure I'm not the only one.

Basically I need to scrape a few hundred accounts, their complete timelines including likes, complete list of followers and following. What would be the best way to go about it? Would it be more effective to whip up some quick scripts that query graphql rather than run a private nitter instance?

I'm not as concerned about performance gains as I am with how nitter cycles through the accounts. I wonder if e.g. requesting the timelines in order with sane delays through some custom scripts would help as far as not getting the accounts blacklisted as much.

I've also thought about queuing up all the tweets containing media and then spinning up a headless browser separately in order to request them directly from twitter, since you can seemingly still do this much unauthenticated. Or is this overkill?

You've mentioned aged accounts faring better but most people, myself included, don't have access to any. Although the scale of private instances is infinitely smaller than the one you run, I'm still wondering what measures I could take to optimize fresh accounts usage.
Would running each account through a separate fixed residential proxy make a big difference or is it not worth the hassle?
Similarly, would using really big delays make a difference as far as not getting the accounts blocked in any way? For my particular use case, performance is not as much of a concern as getting full coverage. I remember reading in another thread that someone running a private instance for testing was noticing several accounts not retrieving full timelines in some cases, with random tweets missing. That would be a big problem and something I'm looking to avoid at all costs.

Thank you in advance for any insight which will result in less hair pulling on my part.

@animegrafmays
Author

Basically I need to scrape a few hundred accounts, their complete timelines including likes, complete list of followers and following.

i run private instances for this so they don't have impact on the public instance, you can reach out to me at graf [at] poast.org if you are interested.

Would it be more effective to whip up some quick scripts that query graphql rather than run a private nitter instance?

for what you are asking, if you are doing a one-time thing, running a quick nitter with some even new-ish accounts would be fine. the limits for accounts older than I think 2018 are doubled, whereas those prior to 2012 seem to be tripled or maybe quadrupled. i can't find documentation on this but in practice we run 200 accounts older than 2012 and very rarely run into limited accounts beyond maybe a dozen out of that in a day. older account limits are much, much more lax and definitely the way to go if you can get your hands on them

I'm not as concerned about performance gains as I am with how nitter cycles through the accounts. I wonder if e.g. requesting the timelines in order with sane delays through some custom scripts would help as far as not getting the accounts blacklisted as much.

if you are using RSS it's limited to the cache time of the nitter instance. nitter.poast.org is on 30 minute refresh for rss. staggering requests on a private instance (i.e. request 5 accounts now, 5 more in 5 minutes, etc) would help spread out user account exhaustion, but in my experience, as I said prior, you can more or less ignore this with aged accounts. banned accounts have limits almost identical to those created in the last 12 months, so if you can get your hands on a bunch of tokens from those and you're just serving yourself/your own interests it's more than adequate. it's a lot easier to find banned/suspended accounts from people than current, active ones
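The staggering described above (5 accounts now, 5 more in 5 minutes, and so on) could be planned with a small helper like this sketch. The function name `stagger` and the placeholder screen names are assumptions; the actual fetching and sleeping is left out so the schedule itself can be inspected.

```python
def stagger(accounts, batch_size=5, interval_minutes=5):
    """Yield (delay_minutes, batch) pairs: the first batch is due
    immediately, each later batch interval_minutes after the previous one."""
    for i in range(0, len(accounts), batch_size):
        yield (i // batch_size) * interval_minutes, accounts[i:i + batch_size]


# Placeholder screen names standing in for the accounts to be fetched.
accounts = [f"user{n}" for n in range(12)]

for delay, batch in stagger(accounts):
    print(f"t+{delay} min: {', '.join(batch)}")
```

A scheduler or a simple loop with `time.sleep` could then request each batch's feeds when its delay comes due.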

I've also thought about queuing up all the tweets containing media and then spinning up a headless browser separately in order to request them directly from twitter, since you can seemingly still do this much unauthenticated. Or is this overkill?

you mean fetching a list of attachments on the tweet and bulk downloading? seems a bit overkill. if you are running a private nitter instance you should be fine without needing to go to this extreme

You've mentioned aged accounts faring better but most people, myself included, don't have access to any. Although the scale of private instances is infinitely smaller than the one you run

we happen to have a relatively large userbase (32k users at time of writing) who had donated a bunch of accounts and I have some that I had left over. I also have some surplus I could give you tokens for if you're using a private instance for yourself to get you started. we have 22-25k requests/s avg (just recently had to move it to a ryzen 7 server because the core clock on individual threads wasn't enough even running multiple nitter processes)

I'm still wondering what measures I could take to optimize fresh accounts usage.

to be honest, as long as you aren't listing yourself on the wiki or on the status page you likely won't run into issues. new accounts have about 200 profile queries each that seem to reset every 8-12 hours, so a private instance with maybe 5 or 6 accounts and staggered scraping would do you for quite some time
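A rough back-of-the-envelope sketch of what those numbers imply, assuming each timeline refresh costs exactly one profile query (an assumption; pagination or media lookups may cost more) and taking the pessimistic 12-hour end of the reset range. The function name and parameters are hypothetical.

```python
def min_poll_interval_hours(timelines, accounts,
                            queries_per_account=200, reset_hours=12):
    """Estimate how often every timeline can be refreshed within budget.

    Returns the number of hours between full passes over all timelines,
    or None if the accounts cannot cover even one pass per reset window.
    """
    budget = accounts * queries_per_account  # queries per reset window
    passes = budget // timelines             # full passes the budget allows
    if passes == 0:
        return None
    return reset_hours / passes


if __name__ == "__main__":
    # 300 timelines on 5 fresh accounts: 1000 queries, 3 full passes.
    print(min_poll_interval_hours(300, 5))
```

With these assumed figures, a few hundred timelines on 5 accounts works out to one full pass every 4 hours, which is in line with the "quite some time" estimate above.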

i don't want to clog up this issue or notify everybody so if you'd like to discuss it further you can email me at the address above

@garoto

garoto commented May 3, 2024

Why are you even engaging in a convo with someone that "needs to scrape a few hundred accounts" and then offering to help with his endeavour? I must be taking crazy pills without knowing.

@nukeop

nukeop commented May 3, 2024

Twitter's fault for not providing an API for this. There's nothing wrong with scraping.

@patchhg

patchhg commented May 4, 2024

Why are you even engaging in a convo with someone that "needs to scrape a few hundred accounts" and then offering to help with his endeavour? I must be taking crazy pills without knowing.

I'm not going to take your bait and get bogged down in a pointless debate re the ethics of scraping and how it all might pertain to Twitter's newest policies.

From a pragmatic standpoint, several Nitter instance maintainers have mentioned in this very thread that scraping is a big enough problem as far as keeping Nitter usable for regular users. Since I prefer to use Nitter myself for casual lurking rather than having to deal with Twitter's interface and algo, the last thing I wanted to do was contribute to the problem.

I believed addressing this out in the open would go a long way towards helping those with this specific need get their own private instances up and running, rather than ruin a good thing for everyone. Taking my own use case as an example, since speed is not a concern it would've certainly been more convenient to scrape graf's instance for weeks rather than have to deal with Twitter socks, but it's a really shitty thing to do.

As for why someone might be interested in scraping Twitter on a small scale - several hundred accounts is nothing really despite your scaremongering - there are legitimate reasons. From Twitter's shitty built-in search to archival purposes; some communities sharing precious tidbits on Twitter are already more or less shutting down in protest because Elon man bad.

@animegrafmays
Author

I'm not going to take your bait and get bogged down in a pointless debate
but you're here arguing.

i'll break it down for you -- i'd rather somebody scrape an instance i run that nobody else is trying to use than use my main instance, that literally everybody is linking to. cool with you? no? i dont give a shit

@stopmotio

oh hey, emails for that Nitter issue are back! I wonder if they ever fixed the probl—

Zki6LEk.gif

@Apachez-

Apachez- commented May 5, 2024

Twitter's fault for not providing an API for this. There's nothing wrong with scraping.

Scraping is why we can't have nice things such as Nitter in public, the way it used to work before "some" started to scrape Twitter through Nitter and brought Nitter to a halt.

@nkfm200

nkfm200 commented May 22, 2024

nitter.poast.org sometimes doesn't show search results or individual Tweets. nitter.privacydev.net shows the rate limit error on all searches and account pages even with a full bar of green at the Nitter status. It'd be great if people started working on a solution as opposed to dooming every moment about the death of Nitter 😐

@animegrafmays
Author

the reason searches are "sometimes" not shown is the number of accounts that are rate limited. there are currently 200 accounts on that instance, and because it's serving traffic to almost 200 thousand unique visitors daily, more than half of the accounts are rate limited. with that amount of traffic, having more than half the accounts rate limited will result in not being able to expand tweets, searches not functioning etc. it's a literal full time job maintaining it and preventing scraping but I am doing what I can, including procuring an additional 100 to add to it this week. sorry I can't be good enough for you, @nkfm200 😭😭😭

@stopmotio

I continue to be amazed that you've got this running so long after development ceased

@nkfm200

nkfm200 commented May 22, 2024

@animegrafmays I've heard of that: a small number of accounts serving a huge number of visitors every day, which causes the accounts to be rate limited. Hopefully the extra 100 accounts will be there by the end of the week. This may be off-topic but I'm just strict about stuff because this is a world where the people living in it refuse to do something unless it fits the rules of the culture
