Question about the future of snscrape #1037

Open
lukaszpeee opened this issue Nov 19, 2023 · 13 comments
Labels
question Further information is requested

Comments

@lukaszpeee

Hi,

I think snscrape is an amazing library that offers numerous possibilities. I have a question about the current situation, which has now lasted for a few months. I would like to use snscrape for my projects, but it is currently broken. Do you think there is any chance that the situation will change?

Best regards

@lukaszpeee added the "question" label Nov 19, 2023
@JustAnotherArchivist
Owner

Yeah, I haven't had enough spare time recently to play whack-a-mole with Elon's minions. I do intend to resume development, but I can't currently say when that will happen.

@JustAnotherArchivist changed the title from "Question about the feature" to "Question about the future of snscrape" Nov 24, 2023
@Krishna-Singhal

> Yeah, I haven't had enough spare time recently to play whack-a-mole with Elon's minions. I do intend to resume development, but I can't currently say when that will happen.

Could you please revive this amazing lib for Twitter (or X) ;)

@Evandro72

Good luck my friend!

@Dev-Anky07

@JustAnotherArchivist I can help with automated login using a headless browser instance. The login step can then be skipped on later runs by passing the browser profile, so that the saved passwords are reused:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument("--user-data-dir=C:\\Path")  # path to your Chrome profile
w = webdriver.Chrome(service=Service("C:\\Users\\chromedriver.exe"), options=options)  # Selenium 4 style

That bypass was for a single-user application, though, and I don't know how you would implement it at this scale.

One way would be to ask the user for their login details, which would then be used to authenticate the automated login.

I used Selenium and the send_keys() function.

lemme know if that's something you'd like, I just want to help as much as I can

@TheTechRobo
Contributor

@Dev-Anky07 Authentication won't be supported in snscrape: #270

@lukaszpeee

This comment was marked as spam.

@leleobhz

> @Dev-Anky07 Authentication won't be supported in snscrape: #270

Hello!

I think #270 needs to be revisited, at least for Twitter, since the API (and paying for it) is the recognized, official way to scrape Twitter. Twitter's new position on scraping will also require at least following their API guidelines.

@JustAnotherArchivist
Owner

It does not. If you want to use the API, there are several API clients already. Also, regular use of an official API isn't scraping.

@leleobhz

> It does not. If you want to use the API, there are several API clients already. Also, regular use of an official API isn't scraping.

As things stand today, snscrape cannot scrape anything from Twitter, and I doubt it will be able to while Musk owns the network. The idea here is to give users options, since large-scale access to tweets is still possible through the API. I'm not using snscrape as a client but to scrape specific search terms, and I think this would be very useful, judging by the number of snscrape forks that exist just to support the authenticated Twitter API.

@JustAnotherArchivist
Owner

You seem to misunderstand what snscrape is. It's a scraper, not an API client. And more specifically, it's for scraping publicly accessible content. Anything behind authentication walls has always been outside of snscrape's scope and design goal. If people want to maintain a fork going beyond that scope, they can do that (so long as they comply with GPLv3+). It might be useful to them. It's not something I will entertain though.

When I started writing snscrape, there was no usable software for scraping Twitter. There were and are usable API clients, and I'm not going to reinvent the wheel and write another one. Again, please use one of those many existing ones if you want to use the API.

snscrape can't scrape Twitter anymore, and the best thing it might do in the foreseeable future is retrieving individual tweets (useful for hydrating tweet ID lists, although it won't work for age-restricted or protected tweets) and a profile's most popular tweets. Those are the only things that are still publicly accessible as far as I know. I will likely remove all other Twitter scrapers.

@leleobhz

> You seem to misunderstand what snscrape is. It's a scraper, not an API client. And more specifically, it's for scraping publicly accessible content. Anything behind authentication walls has always been outside of snscrape's scope and design goal. If people want to maintain a fork going beyond that scope, they can do that (so long as they comply with GPLv3+). It might be useful to them. It's not something I will entertain though.

> When I started writing snscrape, there was no usable software for scraping Twitter. There were and are usable API clients, and I'm not going to reinvent the wheel and write another one. Again, please use one of those many existing ones if you want to use the API.

> snscrape can't scrape Twitter anymore, and the best thing it might do in the foreseeable future is retrieving individual tweets (useful for hydrating tweet ID lists, although it won't work for age-restricted or protected tweets) and a profile's most popular tweets. Those are the only things that are still publicly accessible as far as I know. I will likely remove all other Twitter scrapers.

I saw a Nitter-based scraper that works, within its VERY tight limitations and the API questions, so maybe something can be borrowed from Nitter too. As I understand it, Twitter data is public: the restriction is not about who may access it, only about there being an "official" way to reach it. That is different from Facebook, for example, which lets users decide what is public and what is not. On Twitter, everything is available except for private accounts, as it always was. Musk wants to banish robots, crawlers, scrapers, etc. from Twitter, but that does not change how Twitter handles information.

For that reason, I think Twitter scrapers deserve attention, since Twitter is still relevant for public information and debate. I understand your point about the API (and I agree it is your prerogative), but simply deprecating the scrapers because there is no known or easy way to access the data may achieve the opposite of what scrapers have always set out to do.

@JustAnotherArchivist
Owner

Nitter only works with accounts now, as far as I'm aware. It previously used guest tokens, but those can't be generated anymore since late January, and the last ones expired a few days ago.

@DevanshD3

Hey @JustAnotherArchivist, I just wanted to scrape some tweets from a few accounts. My dad wanted some tweets and was copy-pasting them manually; being a fresh CS grad, I had to intervene, but this thread made me sad. Can we use snscrape to do that, or is that capability also unavailable?

8 participants