Long Response Time - Get Profile #63

Open

0xAskar opened this issue Sep 27, 2023 · 4 comments

Comments
0xAskar commented Sep 27, 2023

As explained by the title, over the past 24 hours, getProfile has started taking a long time to respond. It doesn't error out, but it responds after 5-10 minutes, if not longer. This was working fine 48 hours ago, and there have been no changes to my authentication.
I will also add my scraper code below; any help would be greatly appreciated because my system depends heavily on the scraper working well. This is happening both on my local network and on my Heroku servers, so it's not a network issue. I've also swapped the local cookies for fresh ones from the browser, but to no avail.

I also checked the params being set, and they include the usernames. Sometimes it takes 24 seconds; other times it never responds, even after 10 minutes. The URL is still consistent with the one that shows in the browser's network tab:
https://twitter.com/i/api/graphql/G3KGOASz96M-Qu0nwmGXNg/UserByScreenName?variables

As you can see with the code below, it logs this response: finishing getting scraper info and it took 24.2456 seconds, and that's the quick case, haha.

                console.log("getting scraper info")
                let startTime = new Date().getTime()
                twitterData = await scraper.getProfile(user.twitterUsername)
                let endTime = new Date().getTime()
                console.log(`finishing getting scraper info and it took ${(endTime - startTime) / 1000} seconds`)
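
One way to keep a call like this from blocking forever (a sketch, not something the library provides; the 30-second cutoff is an arbitrary choice) is to race it against a timeout so it fails fast instead of hanging:

// Fail fast instead of hanging when getProfile never resolves.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage:
// twitterData = await withTimeout(scraper.getProfile(user.twitterUsername), 30000);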

The scraper code, with cookie values omitted:

import dotenv from 'dotenv';

dotenv.config({ path: '../.env'});
import { HttpsProxyAgent } from 'https-proxy-agent';
import { Scraper } from '@the-convocation/twitter-scraper'

export default async function getScraper(options = { authMethod: 'cookies' }) {
    // const username = process.env['TWITTER_USERNAME'];
    const username = 'omitted';
    // const password = process.env['TWITTER_PASSWORD'];
    const password = 'omitted';
    // const email = process.env['TWITTER_EMAIL'];
    const email = 'omitted';
    let cookies = [
      {"name": "lang", "value": "en"},
      {"name": "guest_id", "value": "omitted"},
      {"name": "_twitter_sess", "value": "omitted"},
      {"name": "auth_token", "value": "omitted"},
      {"name": "ct0", "value": "omitted"},
      {"name": "guest_id_ads", "value": "omitted"},
      {"name": "guest_id_marketing", "value": "omitted"},
      {"name": "twid", "value": "omitted"},
      {"name": "personalization_id", "value": "omitted"}
    ]
    const proxyUrl = null;
    let agent;
  
    if (options.authMethod === 'cookies' && !cookies) {
      console.warn(
        'TWITTER_COOKIES variable is not defined, reverting to password auth (not recommended)',
      );
      options.authMethod = 'password';
    }
  
    if (options.authMethod === 'password' && !(username && password)) {
      throw new Error(
        'TWITTER_USERNAME and TWITTER_PASSWORD variables must be defined.',
      );
    }
  
    if (proxyUrl) {
      agent = new HttpsProxyAgent(proxyUrl, {
        rejectUnauthorized: false,
      });
    }
  
    const scraper = new Scraper({
      transform: {
        request: (input, init) => {
          if (agent) {
            return [input, { ...init, agent }];
          }
          return [input, init];
        },
      },
    });
  
    if (options.authMethod === 'password') {
      await scraper.login(username, password, email);
    } else if (options.authMethod === 'cookies') {
        const cookieStrings = cookies.map(cookie => `${cookie.name}=${cookie.value}`);
        await scraper.setCookies(cookieStrings);
    }
  
    return scraper;
}
0xAskar (Author) commented Sep 27, 2023

I created a new account and got new credentials. The problem is that it's obviously unsustainable to do that manually each time. Does anyone have any solutions?

karashiiro (Collaborator) commented

The tests still complete in the same time as a couple of weeks ago, and given that getting new credentials at least temporarily fixed the problem, I think you're just getting rate-limited. There's nothing this library can do about that, to the best of my knowledge, but if you find something it can do here, I'd be happy to add it. Twitter's servers are effectively a black box to us, so while they may have changed something recently, it's just as likely that the account you were using was flagged and now has a stricter rate limit (maybe? I don't know if that's actually a thing).

Short of that, I'd suggest implementing a secondary throttler on your end. If your application is making rapid-fire requests until it gets rate-limited, spacing requests out from each other might help. I don't know what the safest or most efficient way of doing that is; it'll probably be specific to your application, but something along the lines of the sketch below.
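
A minimal sketch of such a throttler (not part of this library; the 2-second gap is an arbitrary guess you'd have to tune):

// Serialize calls and enforce a minimum gap between requests.
class Throttler {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.queue = Promise.resolve();
  }

  schedule(task) {
    const result = this.queue.then(task);
    // Chain a delay after each task so the next one waits out the interval.
    this.queue = result
      .catch(() => {}) // keep the chain alive if a task rejects
      .then(() => new Promise(resolve => setTimeout(resolve, this.minIntervalMs)));
    return result;
  }
}

// Usage:
// const throttler = new Throttler(2000);
// const profile = await throttler.schedule(() => scraper.getProfile('someUser'));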

0xAskar (Author) commented Sep 28, 2023

@karashiiro Hmm, yeah, that makes sense. I figured rate-limiting was the reason too, and I couldn't think of a good way to work around it. For my specific use case, time sensitivity is important. I wonder if there's a way to create Twitter accounts and retrieve their cookies automatically, making a new account every time we hit that threshold (I've been querying a lot, to be fair). I'll keep this open for a little longer and then close it, whether or not I think of a better approach.

karashiiro (Collaborator) commented Sep 28, 2023

Automated account creation is a challenge because of the email (and often phone) verification requirements, but it might be possible with a sophisticated enough system. With the current requirements, even creating an account manually is a chore, though.

In a different vein, you might be able to load-balance across multiple scrapers logged into different accounts, but that might also get flagged more easily unless you proxy each one through a different server so they don't all share the exact same IP (of course, then you might run into the login location verification check). A rough sketch of that idea is below.
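
For illustration only (a minimal sketch assuming scrapers built via something like the getScraper helper above, each with its own account and proxy; none of this is provided by the library):

// Round-robin across several logged-in scrapers so no single
// account takes every request.
class ScraperPool {
  constructor(scrapers) {
    this.scrapers = scrapers;
    this.next = 0;
  }

  // Hand out scrapers in rotation.
  acquire() {
    const scraper = this.scrapers[this.next];
    this.next = (this.next + 1) % this.scrapers.length;
    return scraper;
  }
}

// Usage:
// const pool = new ScraperPool([await getScraper(), await getScraper()]);
// const profile = await pool.acquire().getProfile('someUser');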
