Efficient collection of large list of screen-names/ids via Twitter API #7

Open
computermacgyver opened this issue Aug 14, 2020 · 4 comments

@computermacgyver
Member

Currently the infer_screen_name and infer_id methods in M3Twitter each accept a single screen-name/id and call the Twitter API to get information for that one user. This is inefficient, since the endpoint can return up to 100 users per request.

New methods should be included in the M3Twitter class to handle a long list of users. These methods should break the list into chunks of 100, respect the rate limit, and gracefully handle any API errors.
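As a rough illustration of the chunking step only, a helper along these lines could split a long list into batches of at most 100. The name `chunked` is hypothetical and not part of M3Twitter; rate limiting and error handling are deliberately left out of this sketch.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        # Slicing past the end is safe; the last batch is simply shorter.
        yield list(items[start:start + size])
```

With 250 ids this would yield batches of 100, 100 and 50, each small enough for one lookup request.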

(This was previously not needed as the class was scraping profiles from HTML and was designed simply as a demonstration method rather than something to be used at scale. The change recently made to use the API opens up this opportunity, which would make the library even more user-friendly)

@computermacgyver computermacgyver added the enhancement New feature or request label Aug 14, 2020
@computermacgyver computermacgyver self-assigned this Aug 14, 2020
@JanaLasser

Any updates on this enhancement, or ideas for a workaround? I am very interested in getting this to work. If you could give me a pointer on where to start, I could potentially implement it.

@computermacgyver
Member Author

Hi @JanaLasser . We haven't done this work.

We would first need to write a function that takes a list of user ids or screen names and looks them up through the Twitter API using the /1.1/users/lookup.json endpoint. This is documented here:
https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-users-lookup

It accepts up to 100 users at a time.

After that we would download the profile images and then transform the data so it is ready for processing. Functions for these steps exist but are single-threaded, so they may be slow. I would leave them as they are for now, however, and focus on the first step of using the /users/lookup.json endpoint.
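A minimal sketch of that first step might look like the following. The `user_id` and `screen_name` query parameters are the comma-separated parameters of the users/lookup endpoint; everything else is an assumption for illustration: `build_lookup_params` and `lookup_users` are hypothetical names, and `fetch` stands in for whatever authenticated GET call the class already makes (it is assumed to return the parsed JSON list of user objects).

```python
from typing import Any, Callable, Dict, List, Sequence

LOOKUP_URL = "https://api.twitter.com/1.1/users/lookup.json"
MAX_PER_REQUEST = 100  # limit stated for this endpoint

# Hypothetical signature: fetch(url, params) -> list of user dicts.
Fetch = Callable[[str, Dict[str, str]], List[Dict[str, Any]]]


def build_lookup_params(values: Sequence[str], by: str = "user_id") -> Dict[str, str]:
    """Build the query parameters for one request of at most 100 users."""
    if by not in ("user_id", "screen_name"):
        raise ValueError("by must be 'user_id' or 'screen_name'")
    if not 1 <= len(values) <= MAX_PER_REQUEST:
        raise ValueError("each request takes between 1 and 100 users")
    return {by: ",".join(str(v) for v in values)}


def lookup_users(values: Sequence[str], fetch: Fetch, by: str = "user_id") -> List[Dict[str, Any]]:
    """Look up any number of users, one request per batch of 100."""
    users: List[Dict[str, Any]] = []
    for start in range(0, len(values), MAX_PER_REQUEST):
        batch = values[start:start + MAX_PER_REQUEST]
        users.extend(fetch(LOOKUP_URL, build_lookup_params(batch, by)))
    return users
```

Note that the result can be shorter than the input: as far as the endpoint documentation describes, unknown, suspended, or deleted users are simply left out of the response, so callers should match results back to the requested ids rather than rely on position.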

@JanaLasser

I created a pull request (#30) where I implemented the changes. I hope this is the right way (first time ever pull request...).

So far there is only code for user ID lists (not user name lists). The code does handle lists with >100 IDs by chunking them into bits of 100 IDs each.

It also doesn't explicitly respect the API rate limit, and it will fail with an "Invalid response from Twitter" error if the rate limit is exceeded (similar to the single-user lookup).
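One possible way to add rate-limit handling later, sketched here under assumptions rather than taken from the PR: the v1.1 API signals an exceeded limit with HTTP 429 and reports the reset time (epoch seconds) in the `x-rate-limit-reset` header, so a wrapper could wait until then and retry. `call_with_rate_limit`, `TwitterLookupError`, and the `request` callable (assumed to return a `(status, headers, body)` tuple) are hypothetical names for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], Any]  # (status code, headers, parsed body)


class TwitterLookupError(Exception):
    """Raised when a lookup fails for a reason other than a retryable 429."""


def call_with_rate_limit(
    request: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Any:
    """Run `request`, waiting for the rate-limit window to reset on HTTP 429."""
    for attempt in range(max_retries + 1):
        status, headers, body = request()
        if status == 200:
            return body
        if status == 429 and attempt < max_retries:
            # Assumes lower-case header keys; fall back to 60 s if the header is absent.
            reset = float(headers.get("x-rate-limit-reset", now() + 60))
            sleep(max(reset - now(), 0) + 1)  # one extra second as a safety margin
            continue
        raise TwitterLookupError(f"lookup failed with HTTP {status}")
```

Passing `sleep` and `now` in as arguments keeps the wrapper testable without real waiting, which may help given that not everyone has API keys at hand.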

@zijwang
Member

zijwang commented Mar 28, 2022

Thank you @JanaLasser for the PR. It looks nice and I left a few comments there. I do not have a set of API keys handy -- it would be fantastic if @computermacgyver could help test once these comments are resolved.
