Efficient collection of large list of screen-names/ids via Twitter API #7

computermacgyver · 2020-08-14T21:37:18Z

Currently the infer_screen_name and infer_id methods in M3Twitter accept one screen-name/id and call the Twitter API to get information for that single user. This is inefficient since the endpoint can get up to 100 users at a time.

New methods should be included in the M3Twitter class to handle a long list of users. These methods should break the list into chunks of 100, respect the rate limit, and gracefully handle any API errors.
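As a rough illustration of the chunking step only, here is a minimal sketch. The helper name `chunked` is hypothetical and is not part of M3Twitter; the default of 100 comes from the per-request limit mentioned above.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries.

    Hypothetical helper for illustration; not an existing M3Twitter method.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        # The final slice is simply shorter when len(items) is not a multiple of size.
        yield list(items[start:start + size])


if __name__ == "__main__":
    ids = [str(i) for i in range(250)]
    for batch in chunked(ids):
        print(len(batch))
```

Rate limiting and error handling would wrap whatever call consumes each batch; they are deliberately left out of this sketch.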

(This was previously not needed as the class was scraping profiles from HTML and was designed simply as a demonstration method rather than something to be used at scale. The change recently made to use the API opens up this opportunity, which would make the library even more user-friendly)


JanaLasser · 2022-03-25T15:08:53Z

Any updates on this enhancement, or ideas for a workaround? I am very interested in getting this to work. If you could give me a pointer on where to start, I could potentially implement it.

computermacgyver · 2022-03-25T16:15:06Z

Hi @JanaLasser. We haven't done this work.

We would first need to write a function that takes a list of user ids or screen names and checks them with the Twitter API using the /1.1/users/lookup.json endpoint. This is documented here:
https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-users-lookup

It accepts up to 100 users at a time.

After that we would download the profile images and then transform the data to be ready for processing. Functions for these exist but are single-threaded, so they may be slow. I would leave them for now, however, and focus on the first step of using the /users/lookup.json endpoint.
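For that first step, a minimal sketch of how the request for one batch might be assembled. The function name `build_lookup_request` is hypothetical, and requiring exactly one of the two lists is a simplification made here, not something the endpoint demands. The `user_id` and `screen_name` parameters take comma-separated values. Authentication and the actual HTTP call are omitted.

```python
from typing import Dict, Optional, Sequence, Tuple

LOOKUP_URL = "https://api.twitter.com/1.1/users/lookup.json"
MAX_USERS_PER_REQUEST = 100  # limit stated in this thread


def build_lookup_request(
    user_ids: Optional[Sequence[object]] = None,
    screen_names: Optional[Sequence[str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, params) for a single users/lookup call.

    Hypothetical helper for illustration; it does not send anything.
    """
    # Simplifying assumption: callers pass ids or names, never both.
    if bool(user_ids) == bool(screen_names):
        raise ValueError("pass exactly one of user_ids or screen_names")
    if user_ids:
        key, values = "user_id", user_ids
    else:
        key, values = "screen_name", screen_names
    if len(values) > MAX_USERS_PER_REQUEST:
        raise ValueError("at most 100 users per request")
    # The endpoint expects one comma-separated string per parameter.
    return LOOKUP_URL, {key: ",".join(str(v) for v in values)}


if __name__ == "__main__":
    url, params = build_lookup_request(user_ids=[12, 783214])
    print(url, params)
```

The returned pair can be handed to whichever authenticated HTTP client the class already uses.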

JanaLasser · 2022-03-25T17:07:27Z

I created a pull request (#30) where I implemented the changes. I hope this is the right way (first time ever pull request...).

So far there is only code for user ID lists (not user name lists). The code does handle lists with >100 IDs by chunking them into bits of 100 IDs each.

It also doesn't explicitly respect the API rate limit and will fail with an "Invalid response from Twitter" error if the rate limit is exceeded (similar to the single user lookup).
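One possible way to handle the rate limit, sketched under assumptions: the caller supplies a `fetch` callable returning `(status, headers, body)`, and a rate-limited response carries HTTP status 429 plus an `x-rate-limit-reset` header holding an epoch timestamp in seconds. The names `seconds_until_reset` and `fetch_with_rate_limit` are hypothetical and are not part of the pull request.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], object]  # (status code, headers, parsed body)


def seconds_until_reset(headers: Dict[str, str], now: float) -> float:
    """Seconds to wait until the rate-limit window resets."""
    # Assumes lower-case header keys; a plain dict lookup is case-sensitive.
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return 60.0  # arbitrary fallback when the header is missing
    return max(0.0, float(reset) - now)


def fetch_with_rate_limit(
    fetch: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Tuple[int, object]:
    """Call `fetch`, sleeping and retrying while it reports HTTP 429."""
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(seconds_until_reset(headers, clock()))
    raise RuntimeError("rate limit still exceeded after retries")
```

Injecting `sleep` and `clock` keeps the retry logic testable without waiting on a real rate-limit window.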

zijwang · 2022-03-28T01:36:33Z

Thank you @JanaLasser for the PR. It looks nice and I left a few comments there. I do not have a set of API keys handy -- it would be fantastic if @computermacgyver could help test once these comments are resolved.

computermacgyver added the enhancement (New feature or request) label Aug 14, 2020

computermacgyver self-assigned this Aug 14, 2020
