Fix user.videos crawling issue #1141

Open
wants to merge 1 commit into
base: main

@lhphat02 commented Apr 13, 2024

Issue: user.videos can't crawl more than 35 videos, even when a cursor is used.

The code I used for testing:

import os
import time

from TikTokApi import TikTokApi

ms_token = os.environ.get("ms_token")  # assumption: the token is read from the environment


async def get_user_videos(username):
    start_time = time.time()
    row_count = 0

    async with TikTokApi() as api:
        await api.create_sessions(headless=False, ms_tokens=[ms_token], num_sessions=1, sleep_after=3)
        user = api.user(username)
        user_data = await user.info()
        # Total number of posts reported on the user's profile
        post_count = user_data["userInfo"]["stats"].get("videoCount")

        # Request every video the profile reports and count what comes back
        async for video in user.videos(count=post_count):
            url = f"https://www.tiktok.com/@{video.as_dict['author']['uniqueId']}/video/{video.id}"
            print(f"URL: {url}")
            row_count += 1

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Execution time: {elapsed_time} seconds")
    print(f"Total rows: {row_count}")
    print(f"Rows per second: {row_count / elapsed_time}")

Before modifying the videos method:

2024-04-13 14:20:04,220 - TikTokApi.tiktok - ERROR - Got an unexpected status code: {'log_pb': {'impr_id': '202404130720035F6460F88AF0BF0E31DB'}, 'statusCode': 10201, 'statusMsg': '', 'status_code': 10201, 'status_msg': ''}
Execution time: 11.261611223220825 seconds
Total rows: 0
Rows per second: 0.0

After modifying the videos method:

URL: https://www.tiktok.com/@sofm_official/video/6817297421245107457
URL: https://www.tiktok.com/@sofm_official/video/6815619623837289729
...
URL: https://www.tiktok.com/@sofm_official/video/6815419939957017857
URL: https://www.tiktok.com/@sofm_official/video/6815113300146228481
URL: https://www.tiktok.com/@sofm_official/video/6814374629558258945
Execution time: 14.023724794387817 seconds
Total rows: 135
Rows per second: 9.626543730666045

Please review this.

@anarchopythonista

Applying this patch locally fixed the issue I was experiencing with the 6.3.0 release. Thank you!

@mi01 commented Apr 29, 2024

This fix breaks the count parameter: the function will always return a multiple of 35 videos (or fewer if the user has fewer videos). It might work if we add a break statement to this loop:

for video in resp.get("itemList", []):
    yield self.parent.video(data=video)
    found += 1

for video in resp.get("itemList", []):
    yield self.parent.video(data=video)
    found += 1
    if found == count:
        break

Still, the cursor parameter remains useless and confusing for users of this function.
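
To make the count and cursor concerns concrete, here is a minimal, self-contained sketch of a paginated async generator that stops exactly at count and starts from the caller's cursor. It is not the library's implementation: fake_item_list, PAGE_SIZE, and the total of 135 items are hypothetical stand-ins for the real request, and the response keys (itemList, cursor, hasMore) are assumed from the loop quoted above. It uses return instead of break, which ends the generator directly rather than relying on the outer loop condition.

```python
import asyncio

PAGE_SIZE = 35  # page size reported in this thread; hard-coded here as an assumption


async def fake_item_list(cursor, total=135):
    """Hypothetical stand-in for the network call: returns one page of items."""
    items = list(range(cursor, min(cursor + PAGE_SIZE, total)))
    next_cursor = cursor + len(items)
    return {"itemList": items, "cursor": next_cursor, "hasMore": next_cursor < total}


async def videos(count=30, cursor=0):
    """Yield at most `count` items, starting from the caller-supplied `cursor`."""
    found = 0
    while found < count:
        resp = await fake_item_list(cursor)
        for video in resp.get("itemList", []):
            yield video
            found += 1
            if found == count:
                return  # stop mid-page so `count` is respected
        if not resp.get("hasMore", False):
            return  # no further pages to fetch
        cursor = resp.get("cursor")


async def collect(count, cursor=0):
    """Gather the generator's output into a list for inspection."""
    return [v async for v in videos(count=count, cursor=cursor)]


if __name__ == "__main__":
    print(len(asyncio.run(collect(50))))   # stops partway through the second page
    print(len(asyncio.run(collect(500))))  # limited by the 135 items available
```

With this shape, a caller asking for 50 items gets 50 rather than 70, and passing a non-zero cursor changes where the first page starts.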

@davidteather self-requested a review April 29, 2024 15:47
3 participants