Not scraping every tweet from a user #52

Open
wjd157 opened this issue Nov 28, 2022 · 6 comments

Comments

@wjd157

wjd157 commented Nov 28, 2022

Hello, I am trying to scrape every tweet from a user. From the Twitter page, I can see that they have tweeted more than 5,000 times. However, even when I set tweets_count to 5000, I get fewer than 1,000 tweets from that user.

My code is below:

from twitter_scraper_selenium import scrape_profile

scrape_profile(twitter_username="elonmusk", output_format="csv", tweets_count=6000, browser="chrome", filename="elonmusk")

(Note that @ElonMusk is just a stand-in example)

@shaikhsajid1111
Owner

shaikhsajid1111 commented Dec 3, 2022

Hey @wjd157, that method uses browser automation for scraping, and because your tweet count is large, it is probably getting blocked partway through. I suggest using the scrape_keyword_with_api() method instead. Try the code below, then check elon.json after scraping; it should contain the data you want.

from twitter_scraper_selenium import scrape_keyword_with_api

scrape_keyword_with_api('from:elonmusk', output_filename='elon')

@wjd157
Author

wjd157 commented Dec 13, 2022

This appears to generate a JSON file with no data in it. Further, the console tells me I have scraped only 24 tweets, even though the account I am now trying has more than 200 tweets.
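
A quick way to confirm whether an output file is really empty is to count its top-level entries. Below is a minimal sketch using only the standard library. It assumes the output is a JSON object or array; count_scraped_entries is a hypothetical helper, not part of twitter_scraper_selenium.

```python
import json
from pathlib import Path


def count_scraped_entries(json_path):
    """Return the number of top-level entries in a scraper output file.

    Returns 0 if the file is missing, empty, or not valid JSON.
    """
    path = Path(json_path)
    if not path.is_file() or path.stat().st_size == 0:
        return 0
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError:
        return 0
    # Handles both a dict keyed by tweet ID and a plain list of tweets.
    return len(data) if isinstance(data, (dict, list)) else 0


if __name__ == "__main__":
    # "elon.json" is the file name used earlier in this thread.
    print(count_scraped_entries("elon.json"))
```

Comparing this number with the count reported on the console shows whether tweets were fetched but not written, or never fetched at all.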

@shaikhsajid1111
Owner

Okay, I think this Twitter feature returns only a few tweets. I have not yet added a way to scrape a Twitter profile through Twitter's API, and the browser-automation method gets blocked. I will add a new feature to scrape a Twitter profile through the API in a couple of weeks.

@christianmettri

I am also very much looking forward to this feature. Please let us know once you have had time to implement it. Thanks a lot.

@shaikhsajid1111
Owner

Hi @christianmettri @wjd157, just an update in case you are still looking for a solution. You can now try:

from twitter_scraper_selenium import scrape_profile_with_api

scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)

Then check the musk.json file, where the output will be saved.

@SenninOne

SenninOne commented Feb 28, 2023

Hello @shaikhsajid1111, I tried this code and it gives me this error:

2023-02-28 02:33:09,836 - WARNING - Failed to make request!

The code:

from twitter_scraper_selenium import scrape_profile_with_api
import json

scrape_profile_with_api(username="NASA", output_filename="NASA", browser="firefox", tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")


# Load the scraped tweets
with open('NASA.json') as f:
    NASA = json.load(f)


# Build an HTML page showing every image tweeted by NASA
with open('NASAimages.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Images</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')

print("HTML READY")

I also tried the scrape_keyword_with_api function. Here is the code:


from twitter_scraper_selenium import scrape_keyword_with_api
import json

scrape_keyword_with_api(query="from:NASA", output_filename="NASA", tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")


# Load the scraped tweets
with open('NASA.json') as f:
    NASA = json.load(f)


# Build an HTML page showing every image tweeted by NASA
with open('imagenes.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Images</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')

print("HTML READY")

It shows this error:

2023-02-28 02:37:18,021 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
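
Separately from the failed request, one thing that may be worth ruling out in the scripts above: the scraper is told to write into output_dir, while open('NASA.json') reads from the current working directory, so the two may not point at the same file. Below is a rough sketch of the post-processing step that reads from the output directory and skips the HTML step when no JSON file was produced. It assumes the JSON is an object keyed by tweet ID whose values carry username and images fields, as the loops above assume. build_image_gallery and write_gallery are hypothetical helper names, not part of twitter_scraper_selenium.

```python
import html
import json
from pathlib import Path


def build_image_gallery(tweets, username):
    """Return an HTML page with one <img> per image posted by `username`."""
    lines = ["<html>", "<head>", "<title>Images</title>", "</head>", "<body>"]
    for tweet_data in tweets.values():
        if tweet_data.get("username") != username:
            continue
        # Treat a missing or null "images" field as "no images".
        for image_url in tweet_data.get("images") or []:
            safe_url = html.escape(image_url, quote=True)
            lines.append('<img src="{}" alt="">'.format(safe_url))
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


def write_gallery(output_dir, json_name, html_name, username):
    """Write the gallery next to the JSON file.

    Returns False without writing anything if the JSON file does not exist,
    for example because the scraping request failed.
    """
    json_path = Path(output_dir) / json_name
    if not json_path.is_file():
        return False
    with json_path.open(encoding="utf-8") as f:
        tweets = json.load(f)
    page = build_image_gallery(tweets, username)
    (Path(output_dir) / html_name).write_text(page, encoding="utf-8")
    return True
```

Calling write_gallery("C:/Users/Braulio/Desktop/web scraping python", "NASA.json", "NASAimages.html", "NASA") after the scrape would then report False instead of raising an error when the scrape produced no file.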
