
Scrape not working? #163

Open
JarJarBeatyourattitude opened this issue Apr 21, 2023 · 16 comments

Comments

@JarJarBeatyourattitude

I wasn't getting any results from scrape, so I tried with headless=False and noticed that the search wasn't returning any results; I assume that's because you now need an account to search. I confirmed that the links work in my browser, where I'm signed in. Will the script be fixed, or am I missing something? Thanks.

@fjj-088

fjj-088 commented Apr 24, 2023

I also encountered the same problem.

@BradKML

BradKML commented Apr 27, 2023

Is the same thing happening to other scrapers? We might want to keep an eye on this.

@NicerWang

It's Twitter's new restriction: you now need to log in before searching.

  1. Call utils.init_driver to get a driver.
  2. Call utils.log_in to log in.
  3. Pass the driver to scrape().
     (You need to modify scrape() in scweet.py to use the passed driver instead of initializing a new one.)

@yisyed

yisyed commented Apr 29, 2023

It's Twitter's new restriction: you now need to log in before searching.

1. Call utils.init_driver to get a `driver`.

2. Call utils.log_in to log in.

3. Pass the `driver` to scrape().
   (**You need to modify [scrape() in scweet.py](https://github.com/Altimis/Scweet/blob/76e7086a725980dbd5cf8d46bfc27bd4c1d6816f/Scweet/scweet.py#L71)** to use the passed `driver` instead of initializing a new one.)

Can you explain a bit more about how and what we are supposed to change?

@NicerWang

In Your Code (Add Your Twitter Account to .env File In Advance)

from Scweet.scweet import scrape
from Scweet.utils import init_driver, log_in
driver = init_driver(headless=True, show_images=False, proxy="your_proxy_setting")
log_in(driver, env=".env")
data = scrape(..., driver=driver)

In scrape() of scweet.py

def scrape(..., driver=None):
    ......
    # Remove This Line (71)
    # driver = init_driver(headless, proxy, show_images)

@yisyed

yisyed commented Apr 30, 2023


It works! Thanks.

@MykhailoYampolskyi


Hi, I am new to this. Could you tell me where I should add the .env file? Thanks.

@yisyed

yisyed commented May 2, 2023


Hi, I am new to this. Could you tell me where I should add the .env file? Thanks.

It should be in your project's folder (note: the file name must be '.env').

Your '.env' should be in the format given below:

SCWEET_EMAIL = "example@email.com"
SCWEET_PASSWORD = "password"
SCWEET_USERNAME = "username"

Below are the steps and changes I have made:

  1. I have added 'env=".env"'
    data = scrape(..., env=".env")

  2. In scrape() of 'scweet.py':

def scrape(..., env=None):    # Add this 'env=None'
    ......
    # And add this line after line (71)
    log_in(driver, env)

NOTE: My method is not robust. If you can find a better way to scrape tweets, let us know.
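If you want a quick sanity check that the three SCWEET_* variables are readable before running, here is a minimal stdlib sketch of a '.env' reader (illustration only; Scweet loads the file itself, and real projects normally use python-dotenv):

```python
import os
import tempfile

def load_env(path=".env"):
    """Minimal reader for KEY = "value" lines (illustration only;
    Scweet itself loads the file via python-dotenv)."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

# Quick self-check with a throwaway file in the format shown above.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, ".env")
    with open(p, "w") as f:
        f.write('SCWEET_EMAIL = "example@email.com"\n'
                'SCWEET_PASSWORD = "password"\n'
                'SCWEET_USERNAME = "username"\n')
    print(load_env(p)["SCWEET_USERNAME"])  # username
```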

@yisyed

yisyed commented May 2, 2023


One more change in 'scweet.py': edit the import on line (9) and add 'log_in':
from .utils import ..., log_in

@Wish-s

Wish-s commented May 7, 2023


Hello, I am new to this too. Could you tell me where I can obtain "your_proxy_setting"? Thanks very much!

@yisyed

yisyed commented May 7, 2023


Hello, I am new to this too. Could you tell me where I can obtain "your_proxy_setting"? Thanks very much!

Try following the method I have given above; it works for me.
I kept everything the same in scrape() of scweet.py on line (71) (the proxy is None by default).
If it still doesn't work, let me know what the error is. Thanks.

Note: I have to restart VS Code every time I make a change in the Scweet library.

@NicerWang

@Wish-s
If you do not need a proxy (or VPN) to connect to twitter.com, just remove this parameter.

@Wish-s

Wish-s commented May 9, 2023

@Wish-s If you do not need a proxy (or VPN) to connect to twitter.com, just remove this parameter.

Thank you for your reply. I do need a proxy (or VPN) to connect to twitter.com, but I can't find where to obtain the parameter.

@NicerWang

NicerWang commented May 9, 2023

@Wish-s
It's determined by your proxy software and has the format "PROTOCOL://IP:PORT".
Clash, for example, uses "http://127.0.0.1:7890" by default.
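If you want to sanity-check that string before launching the browser, a small stdlib helper can verify the PROTOCOL://IP:PORT shape (the function name is mine, not part of Scweet):

```python
from urllib.parse import urlparse

def looks_like_proxy(proxy: str) -> bool:
    """Sanity-check a PROTOCOL://IP:PORT string before passing it to
    init_driver (hypothetical helper, not part of Scweet)."""
    try:
        p = urlparse(proxy)
        return (p.scheme in {"http", "https", "socks5"}
                and bool(p.hostname)
                and p.port is not None)
    except ValueError:  # e.g. a non-numeric port
        return False

print(looks_like_proxy("http://127.0.0.1:7890"))  # True  (Clash default)
print(looks_like_proxy("127.0.0.1:7890"))         # False (scheme missing)
```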

@ihabpalamino

Hello guys, this is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from Scweet.scweet import scrape

# Specify the parameters for scraping
username = "2MInteractive"
since_date = "2023-07-01"
until_date = "2023-07-11"
headless = True

# Set up the ChromeDriver service (replace with the actual path to chromedriver)
service = Service("C:/Users/HP Probook/Downloads/chromedriver.exe")

# Set up the ChromeOptions
options = webdriver.ChromeOptions()
options.headless = headless

# Create the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Scrape the tweets by username
data = scrape(from_account=username, since=since_date, until=until_date, headless=headless, driver=driver)

# Print the scraped data
print(data)

# Close the WebDriver
driver.quit()

and I am getting an empty data list:

looking for tweets between 2023-07-01 and 2023-07-06 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-06%20since%3A2023-07-01%20&src=typed_query
scroll 1
scroll 2
looking for tweets between 2023-07-06 and 2023-07-11 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-11%20since%3A2023-07-06%20&src=typed_query
scroll 1
scroll 2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]
Index: []
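As an aside, the "looking for tweets between ... and ..." lines show scrape() walking the date range in slices; a rough sketch of that chunking follows (the 5-day width is inferred from the log above, not taken from Scweet's source):

```python
from datetime import date, timedelta

def date_chunks(since: str, until: str, days: int = 5):
    """Split [since, until) into consecutive sub-ranges, mirroring the
    'looking for tweets between ... and ...' lines in Scweet's log output.
    (Illustration only; the chunk width here is an assumption.)"""
    start, end = date.fromisoformat(since), date.fromisoformat(until)
    out = []
    while start < end:
        nxt = min(start + timedelta(days=days), end)
        out.append((start.isoformat(), nxt.isoformat()))
        start = nxt
    return out

print(date_chunks("2023-07-01", "2023-07-11"))
# [('2023-07-01', '2023-07-06'), ('2023-07-06', '2023-07-11')]
```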

@baqachadil

baqachadil commented Jul 18, 2023

Check this solution; it might work if none of the others did: #169 (comment)
