
self.driver = driver_klass(**driver_kwargs) TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path' #128

Open
ahmedraxa23 opened this issue Jun 20, 2023 · 17 comments

Comments

@ahmedraxa23

Chrome driver

@EdgarGc026

I have the same issue:

Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 240, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 244, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
    self.engine = self._create_engine()
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
    self.downloader: Downloader = downloader_cls(crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
    DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "C:\Users\Edgar\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
builtins.TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

I use chromedriver on Windows.

@oamer1

oamer1 commented Jun 24, 2023

In Selenium 4, executable_path is deprecated and a Service() object is used instead.
Installing Selenium 3 works around it:

pip install 'selenium<4'
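For those who stay on Selenium 4 instead, the replacement call shape looks roughly like this (a sketch: `make_chrome_driver` is a hypothetical helper, and the imports are kept inside it so the snippet can be read and defined without launching a browser):

```python
def make_chrome_driver(driver_path, arguments=()):
    """Sketch: Selenium 4 moves the executable path into a Service object."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in arguments:  # e.g. '--headless'
        options.add_argument(arg)
    # The Selenium 3 equivalent was: webdriver.Chrome(executable_path=driver_path)
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

Calling it actually launches Chrome, so nothing runs at import time.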

@Shah13079

Same problem?

@ton77v

ton77v commented Jul 3, 2023

It seems the best way is to fork the package and change SeleniumMiddleware.__init__ to work the way you're used to with Selenium. It's actually just a few lines of code, and you won't end up with the ancient Selenium 3.

@Shah13079

> It seems the best way is to fork the package and change SeleniumMiddleware.__init__ the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

But how, @ton77v? Do you mean using the Selenium integration in the spider class?

@ton77v

ton77v commented Jul 3, 2023

> It seems the best way is to fork the package and change SeleniumMiddleware.__init__ the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3
>
> But how, @ton77v? Do you mean using the Selenium integration in the spider class?

I mean something like this:

  1. https://github.com/clemfromspace/scrapy-selenium/fork
  2. Clone it in your IDE and modify it similarly to what I did for myself, for example 5c3fe7b
  3. Run the tests to make sure it works
  4. pip uninstall scrapy-selenium
  5. pip install git+{https://your_repository}

And then you have your own scrapy-selenium fork that you can adjust further as you wish, while preserving the original scrapy-selenium API.
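The patched part boils down to building the driver roughly like this (a sketch of the idea behind commit 5c3fe7b, not its exact code; `build_chrome_driver` is a hypothetical name, webdriver-manager is an extra dependency, and the imports stay inside the function so the sketch reads without either package installed):

```python
def build_chrome_driver(driver_arguments):
    """Sketch: construct a Selenium 4 Chrome driver without a hardcoded path."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for argument in driver_arguments:
        options.add_argument(argument)
    # webdriver-manager downloads a matching chromedriver and returns its path,
    # which the Service object wraps in place of the old executable_path kwarg
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                            options=options)
```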

@naveedsid

I have also been facing the same issue. I've been looking for a solution for 5 days and there is no video on the internet about this error. Can you please make a short video on it, as you described above? Thanks in advance, I'll wait for it. This would be a huge favor.

@naveedsid

> It seems the best way is to fork the package and change SeleniumMiddleware.__init__ the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3

Should we use Selenium 3 for the forked package?

@jg3wilso

jg3wilso commented Jul 3, 2023

Credits to @ton77v for the answer; I can help simplify it:

  • go to ton77v's commit 5c3fe7b and copy his code into middlewares.py
  • replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [optional]: I had to !pip install webdriver-manager as well
  • for your scrapy spider, modify the settings.py file (part of the configuration files that appear when you start a scrapy project, alongside items.py, middlewares.py, and pipelines.py). Add the following lines to settings.py:
    - SELENIUM_DRIVER_NAME = 'chrome'
    - SELENIUM_DRIVER_EXECUTABLE_PATH = None  # not actually necessary; it works even if you comment this line out
    - SELENIUM_DRIVER_ARGUMENTS = []  # put '--headless' in the brackets to prevent the browser popup
  • then run scrapy runspider <scraper_name>.py in your terminal and enjoy!
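Collected in one place, the settings.py additions look like this (the DOWNLOADER_MIDDLEWARES entry is not in the steps above; it comes from scrapy-selenium's README and is assumed to already be in a working project):

```python
# settings.py — per the steps above
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = None  # the driver manager supplies the path
SELENIUM_DRIVER_ARGUMENTS = []          # e.g. ['--headless']

# from scrapy-selenium's README; required for the middleware to load at all
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}
```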

Quick explanation of what's happening:

  • you're letting webdriver-manager install the browser driver, so you no longer have to specify the driver's location
  • the beauty is that after the first download, the driver manager remembers the install location and reuses the installed driver on subsequent runs
  • you can adapt the scraper to open other browsers by modifying the middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME = (browser name)

If this worked for you, be sure to like this message and show @ton77v some love!

@ahmedraxa23 please close the issue if this worked for you

@naveedsid

naveedsid commented Jul 4, 2023

Thanks a lot, much appreciated. I just want to know a little more about it: I want undetected-chromedriver to do the same thing that the Selenium webdriver performs in middlewares.py. What changes would be needed?
Note: undetected-chromedriver (UC) is a Python library (a modified version of Selenium) that can work with a pre-installed chrome.exe, so it doesn't need chromedriver for execution; it can also work with pre-installed Chrome profiles.

@ton77v

ton77v commented Jul 5, 2023

> It seems the best way is to fork the package and change SeleniumMiddleware.__init__ the way you used to work with Selenium. Actually it's just a few lines of code and you won't end up with ancient Selenium 3
>
> Should we use Selenium 3 for the forked package?

It's possible, but it makes no sense; it will work just fine with the latest version.

@jjerxawp

jjerxawp commented Aug 9, 2023

> Credits @ton77v for the answer, I can help simplify his answer: […]
>
> @ahmedraxa23 please close the issue if this worked for you

Thank you for your work @jg3wilso, your solution works like a charm. However, it only seems to work when using 'chrome' as the browser, and it's kind of slow. When I try 'firefox' or 'safari' (by adjusting settings.py), the script doesn't work the way it does with 'chrome':

Traceback (most recent call last):
  File "/Users/huynhdailong/opt/anaconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/huynhdailong/Library/CloudStorage/OneDrive-Personal/Desktop/DE/DEP302 - Foundation/Spider/projects/silkdeals/silkdeals/spiders/deals.py", line 20, in parse
    img = response.meta['screenshot']
KeyError: 'screenshot'

@ton77v

ton77v commented Aug 9, 2023

> Then I try to use it with 'firefox' or 'safari' (by adjusting the setting.py), the script won't work as it used to when using 'chrome' in the setting file […] KeyError: 'screenshot'

That's because the solution was just for Chrome; it won't work for any other browser. It's likely not very hard to make a universal one, so let's hope someone will add it here 😀

@malmike

malmike commented Oct 15, 2023

> That's because the solution was just for Chrome. It wouldn't work for any other browser. It's likely not very hard to make an universal one so let's hope someone will add it here 😀

To support all the browsers, I'd recommend creating a Service object and passing it into the webdriver. An example of that implementation is https://github.com/clemfromspace/scrapy-selenium/pull/135/files.

NB: The Service object takes additional arguments, like log_path and port, that I did not consider in this alteration.
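The dispatch this describes can be sketched as a mapping from the configured SELENIUM_DRIVER_NAME to the Selenium 4 service module for that browser (an illustration of the approach, not the PR's exact code; `service_class_path` is a hypothetical helper, while the module paths are Selenium 4's real service modules):

```python
def service_class_path(driver_name: str) -> str:
    """Map a SELENIUM_DRIVER_NAME value to the dotted path of the Selenium 4
    Service class that wraps the driver executable for that browser."""
    modules = {
        "chrome": "selenium.webdriver.chrome.service",
        "firefox": "selenium.webdriver.firefox.service",
        "safari": "selenium.webdriver.safari.service",
        "edge": "selenium.webdriver.edge.service",
    }
    return modules[driver_name.lower()] + ".Service"
```

The middleware would then import that class, build `Service(executable_path)`, and pass it to the matching webdriver constructor, which is what makes the fix browser-agnostic.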

@leovizeu

leovizeu commented Nov 8, 2023

> Credits @ton77v for the answer, I can help simplify his answer: […]

I can't get it working; I can't find the selenium folder you mentioned.

@J-Brk

J-Brk commented Apr 12, 2024

@jg3wilso

> @ahmedraxa23 please close the issue if this worked for you

I don't think this issue should be closed, since the fix above is a workaround, not a proper solution that works out of the box.

@jogobeny

Hi. I've made a naive fix. #133 (comment)
