Unable to run as service or with docker #364

Open
Dunkhan opened this issue Apr 12, 2023 · 7 comments

Comments

Dunkhan commented Apr 12, 2023

When I try to run this as a service I get the following output:

  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1585, in _execute_child
    and os.path.dirname(executable)
  File "/usr/lib/python3.8/posixpath.py", line 152, in dirname
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
ensuring close
Main process exited, code=exited, status=1/FAILURE
flathunter.service: Failed with result 'exit-code'.

I also get a different error when I try to run it with docker (not docker-compose):

[2023/04/12 17:43:28|__init__.py             |INFO    ]: setting properties for headless
Traceback (most recent call last):
  File "/usr/src/app/flathunt.py", line 118, in <module>
    main()
  File "/usr/src/app/flathunt.py", line 114, in main
    launch_flat_hunt(config, heartbeat)
  File "/usr/src/app/flathunt.py", line 36, in launch_flat_hunt
    hunter.hunt_flats()
  File "/usr/src/app/flathunter/hunter.py", line 56, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/usr/src/app/flathunter/hunter.py", line 35, in crawl_for_exposes
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
    return chain(*[try_crawl(searcher, url, max_pages)
  File "/usr/src/app/flathunter/hunter.py", line 27, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 150, in crawl
    return self.get_results(url, max_pages)
  File "/usr/src/app/flathunter/crawler/immobilienscout.py", line 90, in get_results
    soup = self.get_page(search_url, self.get_driver(), page_no)
  File "/usr/src/app/flathunter/crawler/immobilienscout.py", line 175, in get_page
    return self.get_soup_from_url(
  File "/usr/local/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 84, in get_soup_from_url
    self.resolve_recaptcha(driver, checkbox, afterlogin_string or "")
  File "/usr/local/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 198, in resolve_recaptcha
    iframe_present = self._wait_for_iframe(driver)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 255, in _wait_for_iframe
    iframe = WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 86, in until
    value = method(self._driver)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/expected_conditions.py", line 139, in _predicate
    return _element_if_visible(driver.find_element(*locator))
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 831, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed

I am running this on a virtual server at strato.de running Linux (Ubuntu).
Any advice is appreciated.

codders commented Apr 15, 2023

Hi @Dunkhan,

The second issue (the chrome page crash) sometimes shows up when the docker container doesn't have enough memory available. How much memory are you assigning to your docker containers?

The first issue is less clear: the error message you pasted does not include enough of the traceback to point to any flathunter code. It looks like some code that expects a filename has received a None object instead. Do you have the Chrome binary available in the environment where the service runs?
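
A quick way to answer that question is to look the browser up on the PATH of the same user and environment the service uses. The following is a minimal sketch using only the Python standard library. The list of candidate names and the helper name find_chrome_binary are assumptions for illustration and are not taken from flathunter or undetected_chromedriver.

```python
import shutil

# Candidate executable names are an assumption (common names on Linux),
# not a list taken from flathunter or undetected_chromedriver.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome_binary(candidates=CANDIDATES, path=None):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    binary = find_chrome_binary()
    if binary is None:
        # A missing binary is one plausible source of a None being passed
        # where a filename is expected.
        print("No Chrome/Chromium binary found on PATH")
    else:
        print(f"Found browser binary: {binary}")
```

Running it as the service user matters, because a systemd unit often has a different PATH from an interactive shell.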

Dunkhan commented Apr 17, 2023

Thanks for the response. I am not sure how to increase the memory available to the Docker container (I am not terribly familiar with Docker). My understanding was that memory isn't limited by default, and I have not taken any steps to limit it.
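
On Linux, Docker Engine does not cap container memory unless a limit such as --memory is passed, so this understanding can be checked from inside the running container. The sketch below reads the cgroup memory limit from the usual cgroup v2 and v1 locations. It assumes a Linux container with those mount points, and read_memory_limit is a hypothetical helper name.

```python
from pathlib import Path

# Usual locations of the memory limit as seen from inside a Linux container.
CGROUP_V2 = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports a value close to 2**63 when no limit is configured.
UNLIMITED_THRESHOLD = 2**60


def read_memory_limit(paths=(CGROUP_V2, CGROUP_V1)):
    """Return the cgroup memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
            # cgroup v2 writes the literal string "max" for "no limit".
            value = None if raw == "max" else int(raw)
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next location
        if value is None or value >= UNLIMITED_THRESHOLD:
            return None
        return value
    return None


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("No cgroup memory limit detected")
    else:
        print(f"Memory limit: {limit / 2**20:.0f} MiB")
```

If no limit is reported, the container can use as much memory as the virtual server itself has free, so a small server can still run out of memory.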

I checked whether Chrome was installed on the server for the service, and it seems it was not installed correctly. I think it is now, and the error output has changed:

patching driver executable /home/flathunter/.local/share/undetected_chromedriver/undetected_chromedriver
Traceback (most recent call last):
   File "flathunt.py", line 118, in <module>
     main()
   File "flathunt.py", line 114, in main
     launch_flat_hunt(config, heartbeat)
   File "flathunt.py", line 36, in launch_flat_hunt
     hunter.hunt_flats()
   File "/opt/flathunter/flathunter/hunter.py", line 56, in hunt_flats
     for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
   File "/opt/flathunter/flathunter/hunter.py", line 35, in crawl_for_exposes
     return chain(*[try_crawl(searcher, url, max_pages)
   File "/opt/flathunter/flathunter/hunter.py", line 35, in <listcomp>
     return chain(*[try_crawl(searcher, url, max_pages)
   File "/opt/flathunter/flathunter/hunter.py", line 27, in try_crawl
     return searcher.crawl(url, max_pages)
   File "/opt/flathunter/flathunter/abstract_crawler.py", line 150, in crawl
     return self.get_results(url, max_pages)
   File "/opt/flathunter/flathunter/crawler/immobilienscout.py", line 90, in get_results
     soup = self.get_page(search_url, self.get_driver(), page_no)
   File "/opt/flathunter/flathunter/crawler/immobilienscout.py", line 65, in get_driver
     self.driver = get_chrome_driver(driver_arguments)
   File "/opt/flathunter/flathunter/chrome_wrapper.py", line 47, in get_chrome_driver
     driver = uc.Chrome(version_main=chrome_version, options=chrome_options) # pylint: disable=no-member
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 441, in __init__
     super(Chrome, self).__init__(
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
     super().__init__(
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
     super().__init__(
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
     self.start_session(capabilities, browser_profile)
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 704, in start_session
     super(selenium.webdriver.chrome.webdriver.WebDriver, self).start_session(
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
     response = self.execute(Command.NEW_SESSION, parameters)
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
     self.error_handler.check_response(response)
   File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
     raise exception_class(message, screen, stacktrace)
 selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:45545

In another report I read a suggestion to check --version for google-chrome, chrome and chromium. In case it is relevant: only google-chrome returns a version (112.0.5615.121); the other two return nothing.
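
The same check can be scripted so that the major version is available to compare against the driver. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the parser assumes output of the form "Google Chrome 112.0.5615.121".

```python
import re
import shutil
import subprocess


def parse_major_version(version_output):
    """Extract the major version from text like 'Google Chrome 112.0.5615.121'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_major_version(binary="google-chrome"):
    """Run `<binary> --version` and return the major version, or None."""
    path = shutil.which(binary)
    if path is None:
        return None  # binary is not on PATH
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_major_version(result.stdout)


if __name__ == "__main__":
    for name in ("google-chrome", "chrome", "chromium"):
        print(name, installed_major_version(name))
```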

codders commented Apr 18, 2023

Hi @Dunkhan,

According to the Docker Desktop for Mac settings page, there does seem to be a system memory limit:
https://docs.docker.com/desktop/settings/mac/

It might be worth checking whether increasing the memory allocation there helps with your issue. The error message could also be caused by a mismatch between the version of undetected_chromedriver and the version of google-chrome installed on your machine.

There are some discussions on the undetected_chromedriver site about selenium connection issues:

https://github.com/ultrafunkamsterdam/undetected-chromedriver/discussions?discussions_q=is%3Aopen+cannot+connect+

You might find that hard-coding the version (driver = uc.Chrome(version_main=110)) helps as a temporary fix.
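
A pin like this can also be supplied from outside the code, which makes a temporary fix easier to remove later. The sketch below is one way to do that under stated assumptions: the environment variable name CHROME_VERSION_MAIN and both helper functions are hypothetical and not part of flathunter, while version_main is the uc.Chrome parameter quoted above. The pinned number would normally be the major version of the installed browser rather than literally 110.

```python
import os

# Hypothetical environment variable name; flathunter does not read it.
PIN_ENV_VAR = "CHROME_VERSION_MAIN"


def chrome_kwargs(detected_major=None, environ=os.environ):
    """Build keyword arguments for uc.Chrome, preferring an explicit pin."""
    pinned = environ.get(PIN_ENV_VAR, "").strip()
    major = int(pinned) if pinned.isdigit() else detected_major
    return {} if major is None else {"version_main": major}


def start_driver(detected_major=None):
    """Start Chrome through undetected_chromedriver with the chosen version."""
    # Third-party package, imported lazily so chrome_kwargs works without it.
    import undetected_chromedriver as uc

    return uc.Chrome(**chrome_kwargs(detected_major))
```

With the variable unset, the detected version is used unchanged, so removing the pin needs no code change.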

Dunkhan commented Apr 21, 2023

I am using ssh to set this up on a Linux virtual server. The guides on docs.docker.com cover how to change the memory limit in a GUI that I don't have access to.
I tried a bunch of the suggestions from the uc discussions, but nothing worked. I also tried hard-coding the version. I added some debug output to see which version was being detected, and it was correct (112). I guess I should open my own discussion on the uc project.

kevincali commented

I was able to resolve the page crash issue using the following args:

driver_arguments:
  - "--headless"
  - "--disable-dev-shm-usage"

matiya commented Jan 21, 2024

@kevincali Thanks, the "--disable-dev-shm-usage" parameter solved this issue for me.
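
A likely reason this flag helps in Docker: containers get a 64 MB /dev/shm by default unless --shm-size is set, Chrome uses /dev/shm for shared memory, and --disable-dev-shm-usage makes it use a temporary directory instead. The sketch below reports the size of /dev/shm from inside the container. The helper name shm_size_mib is hypothetical.

```python
import shutil


def shm_size_mib(path="/dev/shm"):
    """Return the total size of the filesystem at `path` in MiB, or None if missing."""
    try:
        return shutil.disk_usage(path).total / (1024 * 1024)
    except OSError:
        return None


if __name__ == "__main__":
    size = shm_size_mib()
    if size is None:
        print("/dev/shm is not available")
    else:
        print(f"/dev/shm total size: {size:.0f} MiB")
```

A result around 64 MiB points to the Docker default, in which case either the flag or a larger --shm-size should avoid the tab crash.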

saladpanda commented Jan 27, 2024

Had the same problem. These driver_arguments seem to have solved it for me.

To clarify for other readers: I set the following in config.yaml:

captcha:
  driver_arguments:
    - "--headless"
    - "--disable-dev-shm-usage"

I found this confusing: although they sit under the captcha key, the arguments are used for all Chrome instances, not only when solving captchas.
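
To illustrate what "used for all Chrome instances" means, here is a minimal sketch of passing such a list to a Chrome options object. How flathunter wires this internally is not shown in this thread, so apply_driver_arguments and the RecordingOptions stand-in are assumptions for illustration. With selenium installed, a selenium.webdriver.ChromeOptions object could be passed in place of the stand-in.

```python
class RecordingOptions:
    """Stand-in for selenium's ChromeOptions so the sketch runs without selenium."""

    def __init__(self):
        self.arguments = []

    def add_argument(self, argument):
        self.arguments.append(argument)


def apply_driver_arguments(options, config):
    """Add every flag under captcha.driver_arguments to `options`, skipping duplicates."""
    args = (config.get("captcha") or {}).get("driver_arguments") or []
    for arg in args:
        if arg not in options.arguments:
            options.add_argument(arg)
    return options


if __name__ == "__main__":
    # Dictionary form of the config.yaml fragment quoted above.
    config = {
        "captcha": {
            "driver_arguments": ["--headless", "--disable-dev-shm-usage"],
        }
    }
    options = apply_driver_arguments(RecordingOptions(), config)
    print(options.arguments)
```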
