
Timer tasks not working with auth on #181

Open
Tobeyforce opened this issue Apr 8, 2021 · 3 comments

Comments

@Tobeyforce

With auth enabled, my timer tasks stop working.
The response visible in the task result is:

(screenshot of the 401 response)

So Scrapyd is trying to send a request to ScrapydWeb, but with auth enabled ScrapydWeb expects basic auth credentials, which Scrapyd does not add to the header. Is there any way to fix this?
It's worth mentioning that I have deployed ScrapydWeb with gunicorn & nginx.

Any advice would be helpful.
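For reference, one general direction (purely a sketch, not ScrapydWeb's actual code; the credentials and the `requests` usage shown in the comment are assumptions) is to attach a basic auth header to whatever internal request the scheduler makes:

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic auth header from credentials,
    as a client would send in response to a 401 challenge."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


# Hypothetical usage with the requests library:
#   requests.post(task_url, headers=basic_auth_header("admin", "secret"))
print(basic_auth_header("admin", "secret")["Authorization"])
# → Basic YWRtaW46c2VjcmV0
```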

@my8100
Owner

my8100 commented Apr 8, 2021

  1. Click the history button on the timer tasks page, then post the related log.
  2. Run scrapydweb without gunicorn&nginx and try again.

@Tobeyforce
Author

Tobeyforce commented Apr 8, 2021

History log:

    [2021-04-08 16:20:05,034] WARNING in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, would retry later: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}
    [2021-04-08 16:20:08,039] ERROR in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, no more retries: Traceback (most recent call last):
      File "/var/www/html/scrapydweb/views/operations/execute_task.py", line 89, in schedule_task
        assert js['status_code'] == 200 and js['status'] == 'ok', "Request got %s" % js
    AssertionError: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}

    [2021-04-08 16:20:40,519] WARNING in apscheduler: Shutting down the scheduler for timer tasks gracefully, wait until all currently executing tasks are finished
    [2021-04-08 16:20:40,521] WARNING in apscheduler: The main pid is 1267. Kill it manually if you don't want to wait

Unfortunately, running ScrapydWeb with gunicorn & nginx has created all kinds of problems for me. I hope you one day add an official way to deploy ScrapydWeb so that we don't have to create workarounds :(
Without a production server I've never had issues, so I know it would work otherwise.

My understanding is that each request goes through a middleware in run.py:

    @app.before_request
    def require_login():
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))  # May be 0 from config file
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()

My only workaround so far is to change this…
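A sketch of one change in that spirit (entirely hypothetical; `is_internal_request` is an illustrative helper, not part of ScrapydWeb) is to exempt the scheduler's own loopback requests from the auth check:

```python
def is_internal_request(remote_addr: str) -> bool:
    """Treat loopback traffic (e.g. the APScheduler task runner
    calling back into the app) as internal."""
    return remote_addr in ("127.0.0.1", "::1")


# Inside require_login, before returning authenticate():
#     if is_internal_request(request.remote_addr):
#         return None  # skip basic auth for internal task requests
```

Note the caveat: behind nginx, every proxied request also arrives from 127.0.0.1 unless the real client IP is restored, so in that deployment this exemption would effectively disable auth for everyone. It is only safe when gunicorn is not fronted by a local proxy.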

@my8100
Owner

my8100 commented Apr 9, 2021

Could you debug with the following steps first?

  1. Run scrapydweb without gunicorn&nginx and try again.
  2. Run scrapydweb with gunicorn and try again.
  3. Run scrapydweb with nginx and try again.
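If step 3 is where it breaks, the proxy configuration is worth checking: nginx forwards the client's Authorization header to the upstream by default, but an explicit `proxy_set_header` line can clear it. A minimal location block for comparison (the upstream port is an assumption) might look like:

```nginx
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;  # assumed gunicorn bind address
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Do NOT add: proxy_set_header Authorization "";
        # a line like that would strip basic auth before it reaches scrapydweb.
    }
}
```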
