
Timer tasks not working with auth on #181

Open
Tobeyforce opened this issue Apr 8, 2021 · 3 comments

Comments

@Tobeyforce

With auth enabled, my timer tasks stop working.
The response visible in the task result is:

(screenshot of the 401 response)

So Scrapyd is trying to send a request to ScrapydWeb, but with auth enabled ScrapydWeb expects basic auth credentials, which Scrapyd does not add to the header. Is there any way to fix this?
It's worth mentioning that I have deployed ScrapydWeb with gunicorn & nginx.

Any advice would be helpful.
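For reference, one general direction (purely a sketch, not ScrapydWeb's actual code; the credentials and the `requests` usage shown in the comment are assumptions) is to attach a basic auth header to whatever internal request the scheduler makes:

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic auth header from credentials,
    as a client would send in response to a 401 challenge."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


# Hypothetical usage with the requests library:
#   requests.post(task_url, headers=basic_auth_header("admin", "secret"))
print(basic_auth_header("admin", "secret")["Authorization"])
# → Basic YWRtaW46c2VjcmV0
```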

@my8100
Owner

my8100 commented Apr 8, 2021

  1. Click the history button on the timer tasks page, then post the related log.
  2. Run scrapydweb without gunicorn&nginx and try again.

@Tobeyforce
Author

Tobeyforce commented Apr 8, 2021

History log:

    [2021-04-08 16:20:05,034] WARNING in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, would retry later: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}
    [2021-04-08 16:20:08,039] ERROR in apscheduler: Fail to execute task #1 (upplandsbrohus sthlm 10min - edit) on node 1, no more retries: Traceback (most recent call last):
      File "/var/www/html/scrapydweb/views/operations/execute_task.py", line 89, in schedule_task
        assert js['status_code'] == 200 and js['status'] == 'ok', "Request got %s" % js
    AssertionError: Request got {'status_code': 401, 'status': 'error', 'message': "<script>alert('Fail to login: basic auth for ScrapydWeb has been enabled');</script>"}

    [2021-04-08 16:20:40,519] WARNING in apscheduler: Shutting down the scheduler for timer tasks gracefully, wait until all currently executing tasks are finished
    [2021-04-08 16:20:40,521] WARNING in apscheduler: The main pid is 1267. Kill it manually if you don't want to wait

Unfortunately, running ScrapydWeb with gunicorn & nginx has created all kinds of problems for me. I hope you one day add an official way to deploy ScrapydWeb so that we don't have to create workarounds :(
Without a production server I've never had issues, so I know it would work otherwise.

My understanding is that each request goes through a middleware in run.py:

    @app.before_request
    def require_login():
        if app.config.get('ENABLE_AUTH', False):
            auth = request.authorization
            USERNAME = str(app.config.get('USERNAME', ''))  # May be 0 from config file
            PASSWORD = str(app.config.get('PASSWORD', ''))
            if not auth or not (auth.username == USERNAME and auth.password == PASSWORD):
                return authenticate()

My only workaround so far is to change this…
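A sketch of one change in that spirit (entirely hypothetical; `is_internal_request` is an illustrative helper, not part of ScrapydWeb) is to exempt the scheduler's own loopback requests from the auth check:

```python
def is_internal_request(remote_addr: str) -> bool:
    """Treat loopback traffic (e.g. the APScheduler task runner
    calling back into the app) as internal."""
    return remote_addr in ("127.0.0.1", "::1")


# Inside require_login, before returning authenticate():
#     if is_internal_request(request.remote_addr):
#         return None  # skip basic auth for internal task requests
```

Note the caveat: behind nginx, every proxied request also arrives from 127.0.0.1 unless the real client IP is restored, so in that deployment this exemption would effectively disable auth for everyone. It is only safe when gunicorn is not fronted by a local proxy.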

@my8100
Owner

my8100 commented Apr 9, 2021

Could you debug with the following steps first?

  1. Run scrapydweb without gunicorn&nginx and try again.
  2. Run scrapydweb with gunicorn and try again.
  3. Run scrapydweb with nginx and try again.
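If step 3 is where it breaks, the proxy configuration is worth checking: nginx forwards the client's Authorization header to the upstream by default, but an explicit `proxy_set_header` line can clear it. A minimal location block for comparison (the upstream port is an assumption) might look like:

```nginx
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;  # assumed gunicorn bind address
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Do NOT add: proxy_set_header Authorization "";
        # a line like that would strip basic auth before it reaches scrapydweb.
    }
}
```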
