HTTP Error 403: request disallowed by robots.txt #702

Open
Constantin07 opened this issue Jul 18, 2019 · 3 comments

@Constantin07

Constantin07 commented Jul 18, 2019

Running in a Python venv on macOS, with Python 2.7.16.

Installed packages in venv:

splinter==0.11.0
zope.testbrowser==5.3.3
lxml==4.3.4
cssselect==1.0.3
mechanize==0.4.2

from splinter import Browser

jenkins_url = 'some value here'

browser = Browser('zope.testbrowser')
browser.visit(jenkins_url)

This is what I get back:

Traceback (most recent call last):
  File "jenkins_test.py", line 9, in <module>
    browser.visit(jenkins_url)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/splinter/driver/zopetestbrowser.py", line 81, in visit
    self._browser.open(url)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/zope/testbrowser/browser.py", line 252, in open
    self._processRequest(url, make_request)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/zope/testbrowser/browser.py", line 276, in _processRequest
    resp = make_request(reqargs)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/zope/testbrowser/browser.py", line 250, in make_request
    return self.testapp.get(url, **args)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/webtest/app.py", line 335, in get
    expect_errors=expect_errors)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/zope/testbrowser/browser.py", line 92, in do_request
    self._assertAllowed(req.url)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/zope/testbrowser/browser.py", line 89, in _assertAllowed
    raise RobotExclusionError(url, 403, msg, [], None)
zope.testbrowser.browser.RobotExclusionError: HTTP Error 403: request disallowed by robots.txt

What am I doing wrong?

I tried to follow the docs at https://splinter.readthedocs.io/en/latest/drivers/zope.testbrowser.html
but passing ignore_robots=True doesn't help:

Traceback (most recent call last):
  File "jenkins_test.py", line 8, in <module>
    browser = Browser('zope.testbrowser', ignore_robots=True)
  File "/Users/homedir/Documents/Jenkins_Test/venv/lib/python2.7/site-packages/splinter/browser.py", line 64, in Browser
    return driver(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'ignore_robots'

@mattfeury

Looks like it was removed here: a7cbbf9

No clue why. @andrewsmedina was the author, so may be able to shed some light.

@andrewsmedina
Member

This was removed during the zope.testbrowser upgrade. I believe we should figure out how to fix it with the current version of this driver.
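
For reference, the error message suggests that the target site's robots.txt is consulted before the page is fetched. Below is a minimal sketch of that kind of check, assuming it behaves like Python 3's standard urllib.robotparser (an assumption; the actual zope.testbrowser internals may differ). The robots.txt content, the is_allowed helper, and the jenkins.example.com URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every path for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL, used only for illustration.
url = "http://jenkins.example.com/job/demo/"

# A blanket "Disallow: /" rejects the URL.
blocked = is_allowed(ROBOTS_TXT, url)

# An empty Disallow rule permits everything.
open_site = is_allowed("User-agent: *\nDisallow:\n", url)
```

If the site being tested serves a robots.txt like the first one, any client that honors it will refuse the request, which would be consistent with the 403 shown in the traceback.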

@mo-han

mo-han commented May 27, 2021

I just tried a simple workaround that overrides the _assertAllowed method to bypass the check against zope.testbrowser.browser._allowed. Here's the code:

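from splinter.driver.zopetestbrowser import ZopeTestBrowser


def pass_assert_allowed(*args):
    # No-op replacement for the robots.txt check.
    return


b = ZopeTestBrowser()
# Override the check on this browser's underlying test app only.
b._browser.testapp._assertAllowed = pass_assert_allowed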

Now the browser can visit any URL without the "request disallowed" error.
Frankly, though, zope.testbrowser does not handle many sites well, so a visit may still fail for other reasons. Personally, I think Firefox or Chrome is a better choice.
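
To illustrate why assigning the no-op function to the instance is enough, here is a self-contained sketch that needs neither splinter nor zope.testbrowser. FakeTestApp and its RobotExclusionError are hypothetical stand-ins, not the real classes; they only mimic the call pattern visible in the traceback (do_request calling self._assertAllowed).

```python
class RobotExclusionError(Exception):
    """Stand-in for the error raised when robots.txt forbids a URL."""


class FakeTestApp:
    """Hypothetical stand-in for the object that performs requests."""

    def _assertAllowed(self, url):
        # Pretend every URL is disallowed by robots.txt.
        raise RobotExclusionError("request disallowed by robots.txt: " + url)

    def do_request(self, url):
        # The check is looked up on the instance at call time, so an
        # instance attribute takes precedence over the class method.
        self._assertAllowed(url)
        return "fetched " + url


def pass_assert_allowed(*args):
    # No-op replacement; accepts whatever arguments the caller passes.
    return


patched = FakeTestApp()
patched._assertAllowed = pass_assert_allowed  # affects only this instance
result = patched.do_request("http://example.com/")

# A fresh instance still runs the original check.
untouched = FakeTestApp()
try:
    untouched.do_request("http://example.com/")
    still_blocked = False
except RobotExclusionError:
    still_blocked = True
```

Because the override lives on a single instance, other browser objects created later keep the original check. It also relies on a private method name, so it may stop working if that name changes in a future release.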
