TLS Error on Robots.txt is not handled in OnError #741

Open
sundarv85 opened this issue Nov 29, 2022 · 1 comment

@sundarv85

I'm running a test project on localhost:8000, and when I access it over HTTPS, the request fails (which is expected):

Get "https://localhost:8000/": tls: first record does not look like a TLS handshake

The above is correctly caught in OnError. However, when I set ignoreRobots to false, colly first tries to fetch robots.txt, and the failure below

Get "https://localhost:8000/robots.txt": tls: first record does not look like a TLS handshake

is not propagated to OnError, because it does not originate from the request I started: colly first tries to fetch robots.txt, and that fetch fails. Could this also be propagated to OnError, or be made catchable through a known colly error value such as

ErrRobotsTxtBlocked = errors.New("URL blocked by robots.txt")
ErrRobotsTxtFetchFailed = errors.New("Unable to fetch robots.txt") // New Error Code
@WGH-
Collaborator

WGH- commented Jan 5, 2023

This proposal makes sense.
