Timeoutable regular expressions in RobotstxtServer #429

Open: wants to merge 5 commits into master

@dgoiko commented Jan 24, 2020

Fixes #425 by adding Matchers that throw a RuntimeException on timeout, plus a TimeoutablePathRule (a subclass of PathRule) that uses them.
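A minimal sketch of a Matcher that throws on timeout, assuming the common approach of wrapping the input in a CharSequence whose charAt checks a deadline. DeadlineCharSequence and timeoutMatcher are hypothetical names, and the exception class is only assumed to be unchecked like the PR's RegexpTimeoutException; this is not the PR's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutMatchers {

    /** Hypothetical unchecked timeout signal, assumed to resemble the PR's RegexpTimeoutException. */
    public static class RegexpTimeoutException extends RuntimeException {
        public RegexpTimeoutException(String message) {
            super(message);
        }
    }

    /** Input wrapper that throws once the deadline (System.nanoTime based) has passed. */
    public static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        public DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // java.util.regex reads the input through charAt, so runaway
            // backtracking keeps hitting this check until the deadline passes.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexpTimeoutException("regex timed out on input of length " + inner.length());
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            // Derived sequences share the same deadline.
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns a Matcher whose match operations throw RegexpTimeoutException after timeoutMillis. */
    public static Matcher timeoutMatcher(Pattern pattern, CharSequence input, long timeoutMillis) {
        // The clock starts when the matcher is created, not when matching begins.
        long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadlineNanos));
    }
}
```

Because every character access reads the clock, this approach adds a small cost to each charAt call, which is where the performance penalty of the workaround comes from.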

By default, the timeoutable rules are not used; they can be enabled via RobotstxtConfig.

NOTE: The timeoutable Matcher code is based on this Stack Overflow answer, and it reduces regex performance. Ideally we would include a native regex library that is both efficient and supports timeouts, but this is a valid workaround.

TimeoutablePathRule adds support for timing out regular expressions that run for too long. It can be configured to treat a timeout either as a match or as a non-match.

Personally, I'd throw RegexpTimeoutException, but that could break third-party subclasses, so I decided to stick with returning false. The static method, matchesRobotsPattern, does throw RegexpTimeoutException when configured to fail on timeout, since that will not break any existing code.
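The two error-handling choices can be sketched as follows, assuming the timed match is any step that may throw an unchecked timeout exception. matchWithPolicy, matchOrFail and the exception class are hypothetical names for illustration; the real TimeoutablePathRule and matchesRobotsPattern signatures in this PR may differ.

```java
import java.util.function.BooleanSupplier;

public class TimeoutPolicySketch {

    /** Hypothetical unchecked timeout signal raised by a timed matcher. */
    public static class RegexpTimeoutException extends RuntimeException {
        public RegexpTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Instance-method style: never throws. A timeout is reported as the
     * configured value, so code written against a boolean matches() keeps working.
     */
    public static boolean matchWithPolicy(BooleanSupplier timedMatch, boolean matchOnTimeout) {
        try {
            return timedMatch.getAsBoolean();
        } catch (RegexpTimeoutException e) {
            return matchOnTimeout;
        }
    }

    /**
     * Static-method style: when failOnTimeout is set, the timeout propagates
     * to the caller; otherwise it counts as "no match".
     */
    public static boolean matchOrFail(BooleanSupplier timedMatch, boolean failOnTimeout) {
        try {
            return timedMatch.getAsBoolean();
        } catch (RegexpTimeoutException e) {
            if (failOnTimeout) {
                throw e;
            }
            return false;
        }
    }
}
```

Keeping the overridable instance method boolean-only means existing PathRule subclasses never see a new exception, while a static helper has no subclasses to break and can surface the timeout directly.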
UserAgentDirectives creates a TimeoutablePathRule if configured to do so.
The regex timeout is now configurable in RobotstxtConfig.

RobotstxtParser passes the RobotstxtConfig arguments through so that TimeoutablePathRule is used when necessary.
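One way the opt-in could flow from configuration to rule creation is sketched below, assuming RobotstxtConfig carries an on/off switch, a timeout in milliseconds and a match-on-timeout flag. RegexTimeoutSettings, Rule, PlainRule, TimeoutableRule and createRule are hypothetical stand-ins for RobotstxtConfig, PathRule, TimeoutablePathRule and the rule-building code in UserAgentDirectives; the 1000 ms default is also an assumption.

```java
import java.util.regex.Pattern;

public class RuleFactorySketch {

    /** Hypothetical stand-in for the new RobotstxtConfig options; disabled by default. */
    public static final class RegexTimeoutSettings {
        public boolean useTimeoutableRules = false; // opt-in, as described in the PR
        public long regexTimeoutMillis = 1_000L;    // hypothetical default
        public boolean matchOnTimeout = false;      // what a timeout should count as
    }

    /** Minimal rule contract: does this rule apply to the path? */
    public interface Rule {
        boolean matches(String path);
    }

    /** Unchecked signal used only inside TimeoutableRule. */
    private static final class Timeout extends RuntimeException {
        Timeout() {
            super("regex timed out");
        }
    }

    /** Pre-existing behaviour: unbounded java.util.regex matching. */
    public static final class PlainRule implements Rule {
        private final Pattern pattern;

        public PlainRule(Pattern pattern) {
            this.pattern = pattern;
        }

        @Override
        public boolean matches(String path) {
            return pattern.matcher(path).matches();
        }
    }

    /** Opt-in behaviour: abort after the timeout and return the configured result. */
    public static final class TimeoutableRule implements Rule {
        private final Pattern pattern;
        private final RegexTimeoutSettings settings;

        public TimeoutableRule(Pattern pattern, RegexTimeoutSettings settings) {
            this.pattern = pattern;
            this.settings = settings;
        }

        @Override
        public boolean matches(String path) {
            final long deadline = System.nanoTime() + settings.regexTimeoutMillis * 1_000_000L;
            // The regex engine reads input through charAt, so the clock check bounds backtracking.
            CharSequence guarded = new CharSequence() {
                @Override public char charAt(int i) {
                    if (System.nanoTime() - deadline > 0) throw new Timeout();
                    return path.charAt(i);
                }
                @Override public int length() { return path.length(); }
                @Override public CharSequence subSequence(int s, int e) { return path.subSequence(s, e); }
                @Override public String toString() { return path; }
            };
            try {
                return pattern.matcher(guarded).matches();
            } catch (Timeout t) {
                return settings.matchOnTimeout;
            }
        }
    }

    /** Builds the rule type selected by the settings (hypothetical factory). */
    public static Rule createRule(String regex, RegexTimeoutSettings settings) {
        Pattern pattern = Pattern.compile(regex);
        return settings.useTimeoutableRules ? new TimeoutableRule(pattern, settings) : new PlainRule(pattern);
    }
}
```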

Style fix in TimeoutablePathRule

dgoiko changed the title from "Timeoutable regex" to "Timeoutable regular expressions in RobotstxtServer" on Jan 24, 2020.

Successfully merging this pull request may close: Exponential backtracking in regex blocks Thread