
Base classes provide more protected methods for subclasses #432

Open
wants to merge 9 commits into
base: master


@dgoiko dgoiko commented Jan 25, 2020

I've prepared a branch with some modifications I had to make to the codebase in order to provide custom WebURLs with additional fields (and the correspondingly different WebURLTupleBindings they require).

In order to achieve that, I had to introduce the following modifications:

  • WorkQueues creates a WebURLTupleBinding. I added a protected constructor that accepts one as a parameter, so subclasses can provide custom WebURLTupleBinding instances.
  • Frontier creates the WorkQueues in its constructor. It now has a createWorkQueues method that subclasses can override in order to create custom WorkQueues subclasses.
  • The same applies to CrawlController, with the methods createFrontier and createEmptyWebURL.
  • The WebCrawler logic for following redirects and outgoing URLs now lives in protected methods that can be overridden.
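
To make the first two points concrete, here is a minimal, self-contained sketch of the pattern. The classes are hypothetical stand-ins (DemoBinding, DemoWorkQueues, DemoFrontier), not the real crawler4j classes, and the "binding" simply appends a tag to a string instead of serialising a WebURL. Only the shape is meant to match: a protected constructor that accepts the binding, and a protected factory method called from the constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebURLTupleBinding: turns a URL into a stored entry.
class DemoBinding {
    String encode(String url) { return url; }
}

// A custom binding that stores one extra (made-up) field.
class TaggedBinding extends DemoBinding {
    @Override
    String encode(String url) { return url + "|tag=demo"; }
}

// Hypothetical stand-in for WorkQueues.
class DemoWorkQueues {
    private final DemoBinding binding;
    private final List<String> entries = new ArrayList<>();

    // Default behaviour: create the standard binding.
    DemoWorkQueues() { this(new DemoBinding()); }

    // Protected constructor: subclasses pass in their own binding.
    protected DemoWorkQueues(DemoBinding binding) { this.binding = binding; }

    void put(String url) { entries.add(binding.encode(url)); }
    List<String> entries() { return entries; }
}

class TaggedWorkQueues extends DemoWorkQueues {
    TaggedWorkQueues() { super(new TaggedBinding()); }
}

// Hypothetical stand-in for Frontier.
class DemoFrontier {
    protected final DemoWorkQueues workQueues;

    // The constructor asks the factory method for the queues
    // instead of instantiating them directly.
    DemoFrontier() { this.workQueues = createWorkQueues(); }

    protected DemoWorkQueues createWorkQueues() { return new DemoWorkQueues(); }

    void schedule(String url) { workQueues.put(url); }
    List<String> scheduled() { return workQueues.entries(); }
}

class TaggedFrontier extends DemoFrontier {
    @Override
    protected DemoWorkQueues createWorkQueues() { return new TaggedWorkQueues(); }
}
```

One caveat with this shape: the factory method runs inside the superclass constructor, before the subclass's own fields are initialised, so an override should not depend on subclass state.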

WorkQueues now has a protected constructor that accepts WebURLTupleBinding as a parameter.

It helps to use custom WebURLs with additional parameters that require a custom WebURLTupleBinding.
Now the constructor calls createWorkQueues to get a new WorkQueues instance. This allows subclasses to override this behaviour and create custom work queues.
Now CrawlController subclasses can create custom Frontiers.
Function createEmptyWebURL created to allow subclasses of CrawlController to create their custom WebURLs in addSeed operations.
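
A rough sketch of how the createEmptyWebURL hook is meant to be used during seeding. The classes here (SeedUrl, PrioritySeedUrl, SeedController) and the priority field are hypothetical and only mirror the idea; the real addSeed in CrawlController does more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebURL.
class SeedUrl {
    private String url;
    void setUrl(String url) { this.url = url; }
    String getUrl() { return url; }
}

// A custom URL type carrying one additional (made-up) field.
class PrioritySeedUrl extends SeedUrl {
    private int priority = 5;
    int getPriority() { return priority; }
    void setPriority(int priority) { this.priority = priority; }
}

// Hypothetical stand-in for CrawlController.
class SeedController {
    private final List<SeedUrl> seeds = new ArrayList<>();

    // Hook: subclasses decide which concrete URL class is instantiated.
    protected SeedUrl createEmptyUrl() { return new SeedUrl(); }

    // Seeding goes through the hook, so custom URL types are picked up here.
    void addSeed(String pageUrl) {
        SeedUrl seed = createEmptyUrl();
        seed.setUrl(pageUrl);
        seeds.add(seed);
    }

    List<SeedUrl> seeds() { return seeds; }
}

class PrioritySeedController extends SeedController {
    @Override
    protected SeedUrl createEmptyUrl() { return new PrioritySeedUrl(); }
}
```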
scheduleOutgoingUrls and performRedirect created so that subclasses can modify this behaviour.

schedule and scheduleAll wrap frontier functions so subclasses can schedule manually.
redirectionPhase separated into a protected method
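
As an illustration of the crawler-side hooks, here is a small sketch with hypothetical stand-in classes (SketchFrontier, SketchCrawler, HttpsOnlyCrawler). The method names follow the commit messages above, but the signatures and the status-code check are assumptions made for the example and do not reflect the actual WebCrawler code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the frontier: just collects scheduled URLs.
class SketchFrontier {
    final List<String> queue = new ArrayList<>();
    void schedule(String url) { queue.add(url); }
    void scheduleAll(List<String> urls) { queue.addAll(urls); }
}

// Hypothetical stand-in for WebCrawler.
class SketchCrawler {
    private final SketchFrontier frontier;

    SketchCrawler(SketchFrontier frontier) { this.frontier = frontier; }

    // Simplified page handling: redirects and outgoing links are
    // delegated to protected methods instead of being handled inline.
    void processPage(int status, String location, List<String> outgoing) {
        if (status == 301 || status == 302) {
            performRedirect(location);
        } else {
            scheduleOutgoingUrls(outgoing);
        }
    }

    protected void performRedirect(String target) { schedule(target); }

    protected void scheduleOutgoingUrls(List<String> urls) { scheduleAll(urls); }

    // Thin wrappers around the frontier so subclasses can schedule manually.
    protected void schedule(String url) { frontier.schedule(url); }
    protected void scheduleAll(List<String> urls) { frontier.scheduleAll(urls); }
}

// A subclass that only follows outgoing HTTPS links.
class HttpsOnlyCrawler extends SketchCrawler {
    HttpsOnlyCrawler(SketchFrontier frontier) { super(frontier); }

    @Override
    protected void scheduleOutgoingUrls(List<String> urls) {
        for (String url : urls) {
            if (url.startsWith("https://")) {
                schedule(url);
            }
        }
    }
}
```

In this sketch the subclass filters outgoing links but leaves redirect handling untouched, so a redirect target is still scheduled whatever its scheme.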