question on crawler #24

Open
thiswillbeyourgithub opened this issue Jun 14, 2023 · 2 comments

Comments

@thiswillbeyourgithub

Hi,

I read this page from your doc the other day and was wondering.

Why not just use one of the article extractors made in the past? There is even a GitHub tag for some of them.

Just wondering, hope you don't mind

@polyrabbit
Owner

Actually, I started this project almost 9 years ago, in late 2014 (see my first commit), when there were only a few open-source extractors, and they didn't perform well at that time.

One reason to write it from scratch is flexibility and customizability: I can tune the parameters so that it works better for HN posts. One example is the HN comments page, which appears frequently on the front page, but most extractors do not get the right content from it.

I'll try some of the modern ones later, thanks.

@thiswillbeyourgithub
Author

Interesting, thanks.
