the parser doesn't work fine in many sites especially forums #1

Open
SpearRipper opened this issue Jan 10, 2019 · 4 comments

Comments

@SpearRipper

Hello @assnctr

I tried your proxy parser, and I can say it's the best proxy parser I have ever found.

But there is a problem: the parser doesn't scrape proxies from many sites.
If a site lists its proxies in the format below, the parser doesn't scrape them.
Example:
113.120.189.184 | 9999 | 高匿名 | HTTP | 山东省济宁市 电信 | 3秒 | 2019-01-05 23:30:59
124.94.196.188 | 9999 | 高匿名 | HTTP | 辽宁省阜新市 联通 | 1秒 | 2019-01-05 22:30:59
110.52.235.76 | 9999 | 高匿名 | HTTP | 湖南省岳阳市 联通 | 0.3秒 | 2019-01-05 21:30:56
39.137.107.98 | 80 | 高匿名 | HTTP | 中国 移动 | 2秒 | 2019-01-05 20:30:58

Also, is there a way to make your parser take all the proxies from http://proxydb.net/ without being so heavy or taking ages? I tried everything and it doesn't work on that site either.

I hope you can fix this and make the parser handle these formats.

Overall, great work, and thank you in advance.

@relloccate
Collaborator

Hi, thanks.

The old v1.3.0 BETA parsed sites such as:
https://hidemyna.me
http://proxydb.net
etc.

All of these sites require JavaScript. Version 1.3.0 used headless Chrome with Cloudflare bypassing. I may add this back in upcoming patches.

About this format:
113.120.189.184 | 9999 | 高匿名 | HTTP | 山东省济宁市 电信 | 3秒 | 2019-01-05 23:30:59

Proxies are currently parsed with a simple regex (ip:port), but I will add support for this format in the next patch.
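
For illustration only, here is a minimal Python sketch of how a more lenient pattern might accept both `ip:port` and `ip | port` (or `ip port`) on the same line. It is not the parser's actual code: the names `PROXY_RE` and `extract_proxies` are hypothetical, and the two-digit minimum for the port is an assumption made to cut down on false positives.

```python
import re

# Hypothetical pattern (not taken from the project): an IPv4 address,
# then ":" or "|" or plain spaces/tabs as the separator, then a port.
PROXY_RE = re.compile(
    r"\b((?:\d{1,3}\.){3}\d{1,3})"   # four dot-separated octets
    r"[ \t]*[:| \t][ \t]*"           # separator, never crossing a line break
    r"(\d{2,5})\b"                   # port (assumed to have 2 to 5 digits)
)


def extract_proxies(text):
    """Return every proxy found in text, normalized to 'ip:port'."""
    proxies = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(int(octet) <= 255 for octet in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        if octets_ok and port_ok:
            proxies.append(f"{ip}:{port}")
    return proxies


if __name__ == "__main__":
    sample = (
        "113.120.189.184 | 9999 | 高匿名 | HTTP | 山东省济宁市 电信 | 3秒\n"
        "39.137.107.98:80\n"
    )
    for proxy in extract_proxies(sample):
        print(proxy)
```

This only covers plain-text listings. Sites that put the IP and port in separate HTML table cells, or that render them with JavaScript, would still need the page to be rendered or stripped of tags first.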

@SpearRipper
Author

To be honest, this version is great, but now that you mention the old version could parse sites like these, it would be great if the current version could parse all sites, including proxies whose port is listed without a ":" separator, since most of the popular sites publish their proxies as IP and port without ":".

Also, I don't see a download link for the previous versions.

Can't wait to see the next update, thank you.

@SpearRipper
Author

SpearRipper commented Jun 2, 2019

Hello @assnctr, can you update the Proxy Parser so it can scrape proxies listed in the "ip port" format, like the sites I sent you above?

http://nntime.com/
https://hidemyna.me/
https://www.my-proxy.com/
https://www.proxynova.com/
https://premproxy.com/
http://proxydb.net

I searched for v1.3.0 BETA and I couldn't find a download link :(
