[Discussion] Is the logic for determining proxy type and availability (the timeout checks) flawed? #756

Open
anysoft opened this issue Jul 1, 2023 · 2 comments

Comments

@anysoft
anysoft commented Jul 1, 2023

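The notes below are my own summary. Corrections are welcome if I got anything wrong or missed something; let's discuss and help the project improve.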

@jhao104

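Some background

  1. The difference between an HTTP proxy and an HTTPS proxy:
    http/https/socks4/socks5/socks5h/vmess/vless and so on are all protocols the proxy server uses to talk to the client when providing its service, i.e. which protocol is spoken on the client <---> proxy server leg.

  2. In Python's requests library, the keys of the proxies dict include http/https/socks5 and so on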

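proxies = {
    "http": "https://192.168.1.1:8888",
    "https": "https://127.0.0.1:8888",
}
# "http" applies to target pages served over http: e.g. `http://httpbin.org` is fetched through the proxy `https://192.168.1.1:8888`
# "https" applies to target pages served over https: e.g. `https://baidu.com` is fetched through the proxy `https://127.0.0.1:8888`
# It does NOT mean that an http proxy server can only reach http pages, or that an https proxy server can only reach https pages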
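
To make the key semantics concrete, here is a minimal sketch of the lookup. `pick_proxy` is a hypothetical helper written for this illustration; it is a simplified model of the proxy selection that requests performs, not the library's own code. The point it shows is that the dict key is matched against the scheme of the target URL, never against the protocol of the proxy.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy URL that would be used for `url`, or None.

    Simplified model: keys are tried from most to least specific, and
    each key is matched against the TARGET url, not against the proxy.
    """
    parts = urlparse(url)
    candidates = [
        "{}://{}".format(parts.scheme, parts.hostname),  # e.g. "http://httpbin.org"
        parts.scheme,                                    # e.g. "http"
        "all://{}".format(parts.hostname),
        "all",
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None


proxies = {
    "http": "https://192.168.1.1:8888",
    "https": "https://127.0.0.1:8888",
}
print(pick_proxy("http://httpbin.org", proxies))  # target is http, so the "http" entry is used
print(pick_proxy("https://baidu.com", proxies))   # target is https, so the "https" entry is used
```

Both targets end up on a proxy whose own URL starts with `https://`, which is exactly why the key cannot be read as "the proxy's protocol".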

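Code

  1. helper/validator.py
  • The status_code check is not reliable. Some proxy IPs have been retired and now host ordinary websites; in that case an http/https request still gets a response, usually a 200. So either validate against page content that has a distinctive signature, or send a HEAD request and validate the response headers. The stability and reachability of whichever service is used as the check page also need to be considered (cloudflare/bing/microsoft/google/baidu/qq/aliyun).
  • httpTimeOutValidator checks whether the proxy speaks the HTTP protocol, so proxies should be written as {"http": "http://{proxy}".format(proxy=proxy), "https": "http://{proxy}".format(proxy=proxy)}
  • Likewise, for httpsTimeOutValidator proxies should be written as {"http": "https://{proxy}".format(proxy=proxy), "https": "https://{proxy}".format(proxy=proxy)}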
@ProxyValidator.addHttpValidator
def httpTimeOutValidator(proxy):
    """HTTP timeout check"""

    proxies = {"http": "http://{proxy}".format(proxy=proxy), "https": "https://{proxy}".format(proxy=proxy)}

    try:
        r = head(conf.httpUrl, headers=HEADER, proxies=proxies, timeout=conf.verifyTimeout)
        return True if r.status_code == 200 else False
    except Exception as e:
        return False


@ProxyValidator.addHttpsValidator
def httpsTimeOutValidator(proxy):
    """HTTPS timeout check"""

    proxies = {"http": "http://{proxy}".format(proxy=proxy), "https": "https://{proxy}".format(proxy=proxy)}
    try:
        r = head(conf.httpsUrl, headers=HEADER, proxies=proxies, timeout=conf.verifyTimeout, verify=False)
        return True if r.status_code == 200 else False
    except Exception as e:
        return False
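
The two bullet points about the proxies dict can be sketched as one small helper. `build_proxies` is a hypothetical name, not something that exists in the project; the sketch only assumes that `proxy` is a `host:port` string, as in the validators above.

```python
def build_proxies(proxy, scheme):
    """Build a requests-style proxies dict for a proxy that speaks `scheme`.

    Both keys point at the same proxy URL, so requests to http:// and
    https:// pages alike go through the proxy using that one protocol.
    """
    proxy_url = "{scheme}://{proxy}".format(scheme=scheme, proxy=proxy)
    return {"http": proxy_url, "https": proxy_url}


# Testing whether the proxy is an HTTP proxy:
print(build_proxies("192.168.1.1:8888", "http"))
# Testing whether the proxy is an HTTPS proxy:
print(build_proxies("192.168.1.1:8888", "https"))
```

With this shape, the scheme under test appears in exactly one place, so the http and https validators cannot accidentally mix the two.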
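  2. check.py
    Following from point 1, the http and https proxy types should be handled separately. Generally a proxy is either of the http type or of the https type. Proxies that support both protocols cannot be ruled out, but they should be almost nonexistent.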
http_r = cls.httpValidator(proxy)
https_r = False if not http_r else cls.httpsValidator(proxy)
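
In the two lines above, the https check is skipped whenever the http check fails, so a proxy that only speaks https is never detected. A minimal sketch of running the two checks independently follows; `detect_types` and the stub validators are hypothetical names used only for illustration.

```python
def detect_types(proxy, http_check, https_check):
    """Run both checks independently and report which protocols work."""
    types = []
    if http_check(proxy):
        types.append("http")
    if https_check(proxy):  # runs even when the http check failed
        types.append("https")
    return types


# Stub validators standing in for the real network checks.
def fake_http_check(proxy):
    return False


def fake_https_check(proxy):
    return True


print(detect_types("192.168.1.1:8888", fake_http_check, fake_https_check))
```

The cost is one extra request for proxies that fail the http check, which is the trade-off the reachability pre-check idea later in this post tries to reduce.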

My changes are below; see my fork of the repository for details.

HTTP_URL = "https://www.baidu.com"
HTTP_URL_HEADER = {"Server": 'bfe'}   # Baidu's own Server header is "bfe", which is highly distinctive

HTTPS_URL = "https://www.baidu.com"
HTTPS_URL_HEADER = {"Server": 'bfe'}

@ProxyValidator.addHttpValidator
def httpTimeOutValidator(proxy):
    """HTTP proxy timeout check"""

    # Both keys use http://: the question is whether the proxy itself speaks HTTP.
    proxies = {"http": "http://{proxy}".format(proxy=proxy), "https": "http://{proxy}".format(proxy=proxy)}
    try:
        r = head(conf.httpsUrl, headers=HEADER, proxies=proxies, timeout=conf.verifyTimeout, verify=False)
        if r.status_code != 200:
            return False
        # Every expected header must be present and start with the expected value.
        for key, expected in (conf.httpsUrlHeader or {}).items():
            if not r.headers.get(key, "").startswith(expected):
                return False
        return True
    except Exception:
        return False

@ProxyValidator.addHttpsValidator
def httpsTimeOutValidator(proxy):
    """HTTPS proxy timeout check"""

    # Both keys use https://: the question is whether the proxy itself speaks HTTPS.
    proxies = {"http": "https://{proxy}".format(proxy=proxy), "https": "https://{proxy}".format(proxy=proxy)}
    try:
        r = head(conf.httpsUrl, headers=HEADER, proxies=proxies, timeout=conf.verifyTimeout, verify=False)
        if r.status_code != 200:
            return False
        # Every expected header must be present and start with the expected value.
        for key, expected in (conf.httpsUrlHeader or {}).items():
            if not r.headers.get(key, "").startswith(expected):
                return False
        return True
    except Exception:
        return False
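
The header comparison is the same in both validators, so it could be pulled into one pure function that is easy to test without any network access. This is only a sketch: `headers_match` is a hypothetical helper, and the sample header values are made up for the example.

```python
def headers_match(response_headers, expected):
    """True if every expected header is present and starts with the expected prefix.

    Header names are compared case-insensitively. An empty or missing
    `expected` mapping matches any response.
    """
    lowered = {name.lower(): value for name, value in response_headers.items()}
    for name, prefix in (expected or {}).items():
        if not lowered.get(name.lower(), "").startswith(prefix):
            return False
    return True


# Sample values invented for illustration.
print(headers_match({"Server": "bfe/1.0"}, {"Server": "bfe"}))  # fingerprint present
print(headers_match({"Server": "nginx"}, {"Server": "bfe"}))    # some other site answered
```

A validator would then reduce to "status is 200 and `headers_match(r.headers, ...)`", which keeps the fingerprint rule in one place.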

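Some thoughts

  1. Detection of SOCKS proxies could be added.
  2. I think the first step in checking a proxy should not be to determine its type. Instead, first check over a socket whether the service is reachable at all (or, during the first HTTP check, mark the proxy as unreachable based on the exception type and skip the remaining checks, since they would be pointless). Only if it is reachable should the proxy type be determined. This avoids multiplying the time spent by cycling through the type/availability/reachability checks. After all, whatever the proxy type, nearly all of them have to establish a connection before anything else can happen (UDP-based protocols excepted).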
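
A reachability pre-check along these lines could look like the sketch below. It uses only the standard library; `is_reachable` and the default timeout are assumptions for illustration, and `proxy` is assumed to be a `host:port` string.

```python
import socket


def is_reachable(proxy, timeout=3.0):
    """Return True if a TCP connection to "host:port" can be opened."""
    host, _, port = proxy.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        # OSError covers refused connections, timeouts and DNS failures;
        # ValueError and OverflowError cover a malformed or out-of-range port.
        return False
```

Calling `is_reachable(proxy)` before the protocol validators means an unreachable proxy costs one short connection attempt instead of one full timeout per protocol check.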
@YonQua
YonQua commented Jul 21, 2023

Good idea.

@jhao104
Owner

jhao104 commented Jul 31, 2023

Thanks, I learned something. I'll fix this later.
