Following the demo, nothing gets crawled and there is no response at all. What is going on? #33

Open
forging2012 opened this issue Dec 31, 2017 · 11 comments

forging2012 commented Dec 31, 2017

There is no response whether I run it locally or on a server. This is the code:

package main
import (
    "fmt"
    "github.com/shiyanhui/dht"
)

func main() {
    downloader := dht.NewWire(65536)
    go func() {
        // once we got the request result
        for resp := range downloader.Response() {
            fmt.Println(resp.InfoHash, resp.MetadataInfo)
        }
    }()
    go downloader.Run()

    config := dht.NewCrawlConfig()
    config.OnAnnouncePeer = func(infoHash, ip string, port int) {
        // request to download the metadata info
        downloader.Request([]byte(infoHash), ip, port)
    }
    d := dht.New(config)

    d.Run()
}

@shiyanhui
Owner

Does the server have its own dedicated IP?

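@forging2012
Author

Solved. It does have a dedicated IP; the firewall was blocking the traffic.
But it is not very efficient: there are too many duplicates. I am getting about 8,000 per hour. Is that normal?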

@shiyanhui
Owner

That rate is probably about normal. There are just too many duplicates.
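
Since most announces repeat an infohash that has already been seen, one common mitigation is to filter duplicates before asking the downloader for metadata. The following is a minimal sketch, not part of the library: `seenSet`, `newSeenSet`, and `FirstTime` are hypothetical names, and the announces are simulated with plain strings. In the crawler from the first post, the check would sit inside `config.OnAnnouncePeer`, wrapping the call to `downloader.Request`.

```go
package main

import (
	"fmt"
	"sync"
)

// seenSet is a concurrency-safe set of infohashes that have already been
// handed to the downloader. (Hypothetical helper, not a library type.)
type seenSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newSeenSet() *seenSet {
	return &seenSet{seen: make(map[string]struct{})}
}

// FirstTime records infoHash and reports whether it had not been seen before.
func (s *seenSet) FirstTime(infoHash string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[infoHash]; ok {
		return false
	}
	s.seen[infoHash] = struct{}{}
	return true
}

func main() {
	seen := newSeenSet()

	// Simulated announce_peer events. In the real crawler these would
	// arrive through config.OnAnnouncePeer.
	announces := []string{"hash-a", "hash-b", "hash-a", "hash-c", "hash-b"}

	unique := 0
	for _, h := range announces {
		if seen.FirstTime(h) {
			unique++ // this is where downloader.Request(...) would be called
		}
	}
	fmt.Println(unique, "unique of", len(announces)) // prints "3 unique of 5"
}
```

The map grows without bound, so a long-running crawler would probably want to cap it or move the check into the database that stores the results.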

@forging2012
Author

These are the parameters I am currently running with:

config := dht.NewCrawlConfig()
dht.NewWire(65536, 32767, 10240)

// NewCrawlConfig returns a config in crawling mode.
func NewCrawlConfig() *Config {
	config := NewStandardConfig()
	config.NodeExpriedAfter = 0
	config.KBucketExpiredAfter = 0
	config.CheckKBucketPeriod = time.Second * 5
	config.KBucketSize = math.MaxInt32
	config.Mode = CrawlMode
	config.RefreshNodeNum = 256

	return config
}

Should I raise config.RefreshNodeNum above 256 so that more nodes are refreshed?
I have heard that Python crawlers are more efficient.

@forging2012
Author

Is it not possible to customize the prime (bootstrap) nodes in dht.NewCrawlConfig() mode?

Something like this:

PrimeNodes: []string{
			"router.bittorrent.com:6881",
			"router.utorrent.com:6881",
			"dht.transmissionbt.com:6881",
		},
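
The `NewCrawlConfig` source quoted earlier in this thread returns a `*Config` and sets its fields directly, so overriding the bootstrap list after construction may be as simple as assigning to `config.PrimeNodes`. That is an assumption, since the thread never confirms it. The sketch below uses a local stand-in `Config` type (hypothetical, modeling only the `PrimeNodes` field) and a hypothetical `setPrimeNodes` helper that checks each entry is in `host:port` form before replacing the list.

```go
package main

import (
	"fmt"
	"net"
)

// Config is a stand-in for the library's config type. Only the PrimeNodes
// field quoted in the thread is modeled; this is not the real struct.
type Config struct {
	PrimeNodes []string
}

// setPrimeNodes checks that every entry is "host:port" and then replaces
// the bootstrap list. On error the config is left unchanged.
func setPrimeNodes(c *Config, nodes []string) error {
	for _, n := range nodes {
		host, port, err := net.SplitHostPort(n)
		if err != nil {
			return fmt.Errorf("bad prime node %q: %w", n, err)
		}
		if host == "" || port == "" {
			return fmt.Errorf("bad prime node %q: empty host or port", n)
		}
	}
	c.PrimeNodes = append([]string(nil), nodes...) // copy, do not alias
	return nil
}

func main() {
	config := &Config{}
	err := setPrimeNodes(config, []string{
		"router.bittorrent.com:6881",
		"router.utorrent.com:6881",
		"dht.transmissionbt.com:6881",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(config.PrimeNodes), "prime nodes configured")
}
```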

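ghost commented Mar 25, 2018

How did you solve the firewall problem? At first I was receiving data normally on Alibaba Cloud, but later it stopped receiving any messages. I checked, and CentOS 7.3 does not enable the firewall by default.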

@forging2012
Author

Besides the firewall, Alibaba Cloud also applies default security group rules, and under those many ports are blocked by default.
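
One way to tell whether a host firewall or a cloud security group is dropping inbound DHT traffic is to test UDP reachability directly, without the crawler involved. The sketch below is a small stand-alone probe under a few assumptions: the crawler listens on UDP port 6881 (taken from the router addresses quoted in this thread, not confirmed for the library's default), and the flag names `-listen`, `-send`, and `-wait` are made up for this example. Stop the crawler, run the probe on the server, then run it with `-send <server-ip>:6881` from another machine.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"time"
)

// defaultWait is how long the listener waits for a probe before giving up.
const defaultWait = 5 * time.Second

// listenUDP opens a UDP socket on addr, for example ":6881".
func listenUDP(addr string) (net.PacketConn, error) {
	return net.ListenPacket("udp", addr)
}

// waitForPacket returns the size and sender of the first datagram received
// on conn, or an error if nothing arrives within timeout.
func waitForPacket(conn net.PacketConn, timeout time.Duration) (int, net.Addr, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return 0, nil, err
	}
	buf := make([]byte, 2048)
	return conn.ReadFrom(buf)
}

// sendProbe sends one small UDP datagram to target ("host:port").
func sendProbe(target string) error {
	conn, err := net.Dial("udp", target)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("probe"))
	return err
}

func main() {
	listen := flag.String("listen", ":6881", "UDP address to listen on")
	send := flag.String("send", "", "send one probe to this host:port instead of listening")
	wait := flag.Duration("wait", defaultWait, "how long to wait for a probe")
	flag.Parse()

	if *send != "" {
		if err := sendProbe(*send); err != nil {
			fmt.Println("send failed:", err)
			return
		}
		fmt.Println("probe sent to", *send)
		return
	}

	conn, err := listenUDP(*listen)
	if err != nil {
		fmt.Println("cannot listen:", err)
		return
	}
	defer conn.Close()

	n, from, err := waitForPacket(conn, *wait)
	if err != nil {
		fmt.Println("no packet received:", err)
		return
	}
	fmt.Printf("received %d bytes from %s\n", n, from)
}
```

If the listener times out while the sender reports success, something between the two machines is dropping inbound UDP. On a cloud host that usually means the security group rules are worth checking in addition to the OS firewall.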

mhearttzw commented Dec 10, 2018

On Tencent Cloud I have disabled the firewall and opened all ports, but there is still no response at all. Is there any solution?

setwang commented Jan 7, 2019 via email

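xja commented Jan 13, 2019

Hi, was the 8,000 per hour figure produced by the code at the top of this thread? I ran the spider from the sample directory and only get about 8,000 unique results per day on average. How can I improve that?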

nodeboy commented May 22, 2019

If I only want to join the network, is port mapping (port forwarding) enough?

7 participants
@nodeboy @forging2012 @shiyanhui @xja @mhearttzw @setwang and others