Add support for crawling subdomains #27

Open · wants to merge 1 commit into base: next

Conversation

alexspeller

Merge changes to support subdomain crawling from runa@91559bd

@MaGonglei

This feature is very useful.
I think Anemone should also support printing out external links: just print them, but don't scan them in depth.
The link checker tool XENU (http://home.snafu.de/tilman/xenulink.html) has this feature.

@wokkaflokka

MaGonglei: It is very simple to gather external links using Anemone, and comparably simple to check that those links are actually valid. The 'on_every_page' block is very helpful in this regard.

If you'd like some code that does exactly what you are asking, I could send an example your way.
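A minimal sketch of the kind of example wokkaflokka describes, using the on_every_page hook to record external links without following them. The start URL, the Set-based bookkeeping, and the plain host comparison are illustrative assumptions, not part of Anemone's API:

require 'anemone'
require 'set'
require 'uri'

external_links = Set.new

Anemone.crawl("http://www.example.com") do |anemone|
  anemone.on_every_page do |page|
    next unless page.doc  # skip pages that failed to fetch or parse

    page.doc.xpath('//a[@href]').each do |a|
      begin
        link = URI.join(page.url.to_s, a['href'])
      rescue URI::Error, ArgumentError
        next
      end
      # naive check: any link on a different host counts as external
      external_links << link.to_s if link.host && link.host != page.url.host
    end
  end
end

puts external_links.to_a.sort

The external links are only collected here, never enqueued, so the crawl itself stays within the original site; checking whether they actually respond would be a separate pass (for example with Net::HTTP head requests).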

@MaGonglei

Hi wokkaflokka, thanks for your reply.
I think I know what you mean, but I would prefer to have this feature when I initialize the Anemone crawl, like:
Anemone.crawl("http://www.example.com", :external_links => false) do |anemone|
....
end

Because if I use the "on_every_page" block to search for external links (e.g. page.doc.xpath('//a[@href]')), it seems to cost too much CPU and memory.

If I'm wrong, please give me an example.

Thanks.
