This repository has been archived by the owner on Dec 12, 2021. It is now read-only.

Start crawl sends wrong seed to the crawler #53

Open
aecio opened this issue Jul 27, 2017 · 2 comments

@aecio
Member

aecio commented Jul 27, 2017

When DDT sends the seed URL to the crawler, it appends the string ",1" to the end of the URL. That string may be the count of URLs shown in the recommendations box.

@yamsgithub
Contributor

This does not seem to be the case. The following ACHE crawler log output, produced when the URLs were added, shows the seed URLs arriving without the ",1" suffix:

[2017-08-03 15:50:34,238] INFO [qtp597874846-15] (FrontierManager.java:236) - Adding 3 seed URL(s)...
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545327
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545433
[2017-08-03 15:50:34,321] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/
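
To make this check repeatable, here is a minimal sketch that pulls the seed URLs out of log lines in the format above and flags any that end in a comma followed by digits. The function names and the suffix pattern are assumptions made for illustration; they are not part of DDT or ACHE.

```python
import re

# Captures the URL that follows "Added seed URL:" in a FrontierManager log line.
SEED_LINE = re.compile(r"Added seed URL:\s*(\S+)")

# Assumed shape of the suspected artifact: a comma plus digits at the very end.
SUSPECT_SUFFIX = re.compile(r",\d+$")


def extract_seeds(log_text):
    """Return every seed URL reported in the given log text, in order."""
    return SEED_LINE.findall(log_text)


def suspicious_seeds(seeds):
    """Return the seeds that end with a ',<digits>' suffix."""
    return [url for url in seeds if SUSPECT_SUFFIX.search(url)]


log = """\
[2017-08-03 15:50:34,238] INFO [qtp597874846-15] (FrontierManager.java:236) - Adding 3 seed URL(s)...
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545327
[2017-08-03 15:50:34,320] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/dir/index/discover?sid=396545433
[2017-08-03 15:50:34,321] INFO [qtp597874846-15] (FrontierManager.java:248) - Added seed URL: http://answers.yahoo.com/
"""

seeds = extract_seeds(log)
print(seeds)
print(suspicious_seeds(seeds))
```

This only catches the ",<digits>" form. A bare digit appended without a comma cannot be told apart from a legitimate URL by pattern alone, so that case would need a comparison against the seed list that was originally selected.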

@aecio
Member Author

aecio commented Sep 21, 2017

This issue is still happening, though it is not always appending ",1". Right now I'm seeing that it appended "1" to the URLs shown in "Crawling View" -> "Deep Crawling" -> "Domains for crawling".
